Deep AutoEncoders for Collaborative Filtering

This is not an official NVIDIA product. It is a research project described in: "Training Deep AutoEncoders for Collaborative Filtering"(https://arxiv.org/abs/1708.01715)

The model

The model is based on deep AutoEncoders.

AutEncoderPic

Requirements

Python 3.6
Pytorch: pipenv install
CUDA (recommended version >= 8.0)

Training using mixed precision with Tensor Cores

You would need NVIDIA Volta-based GPU
Checkout mixed precision branch
For theory on mixed precision training see Mixed Precision Training paper

Getting Started

Run unittests first

The code is intended to run on GPU. Last test can take a minute or two.

$ python -m unittest test/data_layer_tests.py
$ python -m unittest test/test_model.py

Tutorial

Checkout this tutorial by miguelgfierro.

Get the data

Note: Run all these commands within your DeepRecommender folder

Netflix prize

Download from here into your DeepRecommender folder

$ tar -xvf nf_prize_dataset.tar.gz
$ tar -xf download/training_set.tar
$ python ./data_utils/netflix_data_convert.py training_set Netflix

Data stats, Dataset, Netflix 3 months, Netflix 6 months, Netflix 1 year, Netflix full, --------, ----------------, ----------------, -----------, ------------, Ratings train, 13,675,402, 29,179,009, 41,451,832, 98,074,901, Users train, 311,315, 390,795, 345,855, 477,412, Items train, 17,736, 17,757, 16,907, 17,768, Time range train, 2005-09-01 to 2005-11-31, 2005-06-01 to 2005-11-31, 2004-06-01 to 2005-05-31, 1999-12-01 to 2005-11-31, --------, ----------------, -----------, ------------, Ratings test, 2,082,559, 2,175,535, 3,888,684, 2,250,481, Users test, 160,906, 169,541, 197,951, 173,482, Items test, 17,261, 17,290, 16,506, 17,305, Time range test, 2005-12-01 to 2005-12-31, 2005-12-01 to 2005-12-31, 2005-06-01 to 2005-06-31, 2005-12-01 to 2005-12-31

Train the model

In this example, the model will be trained for 12 epochs. In paper we train for 102.

python run.py --gpu_ids 0 \
--path_to_train_data Netflix/NF_TRAIN \
--path_to_eval_data Netflix/NF_VALID \
--hidden_layers 512,512,1024 \
--non_linearity_type selu \
--batch_size 128 \
--logdir model_save \
--drop_prob 0.8 \
--optimizer momentum \
--lr 0.005 \
--weight_decay 0 \
--aug_step 1 \
--noise_prob 0 \
--num_epochs 12 \
--summary_frequency 1000

Note that you can run Tensorboard in parallel

$ tensorboard --logdir=model_save

Run inference on the Test set

python infer.py \
--path_to_train_data Netflix/NF_TRAIN \
--path_to_eval_data Netflix/NF_TEST \
--hidden_layers 512,512,1024 \
--non_linearity_type selu \
--save_path model_save/model.epoch_11 \
--drop_prob 0.8 \
--predictions_path preds.txt

Compute Test RMSE

python compute_RMSE.py --path_to_predictions=preds.txt

After 12 epochs you should get RMSE around 0.927. Train longer to get below 0.92

Results

It should be possible to achieve the following results. Iterative output re-feeding should be applied
once during each iteration.

(exact numbers will vary due to randomization), DataSet, RMSE, Model Architecture, --------, ----------------, ----------------, Netflix 3 months, 0.9373, n,128,256,256,dp(0.65),256,128,n, Netflix 6 months, 0.9207, n,256,256,512,dp(0.8),256,256,n, Netflix 1 year, 0.9225, n,256,256,512,dp(0.8),256,256,n, Netflix full, 0.9099, n,512,512,1024,dp(0.8),512,512,n

名稱與所有者	NVIDIA/DeepRecommender
主編程語言	Python
編程語言	Python (語言數: 1)
平台
許可證	MIT License

創建於	2017-09-08 20:48:44
推送於	2021-05-17 23:38:49
最后一次提交	2019-01-06 22:10:32
發布數	0

星數	1.7k
關注者數	72
派生數	341
提交數	41
已啟用問題?
問題數	20
打開的問題數	10
拉請求數	13
打開的拉請求數	1
關閉的拉請求數	2

已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?

DeepRecommender

Github星跟蹤圖