DeepRecommender

Deep learning for recommender systems

Github星跟踪图

Deep AutoEncoders for Collaborative Filtering

This is not an official NVIDIA product. It is a research project described in: "Training Deep AutoEncoders for Collaborative Filtering"(https://arxiv.org/abs/1708.01715)

The model

The model is based on deep AutoEncoders.

AutEncoderPic

Requirements

  • Python 3.6
  • Pytorch: pipenv install
  • CUDA (recommended version >= 8.0)

Training using mixed precision with Tensor Cores

Getting Started

Run unittests first

The code is intended to run on GPU. Last test can take a minute or two.

$ python -m unittest test/data_layer_tests.py
$ python -m unittest test/test_model.py

Tutorial

Checkout this tutorial by miguelgfierro.

Get the data

Note: Run all these commands within your DeepRecommender folder

Netflix prize

  • Download from here into your DeepRecommender folder
$ tar -xvf nf_prize_dataset.tar.gz
$ tar -xf download/training_set.tar
$ python ./data_utils/netflix_data_convert.py training_set Netflix

Data stats, Dataset, Netflix 3 months, Netflix 6 months, Netflix 1 year, Netflix full, --------, ----------------, ----------------, -----------, ------------, Ratings train, 13,675,402, 29,179,009, 41,451,832, 98,074,901, Users train, 311,315, 390,795, 345,855, 477,412, Items train, 17,736, 17,757, 16,907, 17,768, Time range train, 2005-09-01 to 2005-11-31, 2005-06-01 to 2005-11-31, 2004-06-01 to 2005-05-31, 1999-12-01 to 2005-11-31, --------, ----------------, -----------, ------------, Ratings test, 2,082,559, 2,175,535, 3,888,684, 2,250,481, Users test, 160,906, 169,541, 197,951, 173,482, Items test, 17,261, 17,290, 16,506, 17,305, Time range test, 2005-12-01 to 2005-12-31, 2005-12-01 to 2005-12-31, 2005-06-01 to 2005-06-31, 2005-12-01 to 2005-12-31

Train the model

In this example, the model will be trained for 12 epochs. In paper we train for 102.

python run.py --gpu_ids 0 \
--path_to_train_data Netflix/NF_TRAIN \
--path_to_eval_data Netflix/NF_VALID \
--hidden_layers 512,512,1024 \
--non_linearity_type selu \
--batch_size 128 \
--logdir model_save \
--drop_prob 0.8 \
--optimizer momentum \
--lr 0.005 \
--weight_decay 0 \
--aug_step 1 \
--noise_prob 0 \
--num_epochs 12 \
--summary_frequency 1000

Note that you can run Tensorboard in parallel

$ tensorboard --logdir=model_save

Run inference on the Test set

python infer.py \
--path_to_train_data Netflix/NF_TRAIN \
--path_to_eval_data Netflix/NF_TEST \
--hidden_layers 512,512,1024 \
--non_linearity_type selu \
--save_path model_save/model.epoch_11 \
--drop_prob 0.8 \
--predictions_path preds.txt

Compute Test RMSE

python compute_RMSE.py --path_to_predictions=preds.txt

After 12 epochs you should get RMSE around 0.927. Train longer to get below 0.92

Results

It should be possible to achieve the following results. Iterative output re-feeding should be applied
once during each iteration.

(exact numbers will vary due to randomization), DataSet, RMSE, Model Architecture, --------, ----------------, ----------------, Netflix 3 months, 0.9373, n,128,256,256,dp(0.65),256,128,n, Netflix 6 months, 0.9207, n,256,256,512,dp(0.8),256,256,n, Netflix 1 year, 0.9225, n,256,256,512,dp(0.8),256,256,n, Netflix full, 0.9099, n,512,512,1024,dp(0.8),512,512,n

主要指标

概览
名称与所有者NVIDIA/DeepRecommender
主编程语言Python
编程语言Python (语言数: 1)
平台
许可证MIT License
所有者活动
创建于2017-09-08 20:48:44
推送于2021-05-17 23:38:49
最后一次提交2019-01-06 22:10:32
发布数0
用户参与
星数1.7k
关注者数72
派生数341
提交数41
已启用问题?
问题数20
打开的问题数10
拉请求数13
打开的拉请求数1
关闭的拉请求数2
项目设置
已启用Wiki?
已存档?
是复刻?
已锁定?
是镜像?
是私有?