Recommenders

Best Practices on Recommendation Systems

This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:

  • Prepare Data: Preparing and loading data for each recommender algorithm
  • Model: Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM)
  • Evaluate: Evaluating algorithms with offline metrics
  • Model Select and Optimize: Tuning and optimizing hyperparameters for recommender models
  • Operationalize: Operationalizing models in a production environment on Azure
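As an illustration of the Evaluate task, the sketch below computes a few common offline metrics in plain Python. It is a minimal example and not the implementation in reco_utils; the function names and the binary-relevance definitions are assumptions chosen for clarity (precision here always divides by k, which is only one of several conventions).

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k=10):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    recommended = ["a", "b", "c", "d"]   # ranked model output for one user
    relevant = {"a", "c", "e"}           # items the user interacted with in the test set
    print(precision_at_k(recommended, relevant, k=4))
    print(recall_at_k(recommended, relevant, k=4))
    print(ndcg_at_k(recommended, relevant, k=4))
```

Ranking metrics (precision, recall, nDCG) apply to any algorithm that outputs a ranked list, while RMSE and MAE only make sense for algorithms that predict explicit ratings.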

Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the reco_utils documentation.
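To make the idea of splitting training/test data concrete, here is a small standard-library sketch of a per-user split. It is not the reco_utils splitter: the function name `stratified_split`, the tuple layout `(user, item, rating)`, and the interpretation of "stratified" as "each user keeps roughly the same train/test ratio" are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(interactions, ratio=0.75, seed=42):
    """Split (user, item, rating) tuples so that each user keeps
    about `ratio` of their interactions in the training set."""
    rng = random.Random(seed)

    # Group the interactions by user (first tuple element).
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, test = [], []
    for user in sorted(by_user):
        rows = by_user[user][:]
        rng.shuffle(rows)                  # random assignment within each user
        cut = round(len(rows) * ratio)     # number of rows kept for training
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


if __name__ == "__main__":
    # Two hypothetical users with four interactions each.
    data = [(u, i, 1.0) for u in (1, 2) for i in range(4)]
    train, test = stratified_split(data, ratio=0.75)
    print(len(train), len(test))
```

Splitting per user rather than over the whole dataset keeps every user represented in both sets, which matters for ranking metrics that are averaged over users.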

For a more detailed overview of the repository, please see the documents on the wiki page.

Getting Started

Please see the setup guide for more details on setting up your machine locally, on a data science virtual machine (DSVM) or on Azure Databricks.

To set up on your local machine:

  1. Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.
  2. Clone the repository
    git clone https://github.com/Microsoft/Recommenders
    
  3. Run the generate conda file script to create a conda environment:
    (This is for a basic Python environment; see SETUP.md for PySpark and GPU environment setup.)
    cd Recommenders
    python scripts/generate_conda_file.py
    conda env create -f reco_base.yaml  
    
  4. Activate the conda environment and register it with Jupyter:
    conda activate reco_base
    python -m ipykernel install --user --name reco_base --display-name "Python (reco)"
    
  5. Start the Jupyter notebook server
    jupyter notebook
    
  6. Run the SAR Python CPU MovieLens notebook under the 00_quick_start folder. Make sure to change the kernel to "Python (reco)".

NOTE - The Alternating Least Squares (ALS) notebooks require a PySpark environment to run. Please follow the steps in the setup guide to run these notebooks in a PySpark environment. For the deep learning algorithms, it is recommended to use a GPU machine.

Algorithms

The table below lists the recommender algorithms currently available in the repository. Notebooks are linked under the Environment column when different implementations are available.

| Algorithm | Environment | Type | Description |
| --- | --- | --- | --- |
| Alternating Least Squares (ALS) | PySpark | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability |
| Cornac/Bayesian Personalized Ranking (BPR) | Python CPU | Collaborative Filtering | Matrix factorization algorithm for predicting item ranking with implicit feedback |
| Deep Knowledge-Aware Network (DKN) | Python CPU / Python GPU | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations |
| Extreme Deep Factorization Machine (xDeepFM) | Python CPU / Python GPU | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features |
| Factorization Machine (FM) / Field-Aware FM (FFM) | Python CPU | Content-Based Filtering | Algorithm that predicts labels with user/item features |
| FastAI Embedding Dot Bias (FAST) | Python CPU / Python GPU | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items |
| LightGBM/Gradient Boosting Tree | Python CPU / PySpark | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems |
| Neural Collaborative Filtering (NCF) | Python CPU / Python GPU | Collaborative Filtering | Deep learning algorithm with enhanced performance for implicit feedback |
| Restricted Boltzmann Machines (RBM) | Python CPU / Python GPU | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit feedback |
| Riemannian Low-rank Matrix Completion (RLRMC) | Python CPU | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption |
| Simple Algorithm for Recommendation (SAR) | Python CPU | Collaborative Filtering | Similarity-based algorithm for implicit feedback dataset |
| Surprise/Singular Value Decomposition (SVD) | Python CPU | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large |
| Vowpal Wabbit Family (VW) | Python CPU (online training) | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing |
| Wide and Deep | Python CPU / Python GPU | Hybrid | Deep learning algorithm that can memorize feature interactions and generalize user features |

NOTE: * indicates algorithms invented/contributed by Microsoft.
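To give a feel for what a similarity-based collaborative filtering algorithm does with implicit feedback, the following sketch builds item-item Jaccard similarities from co-occurrence counts and scores unseen items for a user. It is a simplified illustration only and should not be read as the repository's SAR implementation; all function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def item_cooccurrence(user_items):
    """Count, for each pair of items, how many users interacted with both.
    The diagonal entry cooc[i][i] is the number of users who touched item i."""
    cooc = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for item in items:
            cooc[item][item] += 1
        for a, b in combinations(sorted(items), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def jaccard_similarity(cooc):
    """Turn co-occurrence counts into Jaccard similarities between items."""
    sim = defaultdict(dict)
    for a, row in cooc.items():
        for b, both in row.items():
            if a != b:
                sim[a][b] = both / (cooc[a][a] + cooc[b][b] - both)
    return sim


def recommend(user, user_items, sim, top_k=10):
    """Score unseen items by summing their similarity to the user's items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, s in sim.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    # Highest score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_k]


if __name__ == "__main__":
    # Hypothetical implicit feedback: user -> set of items interacted with.
    user_items = {
        "u1": {"A", "B"},
        "u2": {"A", "B", "C"},
        "u3": {"B", "C"},
        "u4": {"A"},
    }
    sim = jaccard_similarity(item_cooccurrence(user_items))
    print(recommend("u4", user_items, sim))
```

Because the model is just a table of item similarities, it needs no gradient-based training, which is one reason similarity-based methods are convenient baselines.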

Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.

| Algorithm | Environment | Type | Description |
| --- | --- | --- | --- |
| SARplus * | PySpark | Collaborative Filtering | Optimized implementation of SAR for Spark |

Preliminary Comparison

We provide a benchmark notebook to illustrate how different algorithms could be evaluated and compared. In this notebook, the MovieLens dataset is split into training/test sets at a 75/25 ratio using a stratified split. A recommendation model is trained using each of the collaborative filtering algorithms below. We utilize empirical parameter values reported in literature here. For ranking metrics we use k=10 (top 10 recommended items). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode. In this table we show the results on MovieLens 100k, running the algorithms for 15 epochs.

| Algo | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ALS | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648 |
| SVD | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971 |
| SAR | 0.113028 | 0.388321 | 0.333828 | 0.183179 | N/A | N/A | N/A | N/A |
| NCF | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A |
| BPR | 0.105365 | 0.389948 | 0.349841 | 0.181807 | N/A | N/A | N/A | N/A |
| FastAI | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671 |

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

Build Status

These tests are the nightly builds, which run the smoke and integration tests. The main branch is master and the development branch is staging. We use pytest for testing Python utilities in reco_utils and papermill for the notebooks. For more information about the testing pipelines, please see the test documentation.

DSVM Build Status

The following tests run on a Windows and Linux DSVM daily. These machines run 24/7.

| Build Type | Branch | Status | Branch | Status |
| --- | --- | --- | --- | --- |
| Linux CPU | master | Build Status | staging | Build Status |
| Linux GPU | master | Build Status | staging | Build Status |
| Linux Spark | master | Build Status | staging | Build Status |
| Windows CPU | master | Build Status | staging | Build Status |
| Windows GPU | master | Build Status | staging | Build Status |
| Windows Spark | master | Build Status | staging | Build Status |

<!-- Hiding AzureML build status for the moment

AzureML Build Status

The following tests run on an AzureML compute target. AzureML lets you programmatically start a virtual machine, execute the tests, gather the results in Azure DevOps and shut down the machine.

| Build Type | Branch | Status | Branch | Status |
| --- | --- | --- | --- | --- |
| nightly_cpu_tests | master | Build Status | Staging | Build Status |
| nightly_gpu_tests | master | Build Status | Staging | Build Status |

-->

Microsoft AI GitHub: Find other best practice projects and Azure AI design patterns in our central repository.

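Key Metrics

Overview
Name and owner: recommenders-team/recommenders
Primary programming language: Python
Programming languages: Jupyter Notebook (number of languages: 5)
Platform:
License: MIT License
Owner activity
Created at: 2018-09-19 10:06:07
Pushed at: 2025-04-22 01:11:51
Last commit:
Number of releases: 14
Latest release name: 1.2.1 (released 2024-12-23 14:44:41)
First release name: 0.1.0 (released 2018-11-12 13:46:02)
User participation
Stars: 20.1k
Watchers: 278
Forks: 3.2k
Commits: 9k
Issues enabled?
Issues: 881
Open issues: 160
Pull requests: 1107
Open pull requests: 11
Closed pull requests: 202
Project settings
Wiki enabled?
Archived?
Is fork?
Locked?
Is mirror?
Is private?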