rep

Machine Learning toolbox for Humans

  • 所有者: yandex/rep
  • 平台:
  • 许可证: Other
  • 分类:
  • 主题:
  • 喜欢:
    0
      比较:

Github星跟踪图

Reproducible Experiment Platform (REP)

Join the chat at https://gitter.im/yandex/rep
Build Status
PyPI version
Documentation
CircleCI

REP is ipython-based environment for conducting data-driven research in a consistent and reproducible way.

Main features:

  • unified python wrapper for different ML libraries (wrappers follow extended scikit-learn interface)
    • Sklearn
    • TMVA
    • XGBoost
    • uBoost
    • Theanets
    • Pybrain
    • Neurolab
    • MatrixNet service(available to CERN)
  • parallel training of classifiers on cluster
  • classification/regression reports with plots
  • interactive plots supported
  • smart grid-search algorithms with parallel execution
  • research versioning using git
  • pluggable quality metrics for classification
  • meta-algorithm design (aka 'rep-lego')

REP is not trying to substitute scikit-learn, but extends it and provides better user experience.

Howto examples

To get started, look at the notebooks in /howto/

Notebooks can be viewed (not executed) online at nbviewer
There are basic introductory notebooks (about python, IPython) and more advanced ones (about the REP itself)

Examples code is written in python 2, but library is python 2 and python 3 compatible.

Installation with Docker

We provide the docker image with REP and all it's dependencies.
It is a recommended way, specially if you're not experienced in python.

Installation with bare hands

However, if you want to install REP and all of its dependencies on your machine yourself, follow this manual:
installing manually and
running manually.

License

Apache 2.0, library is open-source.

Minimal examples

REP wrappers are sklearn compatible:

from rep.estimators import XGBoostClassifier, SklearnClassifier, TheanetsClassifier
clf = XGBoostClassifier(n_estimators=300, eta=0.1).fit(trainX, trainY)
probabilities = clf.predict_proba(testX)

Beloved trick of kagglers is to run bagging over complex algorithms. This is how it is done in REP:

from sklearn.ensemble import BaggingClassifier
clf = BaggingClassifier(base_estimator=XGBoostClassifier(), n_estimators=10)
# wrapping sklearn to REP wrapper
clf = SklearnClassifier(clf)

Another useful trick is to use folding instead of splitting data into train/test.
This is specially useful when you're using some kind of complex stacking

from rep.metaml import FoldingClassifier
clf = FoldingClassifier(TheanetsClassifier(), n_folds=3)
probabilities = clf.fit(X, y).predict_proba(X)

In example above all data are splitted into 3 folds,
and each fold is predicted by classifier which was trained on other 2 folds.

Also REP classifiers provide report:

report = clf.test_on(testX, testY)
report.roc().plot() # plot ROC curve
from rep.report.metrics import RocAuc
# learning curves are useful when training GBDT!
report.learning_curve(RocAuc(), steps=10)  

You can read about other REP tools (like smart distributed grid search, folding and factory)
in documentation and howto examples.

主要指标

概览
名称与所有者yandex/rep
主编程语言Jupyter Notebook
编程语言Python (语言数: 4)
平台
许可证Other
所有者活动
创建于2015-04-13 18:32:24
推送于2024-07-31 13:36:37
最后一次提交2016-11-10 18:13:27
发布数10
最新版本名称0.6.6 (发布于 )
第一版名称0.6.0 (发布于 )
用户参与
星数696
关注者数49
派生数150
提交数1k
已启用问题?
问题数71
打开的问题数23
拉请求数28
打开的拉请求数6
关闭的拉请求数13
项目设置
已启用Wiki?
已存档?
是复刻?
已锁定?
是镜像?
是私有?