parallel_ml_tutorial

Tutorial on scikit-learn and IPython for parallel machine learning

  • Owner: ogrisel/parallel_ml_tutorial

Parallel Machine Learning with scikit-learn and IPython

Video Tutorial

Video recording of this tutorial given at PyCon in 2013. The tutorial material
has since been partly rearranged and extended. Refer to the notebook titles to
follow along with the presentation.

Browse the static notebooks on nbviewer.ipython.org.

Scope of this tutorial:

  • Learn common machine learning concepts and how they match the scikit-learn
    Estimator API.

  • Learn about scalable feature extraction for text classification and
    clustering.

  • Learn how to run cross-validation and hyperparameter grid search in
    parallel with IPython.

  • Learn to analyze the kinds of common errors predictive models are subject to
    and how to refine your modeling to take this analysis into account.

  • Learn to optimize memory allocation on your computing nodes with NumPy memory
    mapping features.

  • Learn how to run a cheap IPython cluster for interactive predictive modeling on
    Amazon EC2 spot instances using StarCluster.
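
The first and third items above can be illustrated without any third-party dependency. The sketch below is not scikit-learn or IPython code: `KNeighbors1D` is a hypothetical toy model that only mimics the `fit` / `predict` convention of the Estimator API, and the standard library's `concurrent.futures` stands in for an IPython cluster. Threads keep the sketch short, but they would not speed up CPU-bound training in practice. All function names are assumptions introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


class KNeighbors1D:
    """Hypothetical toy classifier following the fit / predict convention."""

    def __init__(self, n_neighbors=1):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # "Training" only memorizes the data, as in any nearest-neighbor model.
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            # Sort training points by distance to x and vote among the closest.
            nearest = sorted(zip(self.X_, self.y_), key=lambda p: abs(p[0] - x))
            votes = [label for _, label in nearest[: self.n_neighbors]]
            predictions.append(max(set(votes), key=votes.count))
        return predictions


def cross_val_accuracy(n_neighbors, X, y, n_folds=3):
    """Mean accuracy over interleaved folds for one hyperparameter value."""
    scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        X_train = [x for i, x in enumerate(X) if i not in test_idx]
        y_train = [t for i, t in enumerate(y) if i not in test_idx]
        X_test = [x for i, x in enumerate(X) if i in test_idx]
        y_test = [t for i, t in enumerate(y) if i in test_idx]
        model = KNeighbors1D(n_neighbors).fit(X_train, y_train)
        hits = sum(p == t for p, t in zip(model.predict(X_test), y_test))
        scores.append(hits / len(y_test))
    return mean(scores)


def parallel_grid_search(grid, X, y, max_workers=3):
    """Evaluate every value in the grid concurrently; return {value: score}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda k: cross_val_accuracy(k, X, y), grid))
    return dict(zip(grid, scores))


if __name__ == "__main__":
    # Synthetic, linearly separable 1D data made up for this sketch.
    X = [float(i) for i in range(12)]
    y = [0 if x < 6 else 1 for x in X]
    results = parallel_grid_search([1, 3, 5], X, y)
    best = max(results, key=results.get)
    print("scores:", results, "best n_neighbors:", best)
```

Each grid value is scored independently of the others, which is what makes this kind of search straightforward to distribute across the engines of a cluster.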

Target audience

This tutorial targets developers with some experience with scikit-learn and
machine learning concepts in general.

It is recommended to first go through one of the tutorials hosted at
scikit-learn.org if you are new to scikit-learn.

You might also want to have a look at the SciPy Lecture Notes first if you are
new to the NumPy / SciPy / matplotlib ecosystem.

Setup

Install the latest stable versions of NumPy, SciPy, matplotlib, IPython, psutil,
and scikit-learn (e.g. IPython 2.2.0 and scikit-learn 0.15.2 at the time of
writing).

You can find up-to-date installation instructions on scikit-learn.org and
ipython.org.

To check your installation, launch the ipython interactive shell in a console
and type the following import statements to check each library:

>>> import numpy
>>> import scipy
>>> import matplotlib
>>> import psutil
>>> import sklearn

If you don't get any message, everything is fine. If you get an error message,
please ask for help on the mailing list of the matching project, and don't
forget to mention the version of the library you are trying to install along
with your platform and its version (e.g. Windows 8.1, Ubuntu 14.04, OS X
10.9).
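
The same check can be scripted. The following is a minimal sketch, assuming Python 3; the `check_installation` helper and the `REQUIRED` list are hypothetical names introduced here and are not part of the tutorial repository. It tries to import each library named in this section and reports the version it finds, which is also the information to include when asking for help:

```python
import importlib

# Import names of the libraries listed in the Setup section.
REQUIRED = ["numpy", "scipy", "matplotlib", "IPython", "psutil", "sklearn"]


def check_installation(names=REQUIRED):
    """Map each import name to its version string, or None if import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back gracefully.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_installation().items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name:12s} {status}")
```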

You can exit the ipython shell by typing exit.

Fetching the data

It is recommended to fetch the datasets before diving into the tutorial
material itself. To do so, run the fetch_data.py script in this folder:

python fetch_data.py

Using the IPython notebook to follow the tutorial

The tutorial material and exercises are hosted in a set of IPython executable
notebook files.

To run them interactively do:

$ cd notebooks
$ ipython notebook

This should automatically open a new browser window listing all the notebooks
in the folder.

You can then execute the cells in order by pressing "Shift-Enter". The output
is displayed directly under each cell and the cursor moves on to the next
cell. Go to the "Help" menu for links to the notebook tutorial.

Credits

Some of this material is adapted from the SciPy 2013 tutorial:

http://github.com/jakevdp/sklearn_scipy2013

Original authors:

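Key metrics

Overview
Name and owner: ogrisel/parallel_ml_tutorial
Primary language: Jupyter Notebook
Languages: Makefile (language count: 3)

Owner activity
Created: 2013-01-10 22:31:26
Pushed: 2016-10-04 04:50:13
Last commit: 2016-02-08 11:03:02
Releases: 0

User engagement
Stars: 1.6k
Watchers: 182
Forks: 600
Commits: 215
Issues: 4
Open issues: 2
Pull requests: 4
Open pull requests: 2
Closed pull requests: 0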