data_hacking

Click Security Data Hacking Project

  • Owner: SuperCowPowers/data_hacking
  • License: MIT License


Welcome to the Data Hacking Project

"Hacking in the sense of deconstructing an idea, hardware, anything and getting it to do something it wasn’t intended or to better understand how something works." (BSides CFP)

So hacking here means we want to quickly deconstruct data, understand what we've got and how to best utilize it for the problem at hand.

The primary motivation for these exercises is to explore the nexus of IPython, Pandas and Scikit Learn on security data of various kinds. The exercises will often intentionally show common missteps, warts in the data, paths that didn't work out that well and results that could definitely be improved upon. In general we're trying to capture what worked and what didn't; not only is that more realistic, it is often much more informative to the reader. :)

Python Modules Used:

  • IPython: Architecture for interactive computing and presentation
  • Pandas: Python Data Analysis Library
  • Scikit Learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
  • Matplotlib: Python 2D plotting library

Exercises:

Setup:

  • Required packages:

    • Brew/apt-get
      • graphviz, freetype, zmq
    • Python
      • ipython, pygraphviz, pandas, matplotlib, networkx, pyzmq, jinja2, scipy, patsy, statsmodels, pefile, macholib
  • Some of the exercises use packages from the data_hacking repository. To install those packages into your Python site-packages:

  • To uninstall:
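
Before running the exercises, it can help to confirm that the Python packages listed under "Required packages" are importable. The following is a minimal sketch, not part of the data_hacking repository: it assumes Python 3.4 or later, and the import names it maps to (for example IPython for ipython and zmq for pyzmq) are the commonly used ones, so adjust them if your environment differs. The helper name missing_packages is hypothetical.

```python
from importlib.util import find_spec

# Map of pip package name -> importable top-level module name.
# Package names come from the list above; the import names are the
# commonly used ones and are an assumption about your environment.
REQUIRED = {
    "ipython": "IPython",
    "pygraphviz": "pygraphviz",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "networkx": "networkx",
    "pyzmq": "zmq",
    "jinja2": "jinja2",
    "scipy": "scipy",
    "patsy": "patsy",
    "statsmodels": "statsmodels",
    "pefile": "pefile",
    "macholib": "macholib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required Python packages are importable.")
```

The check only looks up each module without importing it, so it stays fast and does not trigger side effects from heavy packages such as matplotlib.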

Install IPython:

There are plenty of Google results for this, and we actually have mixed feelings about the IPython install instructions on the IPython page. The directions work, but they direct you to download and install Anaconda or the free edition of Enthought Canopy. Both of these are prepackaged Python distributions with a bunch of stuff like NumPy, SciPy, IPython, Matplotlib, Pandas and so on. Occasionally these will have a hitch, and then you might be a bit SOL because StackOverflow is going to say 'WTF are those things? Just do "$ pip install blah" or "$ brew install blah"'.

So we recommend you be brave and do it the normal way. In particular, this guy seems to have a pretty good write-up for Mac installs:

Running the Notebooks:

Most of the notebooks use relative paths to resources such as data files or images. In general, the easiest way we found to run IPython on the notebooks is to change into the project directory and run ipython with this alias (put it in your .bashrc or whatever):
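
For code that runs outside the notebook server (a script or a test, say), the same relative-path concern can be handled from Python. The following is a minimal sketch rather than anything shipped with the project: the working_directory helper is a hypothetical name, and it simply switches into a directory for the duration of a block so that relative paths resolve the way the notebooks expect.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the current working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the block raised.
        os.chdir(previous)


# Hypothetical usage; the directory and file names are placeholders:
#
# with working_directory("some_project_directory"):
#     with open("data/example.csv") as handle:
#         first_line = handle.readline()
```

Restoring the previous directory in the finally clause keeps later code from silently resolving paths against the wrong location.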

Key metrics

Overview
Name and owner: SuperCowPowers/data_hacking
Primary language: Jupyter Notebook
Languages: Python (5 languages in total)
License: MIT License
Owner activity
Created: 2013-10-24 15:43:11
Pushed: 2019-03-05 21:56:42
Last commit: 2019-03-05 14:56:37
Releases: 1
Latest release: pre-stats-work-merge (published 2014-01-13 19:14:49)
First release: pre-stats-work-merge (published 2014-01-13 19:14:49)
User engagement
Stars: 776
Watchers: 87
Forks: 300
Commits: 155
Issues: 10
Open issues: 7
Pull requests: 1
Open pull requests: 0
Closed pull requests: 2