disco

a Map/Reduce framework for distributed computing

  • Owner: discoproject/disco
  • License: BSD 3-Clause "New" or "Revised" License

Disco - Massive data, Minimal code

Disco is a distributed map-reduce and big-data framework. Like
the original framework, which was publicized by Google, Disco supports
parallel computations over large data sets on an unreliable cluster of
computers. This makes it well suited to analyzing and processing large
datasets without having to deal with the difficult technical questions
of distributed computing, such as communication protocols, load
balancing, locking, job scheduling, and fault tolerance, all of which are
taken care of by Disco.

Writing a Disco job is simple. For example, the following job counts how many times each word occurs in a document:

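from __future__ import print_function  # keeps print() working on Python 2 as well

from disco.core import Job, result_iterator

def map(line, params):
    # Emit a (word, 1) pair for each word on the input line.
    for word in line.split():
        yield word, 1

def reduce(iter, params):
    from disco.util import kvgroup
    # Sort the pairs so equal words are adjacent, then sum each group.
    for word, counts in kvgroup(sorted(iter)):
        yield word, sum(counts)

if __name__ == '__main__':
    input = ["http://discoproject.org/media/text/chekhov.txt"]
    job = Job().run(input=input, map=map, reduce=reduce)
    # Wait for the job to finish, then print every word with its count.
    for word, count in result_iterator(job.wait()):
        print(word, count)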
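
To see what the job above computes without a running cluster, the same map, sort, group, and reduce steps can be imitated in plain Python. This is a minimal local sketch, not Disco's implementation: `map_words`, `reduce_counts`, `kvgroup_sketch`, and `run_local` are hypothetical names introduced here, and `kvgroup_sketch` is only assumed to mirror how `disco.util.kvgroup` groups consecutive pairs that share a key.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line, params=None):
    # Emit a (word, 1) pair for every whitespace-separated word.
    for word in line.split():
        yield word, 1


def kvgroup_sketch(pairs):
    # Hypothetical stand-in for disco.util.kvgroup: group consecutive
    # pairs that share a key and yield (key, iterator of values).
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, (value for _, value in group)


def reduce_counts(pairs, params=None):
    # Sorting brings equal words together so they can be grouped and summed.
    for word, counts in kvgroup_sketch(sorted(pairs)):
        yield word, sum(counts)


def run_local(lines):
    # Hypothetical driver: run the map step over every line, then reduce once.
    mapped = [pair for line in lines for pair in map_words(line)]
    return list(reduce_counts(mapped))


if __name__ == "__main__":
    for word, count in run_local(["to be or not", "to be"]):
        print(word, count)
```

The sketch only illustrates the data flow on a single machine; distributing the map and reduce work across a cluster, and recovering from failures, is the part that Disco itself provides.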

Note: To install Disco, you cannot use the zip or tar.gz packages generated by GitHub; clone this repository instead.

The develop branch contains the newest features and is not recommended for use
in production. The master branch is the latest stable release and is tested in
production. Important bug fixes will be first merged into the develop branch
and then backported into the master branch.

Disco integrates with many other tools. The following screenshot,
for example, shows an IPython notebook being used to write a Disco job and
matplotlib being used to plot the results:
[Screenshot: ipython example]

To learn more about the Disco ecosystem, see Disco Integrations. For other resources, check out the Talks on Disco. Visit http://discoproject.org for more information.

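Overview

  • Name and owner: discoproject/disco
  • Primary language: Erlang
  • Languages: Makefile (number of languages: 8)
  • License: BSD 3-Clause "New" or "Revised" License
  • Releases: 24
  • Latest release: 0.5.4
  • First release: 0.1 (released 2008-09-02 01:48:47)
  • Created: 2008-07-24 08:46:39
  • Last push: 2018-01-30 20:55:22
  • Last commit: 2014-10-27 22:01:49
  • Stars: 1.6k
  • Watchers: 85
  • Forks: 244
  • Commits: 3.7k
  • Issues: 418
  • Open issues: 129
  • Pull requests: 187
  • Open pull requests: 11
  • Closed pull requests: 42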