a Map/Reduce framework for distributed computing

  • Owner: Disco Project
  • Platform: TBD
  • License: BSD 3-Clause "New" or "Revised" License

Disco - Massive data, Minimal code

Disco is a distributed map-reduce and big-data framework. Like the original framework publicized by Google, Disco supports parallel computations over large data sets on an unreliable cluster of computers. This makes it well suited to analyzing and processing large datasets without having to deal with the difficult technical questions of distributed computing, such as communication protocols, load balancing, locking, job scheduling, and fault tolerance, all of which are taken care of by Disco.

Writing a Disco job is simple. For example, the following job counts how many times each word occurs in a document:

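from __future__ import print_function  # print() behaves the same on Python 2 and 3

from disco.core import Job, result_iterator

# Map phase: emit a (word, 1) pair for every word in an input line.
def map(line, params):
    for word in line.split():
        yield word, 1

# Reduce phase: sort the pairs so equal words are adjacent, then sum each group.
def reduce(iter, params):
    from disco.util import kvgroup
    for word, counts in kvgroup(sorted(iter)):
        yield word, sum(counts)

if __name__ == '__main__':
    input = ["http://discoproject.org/media/text/chekhov.txt"]
    # Submit the job, block until it finishes, then print each (word, count) result.
    job = Job().run(input=input, map=map, reduce=reduce)
    for word, count in result_iterator(job.wait()):
        print(word, count)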
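
To see what the map and reduce functions compute without a running cluster, the same data flow can be mimicked in a single process. This is only a minimal sketch and not how Disco executes jobs: the names map_words, group_by_key, reduce_counts and word_count are hypothetical, and itertools.groupby is used as a local stand-in for disco.util.kvgroup, on the assumption that kvgroup groups the values of consecutive pairs that share a key.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.split():
        yield word, 1


def group_by_key(pairs):
    """Group already-sorted (key, value) pairs by key (stand-in for kvgroup)."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, (value for _, value in group)


def reduce_counts(pairs):
    """Sort the pairs so equal words are adjacent, then sum each group."""
    for word, counts in group_by_key(sorted(pairs)):
        yield word, sum(counts)


def word_count(lines):
    """Run the map and reduce steps locally and return a dict of word counts."""
    mapped = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(mapped))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the end"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

The sort before grouping matters: groupby only merges adjacent equal keys, which is why the reduce function in the job above also sorts its input before grouping.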

Note: To install Disco, clone this repository. The zip and tar.gz packages generated by GitHub cannot be used.

The develop branch contains the newest features and is not recommended for use in production. The master branch is the latest stable release and is tested in production. Important bug fixes will be first merged into the develop branch and then backported into the master branch.

Disco integrates with many other tools. For example, a Disco job can be written in an IPython notebook and its results plotted with matplotlib. (Screenshot: ipython example)

To learn more about the Disco ecosystem, see Disco Integrations. For other resources, check out the Talks on Disco. Visit http://discoproject.org for more information.

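Project overview

Primary language: Erlang
Programming language: Makefile
License: BSD 3-Clause "New" or "Revised" License
Latest release name: 0.5.4
First release name: 0.1
Last released: 2014-10-27 22:01:49
First released: 2008-09-02 01:48:47
Last commit: 2014-10-27 22:01:49
Created: 2008-07-24T08:46:39
Last push: 2018-01-30T20:55:22
Commits: 3.7k
Watchers: 86
Name with owner: discoproject/disco
Forks: 258
Stars: 1.6k
Issues: 418
Open issues: 129
Releases: 24
Number of languages: 8
Pull requests: 187
Open pull requests: 11
Closed pull requests: 42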