FlockDB

A distributed, fault-tolerant graph database.


Status

Twitter is no longer maintaining this project or responding to issues or PRs.

FlockDB

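FlockDB is a distributed graph database for storing adjacency lists, with goals of supporting:

  • a high rate of add/update/remove operations
  • potentially complex set arithmetic queries
  • paging through query result sets containing millions of entries
  • ability to "archive" and later restore archived edges
  • horizontal scaling including replication
  • online data migration

Non-goals include:

  • multi-hop queries (or graph-walking queries)
  • automatic shard migrations

FlockDB is much simpler than other graph databases such as neo4j because it tries to solve fewer problems. It scales horizontally and is designed for online, low-latency, high-throughput environments such as websites.

Twitter uses FlockDB to store social graphs (who follows whom, who blocks whom) and secondary indices. As of April 2010, the Twitter FlockDB cluster stores 13+ billion edges and sustains peak traffic of 20k writes/second and 100k reads/second.

It does what?

If, for example, you're storing a social graph (user A follows user B), and it's not necessarily symmetrical (A can follow B without B following A), then FlockDB can store that relationship as an edge: node A points to node B. It stores this edge with a sort position, and in both directions, so that it can answer the question "Who follows A?" as well as "Whom is A following?"

This is called a directed graph. (Technically, FlockDB stores the adjacency lists of a directed graph.) Each edge has a 64-bit source ID, a 64-bit destination ID, a state (normal, removed, archived), and a 32-bit position used for sorting. The edges are stored in both a forward and backward direction, meaning that an edge can be queried based on either the source or destination ID.

For example, if node 134 points to node 90, and its sort position is 5, then there are two rows written into the backing store: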

forward: 134 -> 90 at position 5
backward: 90 <- 134 at position 5

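If you're storing a social graph, the graph might be called "following", and you might use the current time as the position, so that a listing of followers is in recency order. In that case, if user 134 is Nick, and user 90 is Robey, then FlockDB can store: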

forward: Nick follows Robey at 9:54 today
backward: Robey is followed by Nick at 9:54 today

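The (source, destination) pair must be unique: only one edge can point from node A to node B, but the position and state may be modified at any time. Position is used only for sorting the results of queries, and state is used to mark edges that have been removed or archived (placed into cold sleep).

Building

In theory, building is as simple as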

$ sbt clean update package-dist

but there are some prerequisites. You need:

  • java 1.6
  • sbt 0.7.4
  • thrift 0.5.0

If you haven't used sbt before, this page has a quick setup: http://code.google.com/p/simple-build-tool/wiki/Setup. My ~/bin/sbt looks like this:

#!/bin/bash
java -server -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256m -Xmx1024m -jar `dirname $0`/sbt-launch-0.7.4.jar "$@"

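Apache Thrift 0.5.0 is a prerequisite for building the Java stubs of the thrift IDL. It can't be installed via jar, so you'll need to install it separately before you build. It can be found on the Apache Thrift site: http://thrift.apache.org/. You can find the download for 0.5.0 here: http://archive.apache.org/dist/incubator/thrift/0.5.0-incubating/.

In addition, the tests require a local mysql instance to be running, and the DB_USERNAME and DB_PASSWORD env vars to contain login info for it. You can skip the tests if you want (but you should feel a pang of guilt):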

$ NO_TESTS=1 sbt package-dist

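Running

Check out the demo for instructions on how to start up a local development instance of FlockDB. It also shows how to add edges, query them, etc.

Community

Contributors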

  • Nick Kallen @nk
  • Robey Pointer @robey
  • John Kalucki @jkalucki
  • Ed Ceaser @asdf



STATUS

Twitter is no longer maintaining this project or responding to issues or PRs.

FlockDB

FlockDB is a distributed graph database for storing adjacency lists, with
goals of supporting:

  • a high rate of add/update/remove operations
  • potentially complex set arithmetic queries
  • paging through query result sets containing millions of entries
  • ability to "archive" and later restore archived edges
  • horizontal scaling including replication
  • online data migration

Non-goals include:

  • multi-hop queries (or graph-walking queries)
  • automatic shard migrations
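
To make the set-arithmetic and paging goals above more concrete, here is a minimal in-memory sketch in Python. It is not FlockDB's API: the `following` and `followers` tables, the `intersect` and `page` helpers, and the position-based cursor are all assumptions made for illustration. Note that the query stays single-hop: it combines two adjacency lists rather than walking the graph.

```python
# Hypothetical adjacency lists stored as (position, node_id) rows.
following = {  # source -> rows answering "whom is source following?"
    1: [(10, 2), (20, 3), (30, 4), (40, 5)],
}
followers = {  # destination -> rows answering "who follows destination?"
    9: [(11, 3), (12, 4), (13, 5), (14, 6)],
}


def intersect(rows_a, rows_b):
    """Rows of rows_a whose node also appears in rows_b."""
    ids_b = {node for _, node in rows_b}
    return [row for row in rows_a if row[1] in ids_b]


def page(rows, count, cursor=None):
    """Return up to `count` rows with position > cursor, plus the next cursor.

    Assumes positions are unique within one list, so a position can act
    as the cursor. The next cursor is None when no rows remain.
    """
    ordered = sorted(rows)
    if cursor is not None:
        ordered = [row for row in ordered if row[0] > cursor]
    chunk = ordered[:count]
    next_cursor = chunk[-1][0] if len(ordered) > count else None
    return chunk, next_cursor


# "Whom does user 1 follow who also follows user 9?"
both = intersect(following[1], followers[9])
first, cursor = page(both, count=2)
second, last_cursor = page(both, count=2, cursor=cursor)
print([n for _, n in first], [n for _, n in second])  # [3, 4] [5]
```

Paging by cursor rather than by offset keeps each page request independent of how many rows came before it, which matters when result sets hold millions of entries.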

FlockDB is much simpler than other graph databases such as neo4j because it
tries to solve fewer problems. It scales horizontally and is designed for
on-line, low-latency, high throughput environments such as web-sites.

Twitter uses FlockDB to store social graphs (who follows whom, who blocks
whom) and secondary indices. As of April 2010, the Twitter FlockDB cluster
stores 13+ billion edges and sustains peak traffic of 20k writes/second and
100k reads/second.

It does what?

If, for example, you're storing a social graph (user A follows user B), and
it's not necessarily symmetrical (A can follow B without B following A), then
FlockDB can store that relationship as an edge: node A points to node B. It
stores this edge with a sort position, and in both directions, so that it can
answer the question "Who follows A?" as well as "Whom is A following?"

This is called a directed graph. (Technically, FlockDB stores the adjacency
lists of a directed graph.) Each edge has a 64-bit source ID, a 64-bit
destination ID, a state (normal, removed, archived), and a 32-bit position
used for sorting. The edges are stored in both a forward and backward
direction, meaning that an edge can be queried based on either the source or
destination ID.

For example, if node 134 points to node 90, and its sort position is 5, then
there are two rows written into the backing store:

forward: 134 -> 90 at position 5
backward: 90 <- 134 at position 5
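
The two rows above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `Edge`, `TinyGraph`, and their methods are hypothetical names, Python dictionaries stand in for the backing store, and the 64-bit and 32-bit widths are noted in comments rather than enforced.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical model; names are illustrative, not FlockDB's API.
@dataclass(frozen=True)
class Edge:
    source_id: int         # 64-bit in FlockDB
    destination_id: int    # 64-bit in FlockDB
    position: int          # 32-bit sort key in FlockDB
    state: str = "normal"  # normal, removed, or archived


class TinyGraph:
    def __init__(self):
        self.forward = defaultdict(dict)   # source -> {destination: edge}
        self.backward = defaultdict(dict)  # destination -> {source: edge}

    def add(self, edge):
        # One logical edge becomes two rows, one per direction.
        self.forward[edge.source_id][edge.destination_id] = edge
        self.backward[edge.destination_id][edge.source_id] = edge

    def destinations_of(self, source_id):
        """Nodes that `source_id` points to (forward lookup)."""
        return sorted(self.forward[source_id])

    def sources_of(self, destination_id):
        """Nodes that point to `destination_id` (backward lookup)."""
        return sorted(self.backward[destination_id])


graph = TinyGraph()
graph.add(Edge(134, 90, position=5))
print(graph.destinations_of(134))  # [90]
print(graph.sources_of(90))        # [134]
```

Writing each edge twice costs extra storage, but it lets both lookups go straight to a list keyed by the node being asked about.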

If you're storing a social graph, the graph might be called "following", and
you might use the current time as the position, so that a listing of followers
is in recency order. In that case, if user 134 is Nick, and user 90 is Robey,
then FlockDB can store:

forward: Nick follows Robey at 9:54 today
backward: Robey is followed by Nick at 9:54 today
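
As a rough illustration of using time as the position, the sketch below sorts a follower list by position. Two assumptions are made here that the text does not state: positions are Unix epoch seconds, and "recency order" means most recent first. The function name and row layout are hypothetical.

```python
import time


def followers_by_recency(rows):
    """Return follower IDs from (source_id, position) rows, newest first."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [source_id for source_id, _ in ordered]


# Hypothetical backward rows for user 90 (Robey).
now = int(time.time())
rows = [(134, now - 3600), (201, now), (77, now - 60)]
print(followers_by_recency(rows))  # [201, 77, 134]
```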

The (source, destination) must be unique: only one edge can point from node A
to node B, but the position and state may be modified at any time. Position is
used only for sorting the results of queries, and state is used to mark edges
that have been removed or archived (placed into cold sleep).
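
The uniqueness rule and the role of state can be sketched with a dictionary keyed by the (source, destination) pair. Everything named here (`edges`, `put`, `set_state`, `visible`) is hypothetical, and the choice to list only edges in the normal state is an assumption of this sketch rather than documented FlockDB behavior.

```python
# Hypothetical store keyed by (source, destination); not FlockDB's real API.
edges = {}

VALID_STATES = ("normal", "removed", "archived")


def put(source_id, destination_id, position, state="normal"):
    # Writing the same pair again replaces position and state
    # instead of creating a second edge.
    edges[(source_id, destination_id)] = {"position": position, "state": state}


def set_state(source_id, destination_id, state):
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state}")
    edges[(source_id, destination_id)]["state"] = state


def visible(source_id):
    """Destinations of normal edges from source_id, ordered by position."""
    rows = [
        (edge["position"], dst)
        for (src, dst), edge in edges.items()
        if src == source_id and edge["state"] == "normal"
    ]
    return [dst for _, dst in sorted(rows)]


put(134, 90, position=5)
put(134, 90, position=9)         # same pair: updates, does not duplicate
put(134, 91, position=7)
set_state(134, 91, "archived")   # hidden from this listing, but still stored
print(len(edges), visible(134))  # 2 [90]
set_state(134, 91, "normal")     # restored later
print(visible(134))              # [91, 90]
```

Because archiving only flips the state, the edge keeps its position and can be restored without being rewritten.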

Building

In theory, building is as simple as

$ sbt clean update package-dist

but there are some pre-requisites. You need:

  • java 1.6
  • sbt 0.7.4
  • thrift 0.5.0

If you haven't used sbt before, this page has a quick setup:
http://code.google.com/p/simple-build-tool/wiki/Setup.
My ~/bin/sbt looks like this:

#!/bin/bash
java -server -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256m -Xmx1024m -jar `dirname $0`/sbt-launch-0.7.4.jar "$@"

Apache Thrift 0.5.0 is a prerequisite for building the Java stubs of the
thrift IDL. It can't be installed via jar, so you'll need to install it
separately before you build. It can be found on the Apache Thrift site:
http://thrift.apache.org/.
You can find the download for 0.5.0 here:
http://archive.apache.org/dist/incubator/thrift/0.5.0-incubating/.

In addition, the tests require a local mysql instance to be running, and for
DB_USERNAME and DB_PASSWORD env vars to contain login info for it. You can
skip the tests if you want (but you should feel a pang of guilt):

$ NO_TESTS=1 sbt package-dist
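
Before building, it can help to check that the prerequisites are in place. The script below is a hypothetical helper, not part of FlockDB. It assumes the tools are on the PATH under the names `java`, `sbt`, and `thrift`, and it checks only that they are present, not that their versions match the ones listed above.

```python
import os
import shutil

REQUIRED_TOOLS = ("java", "sbt", "thrift")
REQUIRED_ENV = ("DB_USERNAME", "DB_PASSWORD")


def missing_prerequisites(env=None, which=shutil.which, skip_tests=False):
    """List tools and env vars that look missing (presence only, no versions)."""
    env = os.environ if env is None else env
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if not skip_tests:
        # The MySQL credentials are only needed when the tests run.
        missing += [name for name in REQUIRED_ENV if not env.get(name)]
    return missing


if __name__ == "__main__":
    skip = bool(os.environ.get("NO_TESTS"))
    problems = missing_prerequisites(skip_tests=skip)
    print("missing:", ", ".join(problems) if problems else "nothing")
```

Passing `env` and `which` as parameters keeps the check easy to exercise without touching the real environment.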

Running

Check out the demo for instructions on how to start up a local development
instance of FlockDB. It also shows how to add edges, query them, etc.

Community

Contributors

  • Nick Kallen @nk
  • Robey Pointer @robey
  • John Kalucki @jkalucki
  • Ed Ceaser @asdf