sparser

Sparser: Raw Filtering for Faster Analytics over Raw Data

  • Owner: stanford-futuredata/sparser
  • License: BSD 3-Clause "New" or "Revised" License

sparser

This code base implements Sparser, raw filtering for faster analytics over raw data. Sparser can parse JSON, Avro, and Parquet data up to 22x faster than the state of the art. For more details, check out our paper published at VLDB 2018.
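
To make the idea of raw filtering concrete, here is a minimal sketch in C. It is not Sparser's implementation or API: the function names `raw_filter_may_match` and `count_candidates` are hypothetical, and the sketch assumes newline-delimited records and a single substring predicate. It shows only the general pattern, which is to run a cheap byte-level check on each raw record and hand only the records that pass to a full parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Sparser's API).
 * Returns 1 if the raw bytes of a record contain `needle`, meaning the
 * record might satisfy the query and still needs full parsing.
 * Returns 0 if the record cannot match and may be skipped unparsed. */
int raw_filter_may_match(const char *record, size_t len, const char *needle) {
    assert(record != NULL && needle != NULL);
    size_t nlen = strlen(needle);
    if (nlen == 0) return 1;   /* an empty predicate rejects nothing */
    if (nlen > len) return 0;  /* the record is too short to contain it */
    for (size_t i = 0; i + nlen <= len; i++) {
        if (memcmp(record + i, needle, nlen) == 0) return 1;
    }
    return 0;
}

/* Hypothetical driver: walks newline-delimited records in `data` and
 * counts how many pass the raw filter. In a real pipeline, each passing
 * record would be handed to a full parser (for example rapidjson), which
 * makes the final decision; the filter only discards sure non-matches. */
size_t count_candidates(const char *data, size_t len, const char *needle) {
    size_t candidates = 0;
    size_t start = 0;
    while (start < len) {
        const char *nl = memchr(data + start, '\n', len - start);
        size_t end = nl ? (size_t)(nl - data) : len;
        if (raw_filter_may_match(data + start, end - start, needle)) {
            candidates++;
        }
        start = end + 1;  /* step past the newline */
    }
    return candidates;
}
```

A substring check like this can pass records that the parser later rejects (for instance, when the term appears under a different key), but it never drops a record that contains the term. The savings come from skipping the parser for every record the filter rules out.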

See the demo-repl directory for a brief example. To run it:

# Fetch the rapidjson submodule
git submodule init
git submodule update
# Build the demo
cd demo-repl
make
# Run it on a large JSON file
./bench /path/to/large/file.json

Then enter 1 at the Sparser> prompt.

Sparser itself is just a header file and only depends on standard C libraries available
on most systems.

Main metrics

Overview
Name With Owner: stanford-futuredata/sparser
Primary Language: C
Program Language: C++ (Language Count: 3)
License: BSD 3-Clause "New" or "Revised" License

Owner Activity
Created At: 2018-03-28 22:55:10
Pushed At: 2018-09-18 12:48:08
Release Count: 0

User Engagement
Stargazers Count: 433
Watchers Count: 39
Fork Count: 54
Commits Count: 285
Issues Count: 5
Issue Open Count: 5
Pull Requests Count: 2
Pull Requests Open Count: 0
Pull Requests Close Count: 1