quilt

Quilt versions and deploys data



Below is the documentation for Quilt 3. For Quilt 2, see here and here.

Quilt is a versioned data portal for AWS

  • open.quiltdata.com is a petabyte-scale open data portal that runs on Quilt
  • quiltdata.com includes case studies, use cases, videos, and information on how you can run a private Quilt instance

Who is Quilt for?

Quilt is for data-driven teams of both technical
and non-technical members (executives, data scientists,
data engineers, sales, product, etc.).

What does Quilt do?

Quilt adds search, visual content preview, and
versioning to every file in S3.

How does Quilt work?

Quilt consists of a Python client, a web catalog, and AWS Lambda
functions, all of which are open source, plus a suite of
backend services and Docker containers orchestrated by
CloudFormation. The backend services and containers are
available under a paid license for private use from
quiltdata.com.

Use cases

Quilt addresses five key use cases:

  • Share data at scale. Quilt wraps AWS S3 to add simple URLs, web preview for large files, and sharing via email address (no need to create an IAM role).
  • Understand data better through inline documentation (Jupyter notebooks, Markdown) and visualizations (Vega, Vega-Lite).
  • Discover related data by indexing objects in Elasticsearch.
  • Model data by providing a home for large data and models that don't fit in Git, and by providing immutable versions of objects and data sets (a.k.a. "Quilt Packages").
  • Decide by broadening data access within the organization and by supporting documentation of decision processes through auditable versioning and inline documentation.
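The "immutable versions" idea in the Model use case can be illustrated with content addressing. The sketch below is a simplified, hypothetical model, not Quilt's actual manifest format or hashing scheme: it assumes a manifest that maps logical keys to SHA-256 hashes of each object's bytes, and derives a version identifier by hashing that manifest. The names `content_hash`, `build_manifest`, and `version_id` are illustrative only.

```python
import hashlib
import json


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of one object's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each logical key (e.g. "data/train.csv") to the hash of its contents.

    Hypothetical, simplified format: a real manifest would also need to record
    physical storage locations, sizes, and metadata.
    """
    return {key: content_hash(data) for key, data in files.items()}


def version_id(manifest: dict[str, str]) -> str:
    """Hash a canonical JSON encoding of the manifest.

    Sorting keys makes the result independent of insertion order, so the same
    set of files always yields the same id, while changing any byte of any file
    (or adding or removing a key) yields a different one.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return content_hash(canonical.encode("utf-8"))


if __name__ == "__main__":
    v1 = version_id(build_manifest({"data/train.csv": b"a,b\n1,2\n", "README.md": b"# Demo\n"}))
    v2 = version_id(build_manifest({"data/train.csv": b"a,b\n1,3\n", "README.md": b"# Demo\n"}))
    print(v1 != v2)  # True: editing one file produces a new version id
```

Because the identifier is derived from the contents rather than assigned, packaging the same bytes twice yields the same id, and a published id can be checked against the data it claims to describe.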

Roadmap

I - Performance and core services

  • Address performance issues with push (e.g., re-hashing)
  • Investigate and implement more efficient manifest formats (e.g., Parquet)
    that scale to 10M keys; consider abbreviated "fast manifests" for lazy browsing
  • Refactor s3://bucket/.quilt for improved listing and delete performance
  • Provide Presto-powered services for filtering package repos with SQL

II - CI/CD for data

  • Ability to fork/merge packages
  • Data quality monitoring

III - Storage agnostic (support Azure, GCP buckets)

  • Evaluate min.io and ceph.io as shims
  • Evaluate feasibility of on-prem local storage as a repo

IV - Cloud agnostic

  • Evaluate K8s and Terraform to replace CloudFormation
  • Shim Lambda functions (consider serverless.com)
  • Shim Elasticsearch (consider Solr)
  • Shim IAM via RBAC

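Key metrics

Overview
  • Name and owner: quiltdata/quilt
  • Primary language: TypeScript
  • Languages: Python (10 languages in total)
  • License: Apache License 2.0

Owner activity
  • Created: 2017-02-10 02:46:03
  • Last pushed: 2025-06-13 07:49:59
  • Last commit: 2025-06-13 11:40:20
  • Releases: 113
  • Latest release: 6.3.1
  • First release: v2.0.0-alpha

User engagement
  • Stars: 1.3k
  • Watchers: 17
  • Forks: 91
  • Commits: 5k
  • Issues: 130
  • Open issues: 21
  • Pull requests: 3710
  • Open pull requests: 15
  • Closed pull requests: 560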