Great Expectations

Always know what to expect from your data. Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.

Important announcements regarding our upcoming 1.0 release

We’re planning a ton of work to take GX OSS to the next level as we officially graduate it to 1.0!

Our biggest goal is to improve the user and contributor experiences by streamlining the API, based on the feedback
we’ve gotten from the community (thank you!) over the years.

Learn more about our plans for 1.0 and how we’ll be making this transition in
our blog post.

As we gear up for the launch of our 1.0 release early next year, we want to share an important update regarding our
current development process.

Temporary hold on PRs

We’re temporarily pausing the acceptance of new pull requests (PRs). We’re going to be updating the API and codebase
frequently and significantly over the next few months—we don’t want contributors to spend time and effort only to find
that we’ve just implemented a breaking change for their work.

Looking forward

We deeply value the contributions and engagement of our community. Please hold onto your fantastic ideas and PRs until
after the 1.0 release, when we will be excited to resume accepting them. We appreciate your understanding and support
as we make this final push toward this exciting milestone. Please watch for updates in our
Slack community, and thank you for being a crucial part of our journey!

What is GX?

Great Expectations (GX) helps data teams build a shared understanding of their data through quality testing,
documentation, and profiling.

Data practitioners know that testing and documentation are essential for managing complex data pipelines. GX makes it
possible for data science and engineering teams to quickly deploy extensible, flexible data quality testing into their
data stacks. Its human-readable documentation makes the results accessible to technical and nontechnical users.

See Down with Pipeline Debt! for an introduction to our philosophy of pipeline data quality testing.

Key features

Seamless operation

GX fits into your existing tech stack, and can integrate with your CI/CD pipelines to add data quality exactly where you
need it. Connect to and validate your data wherever it already is, so you can focus on honing your Expectation Suites to
perfectly meet your data quality needs.

Start fast

Get useful results quickly even for large data volumes. GX’s Data Assistants provide curated Expectations for different
domains, so you can accelerate your data discovery to rapidly deploy data quality throughout your pipelines.
Auto-generated Data Docs ensure your DQ documentation will always be up-to-date.

Unified understanding

Expectations are GX’s workhorse abstraction: each Expectation declares an expected state of the data. The Expectation
library provides a flexible, extensible vocabulary for data quality—one that’s human-readable, meaningful for technical
and nontechnical users alike. Bundled into Expectation Suites, Expectations are the ideal tool for characterizing
exactly what you expect from your data.

  • expect_column_values_to_not_be_null
  • expect_column_values_to_match_regex
  • expect_column_values_to_be_unique
  • expect_column_values_to_match_strftime_format
  • expect_table_row_count_to_be_between
  • expect_column_median_to_be_between
  • ...and many more
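
To make the idea of an Expectation as a declared state of the data concrete, here is a minimal plain-Python sketch of what a few of the Expectations listed above assert. It is not GX's implementation or API: the helper names (`check_not_null`, `check_matches_regex`, `check_unique`, `check_row_count_between`), the shape of the result dictionaries, and the sample rows are all assumptions made for illustration.

```python
import re

# Hypothetical sample data: a list of row dicts standing in for a table.
ROWS = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": "bob@example.com"},
    {"id": 2, "email": None},
]


def _result(unexpected):
    """Bundle a pass/fail flag with the offending values (illustrative shape)."""
    return {"success": not unexpected, "unexpected": unexpected}


def check_not_null(rows, column):
    # Mirrors the intent of expect_column_values_to_not_be_null.
    return _result([r for r in rows if r[column] is None])


def check_matches_regex(rows, column, pattern):
    # Mirrors the intent of expect_column_values_to_match_regex.
    # Nulls are skipped here; that is a choice made by this sketch.
    values = [r[column] for r in rows if r[column] is not None]
    return _result([v for v in values if not re.search(pattern, v)])


def check_unique(rows, column):
    # Mirrors the intent of expect_column_values_to_be_unique:
    # every repeat occurrence of a value is reported as unexpected.
    seen, duplicates = set(), []
    for r in rows:
        value = r[column]
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return _result(duplicates)


def check_row_count_between(rows, min_value, max_value):
    # Mirrors the intent of expect_table_row_count_to_be_between.
    count = len(rows)
    return {"success": min_value <= count <= max_value, "observed": count}


if __name__ == "__main__":
    print(check_not_null(ROWS, "email"))
    print(check_matches_regex(ROWS, "email", r"@example\.com$"))
    print(check_unique(ROWS, "id"))
    print(check_row_count_between(ROWS, 1, 5))
```

Each helper returns a result rather than raising, which is what lets a set of checks be bundled and reported on together, much as the text describes for Expectation Suites.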

Secure and transparent

GX doesn’t ask you to exchange security for your insight. It processes your data in place, on your systems, so your
security and governance procedures can maintain control at all times. And because GX’s core is and always will be open
source, its complete transparency is the opposite of a black box.

Data contracts support

Checkpoints are a transparent, central, and automatable mechanism for testing Expectations and evaluating your data
quality. Every Checkpoint run produces human-readable Data Docs reporting the results. You can also configure
Checkpoints to take Actions based on the results of the evaluation, like sending alerts and preventing low-quality data
from moving further in your pipelines.
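
The flow described above (evaluate Expectations, then trigger Actions based on the outcome) can be sketched in plain Python. This is a conceptual illustration only, not the GX Checkpoint API: `run_checkpoint`, the check builders, the `alert` action, and the sample batches are hypothetical names chosen for this example.

```python
def not_null(column):
    """Build a check that passes when no row has None in `column`."""
    def check(rows):
        return all(r[column] is not None for r in rows)
    check.__name__ = f"not_null({column})"
    return check


def row_count_between(low, high):
    """Build a check that passes when the row count lies in [low, high]."""
    def check(rows):
        return low <= len(rows) <= high
    check.__name__ = f"row_count_between({low}, {high})"
    return check


def run_checkpoint(rows, suite, actions):
    """Evaluate every check in `suite`, then hand the results to each action."""
    results = {check.__name__: check(rows) for check in suite}
    passed = all(results.values())
    for action in actions:
        action(passed, results)
    return passed


alerts = []  # stand-in for a Slack or email notification channel


def alert(passed, results):
    # Record the names of failed checks; a real action might send a message.
    if not passed:
        alerts.append(sorted(name for name, ok in results.items() if not ok))


suite = [not_null("order_id"), row_count_between(1, 1000)]
good_batch = [{"order_id": 1}, {"order_id": 2}]
bad_batch = [{"order_id": 1}, {"order_id": None}]

# A pipeline step can refuse to publish any batch that fails validation.
publishable = [b for b in (good_batch, bad_batch) if run_checkpoint(b, suite, [alert])]
```

Here the failing batch is both reported through the alert action and kept out of `publishable`, which corresponds to the two kinds of Action mentioned in the text: sending alerts and stopping low-quality data from moving further.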

Readable for collaboration

Everyone stays on the same page about your data quality with GX’s inspectable, shareable, and human-readable Data Docs.
You can publish Data Docs to the locations where you need them in a variety of formats, making it easy to integrate Data
Docs into your existing data catalogs, dashboards, and other reporting and data governance tools.

Quick start

To see Great Expectations in action on your own data:

You can install it using pip

pip install great_expectations

and then run

import great_expectations as gx

context = gx.get_context()

(We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks,
or git, you may want to check out
the Supporting Resources, which will teach you how
to get up and running in minutes.)

For full documentation, visit https://docs.greatexpectations.io/.

If you need help, hop into our Slack channel—there are always contributors
and other users there.

Integrations

Great Expectations works with the tools and systems that you're already using with your data.

What is GX not?

Great Expectations is not a pipeline execution framework. Instead, it integrates seamlessly with DAG execution tools
like Spark, Airflow, dbt, Prefect, Dagster, Kedro, Flyte, etc. GX carries out your data quality pipeline testing while
these tools execute the pipelines.

Great Expectations is not a database or storage software. It processes your data in place, on your existing systems.
Expectations and Validation Results that GX produces are metadata about your data.

Great Expectations is not a data versioning tool. If you want to bring your data itself under version control, check
out tools like DVC, Quilt,
and lakeFS.

Great Expectations is not a language-agnostic platform. Instead, it follows the philosophy of “take the compute to the
data” by using the popular Python language to support native execution of Expectations in pandas, SQL (via SQLAlchemy),
and Spark environments.

Great Expectations is not exclusive to Python programming environments. It can be invoked from the command line
without a Python environment. However, if you’re working in another ecosystem, you may want to explore
ecosystem-specific alternatives such as assertR (for R environments) or TFDV (for TensorFlow environments).

Who maintains Great Expectations?

Great Expectations OSS is under active development by GX Labs and the Great Expectations community.

What's the best way to get in touch with the Great Expectations team?

If you have questions, comments, or just want to have a good old-fashioned chat about data quality, please hop on our
public Slack channel or post in
our Discourse.

Can I contribute to the library?

Absolutely. Yes, please.
See Contributing code, Contributing Expectations, Contributing packages, or Contribute to Great Expectations
documentation, and please don't be shy with questions.

How do I stay up to date with Great Expectations?

You can get updates on everything GX with our email
newsletter. Subscribe here!

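Key metrics

Overview
Name and owner: great-expectations/great_expectations
Primary language: Python
Languages: Python (number of languages: 6)
Platforms: Databricks, Linux, Mac, Windows
License: Apache License 2.0
Owner activity
Created: 2017-09-11 00:18:46
Last pushed: 2025-06-06 21:57:02
Last commit:
Releases: 337
Latest release: 1.5.0 (released 2025-06-05 08:57:13)
First release: 0.1
User engagement
Stars: 10.5k
Watchers: 83
Forks: 1.6k
Commits: 13.2k
Issues enabled?
Issues: 2008
Open issues: 50
Pull requests: 7728
Open pull requests: 19
Closed pull requests: 1293
Project settings
Wiki enabled?
Archived?
Is a fork?
Locked?
Is a mirror?
Is private?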