Great Expectations

Always know what to expect from your data. Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.

Important announcements regarding our upcoming 1.0 release

We’re planning a ton of work to take GX OSS to the next level as we officially graduate it to 1.0!

Our biggest goal is to improve the user and contributor experiences by streamlining the API, based on the feedback
we’ve gotten from the community (thank you!) over the years.

Learn more about our plans for 1.0 and how we’ll be making this transition in
our blog post.

As we gear up for the launch of our 1.0 release early next year, we want to share an important update regarding our
current development process.

Temporary hold on PRs

We’re temporarily pausing the acceptance of new pull requests (PRs). We’re going to be updating the API and codebase
frequently and significantly over the next few months—we don’t want contributors to spend time and effort only to find
that we’ve just implemented a breaking change for their work.

Looking forward

We deeply value the contributions and engagement of our community. Please hold onto your fantastic ideas and PRs until
after the 1.0 release, when we will be excited to resume accepting them. We appreciate your understanding and support
as we make this final push toward this exciting milestone. Please watch for updates in our
Slack community, and thank you for being a crucial part of our journey!

What is GX?

Great Expectations (GX) helps data teams build a shared understanding of their data through quality testing,
documentation, and profiling.

Data practitioners know that testing and documentation are essential for managing complex data pipelines. GX makes it
possible for data science and engineering teams to quickly deploy extensible, flexible data quality testing into their
data stacks. Its human-readable documentation makes the results accessible to technical and nontechnical users.

See Down with Pipeline Debt!
for an introduction to our philosophy of pipeline data quality testing.

Key features

Seamless operation

GX fits into your existing tech stack, and can integrate with your CI/CD pipelines to add data quality exactly where you
need it. Connect to and validate your data wherever it already is, so you can focus on honing your Expectation Suites to
perfectly meet your data quality needs.

Start fast

Get useful results quickly, even for large data volumes. GX’s Data Assistants provide curated Expectations for different
domains, so you can accelerate data discovery and rapidly deploy data quality checks throughout your pipelines.
Auto-generated Data Docs keep your data quality documentation up to date.
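
As a rough illustration of the profiling idea, the sketch below inspects a small sample and proposes candidate checks for one column. It is plain Python, not GX's Data Assistant API: the function `propose_expectations`, the `row_count_slack` parameter, and the `(name, kwargs)` tuple shape are all hypothetical and exist only for this example.

```python
def propose_expectations(rows, column, row_count_slack=0.5):
    """Propose candidate checks for `column` from a sample of rows.

    Hypothetical helper for illustration; this is NOT the GX Data Assistant API.
    `rows` is a list of dicts, one dict per record.
    """
    values = [row.get(column) for row in rows]
    proposals = []

    # Propose a not-null check only if the sample has no missing values.
    if values and all(value is not None for value in values):
        proposals.append(("expect_column_values_to_not_be_null", {"column": column}))

    # Propose a uniqueness check only if no value repeats in the sample.
    if values and len(set(values)) == len(values):
        proposals.append(("expect_column_values_to_be_unique", {"column": column}))

    # Propose a row-count range around the observed sample size.
    n = len(rows)
    proposals.append((
        "expect_table_row_count_to_be_between",
        {"min_value": int(n * (1 - row_count_slack)),
         "max_value": int(n * (1 + row_count_slack))},
    ))
    return proposals


sample = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "grace@example.com"},
]
id_proposals = propose_expectations(sample, "id")
email_proposals = propose_expectations(sample, "email")
```

Proposals produced this way are only a starting point; a person who knows the data should still review them before they become part of a suite.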

[Figure: Data Assistant plot of Expectations and metrics]

Unified understanding

Expectations are GX’s workhorse abstraction: each Expectation declares an expected state of the data. The Expectation
library provides a flexible, extensible vocabulary for data quality—one that’s human-readable, meaningful for technical
and nontechnical users alike. Bundled into Expectation Suites, Expectations are the ideal tool for characterizing
exactly what you expect from your data.

  • expect_column_values_to_not_be_null
  • expect_column_values_to_match_regex
  • expect_column_values_to_be_unique
  • expect_column_values_to_match_strftime_format
  • expect_table_row_count_to_be_between
  • expect_column_median_to_be_between
  • ...and many more
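
To make the idea of "declaring an expected state" concrete, here is a minimal plain-Python sketch of what three of the Expectations listed above check. The function names are borrowed from the list, but the implementations and the result dictionary (`expectation`, `success`, `unexpected_count`) are assumptions made for this example, not GX's actual code or result format.

```python
import re
from collections import Counter


def _result(name, unexpected):
    # Hypothetical result shape used only in this sketch.
    return {
        "expectation": name,
        "success": len(unexpected) == 0,
        "unexpected_count": len(unexpected),
    }


def expect_column_values_to_not_be_null(rows, column):
    # A row is unexpected when the column is missing or None.
    unexpected = [row for row in rows if row.get(column) is None]
    return _result("expect_column_values_to_not_be_null", unexpected)


def expect_column_values_to_match_regex(rows, column, regex):
    pattern = re.compile(regex)
    # Null values are skipped here; only present values are matched.
    unexpected = [
        row[column]
        for row in rows
        if row.get(column) is not None and not pattern.search(str(row[column]))
    ]
    return _result("expect_column_values_to_match_regex", unexpected)


def expect_column_values_to_be_unique(rows, column):
    values = [row.get(column) for row in rows if row.get(column) is not None]
    counts = Counter(values)
    # Every occurrence of a repeated value counts as unexpected.
    unexpected = [value for value in values if counts[value] > 1]
    return _result("expect_column_values_to_be_unique", unexpected)


rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "not-an-email"},
    {"id": 2, "email": None},
]
null_check = expect_column_values_to_not_be_null(rows, "email")
regex_check = expect_column_values_to_match_regex(rows, "email", r"^[^@\s]+@[^@\s]+$")
unique_check = expect_column_values_to_be_unique(rows, "id")
```

Each check returns a small, readable record instead of raising an exception, so results can be collected and reported together.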

Secure and transparent

GX doesn’t ask you to exchange security for your insight. It processes your data in place, on your systems, so your
security and governance procedures can maintain control at all times. And because GX’s core is and always will be open
source, its complete transparency is the opposite of a black box.

Data contracts support

Checkpoints are a transparent, central, and automatable mechanism for testing Expectations and evaluating your data
quality. Every Checkpoint run produces human-readable Data Docs reporting the results. You can also configure
Checkpoints to take Actions based on the results of the evaluation, like sending alerts and preventing low-quality data
from moving further in your pipelines.
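
The pattern described above, evaluate a suite and then act on the outcome, can be sketched in a few lines of plain Python. This is a conceptual illustration only: `run_checkpoint`, `make_alert_action`, `halt_on_failure`, and `DataQualityError` are hypothetical names invented for this example and are not part of the GX API.

```python
class DataQualityError(Exception):
    """Raised to stop a pipeline when validation fails (hypothetical)."""


def run_checkpoint(rows, suite, actions):
    """Evaluate every check in `suite`, then pass the outcome to each action.

    `suite` is a list of (name, check) pairs, where check(rows) returns a bool.
    """
    results = [
        {"expectation": name, "success": bool(check(rows))}
        for name, check in suite
    ]
    outcome = {"success": all(r["success"] for r in results), "results": results}
    for action in actions:
        action(outcome)
    return outcome


def make_alert_action(sink):
    # Returns an action that records one message per failed check.
    def alert(outcome):
        for result in outcome["results"]:
            if not result["success"]:
                sink.append(f"FAILED: {result['expectation']}")
    return alert


def halt_on_failure(outcome):
    # Prevents low-quality data from moving further down the pipeline.
    if not outcome["success"]:
        raise DataQualityError("validation failed; downstream steps skipped")


suite = [
    ("expect_table_row_count_to_be_between(1, 100)",
     lambda rows: 1 <= len(rows) <= 100),
    ("expect_column_values_to_not_be_null(id)",
     lambda rows: all(row.get("id") is not None for row in rows)),
]
alerts = []
actions = [make_alert_action(alerts), halt_on_failure]

good_outcome = run_checkpoint([{"id": 1}, {"id": 2}], suite, actions)

try:
    run_checkpoint([{"id": 1}, {"id": None}], suite, actions)
    halted = False
except DataQualityError:
    halted = True
```

The alert action is listed before the halting action so that the failure is recorded before the pipeline stops.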

[Figure: data contract support]

Readable for collaboration

Everyone stays on the same page about your data quality with GX’s inspectable, shareable, and human-readable Data Docs.
You can publish Data Docs to the locations where you need them in a variety of formats, making it easy to integrate Data
Docs into your existing data catalogs, dashboards, and other reporting and data governance tools.
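
As a loose sketch of what "human-readable" reporting means in practice, the example below turns a list of validation results into a small HTML page using only the standard library. The function `render_data_docs` and the result dictionaries are hypothetical; GX's real Data Docs are generated by the library itself and are considerably richer.

```python
from html import escape


def render_data_docs(title, results):
    """Render validation results as a minimal HTML report (illustrative only).

    `results` is a list of dicts with "expectation" (str) and "success" (bool).
    """
    table_rows = "\n".join(
        f"<tr><td>{escape(r['expectation'])}</td>"
        f"<td>{'PASS' if r['success'] else 'FAIL'}</td></tr>"
        for r in results
    )
    passed = sum(1 for r in results if r["success"])
    return (
        f"<html><body><h1>{escape(title)}</h1>\n"
        f"<p>{passed} of {len(results)} Expectations passed</p>\n"
        f"<table>\n{table_rows}\n</table></body></html>"
    )


page = render_data_docs(
    "Orders <daily>",
    [
        {"expectation": "expect_column_values_to_not_be_null(order_id)", "success": True},
        {"expectation": "expect_table_row_count_to_be_between(1, 1000)", "success": False},
    ],
)
```

Because the output is ordinary HTML text, it can be written to a file or published wherever the team already reads reports.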

[Figure: Data Docs]

Quick start

To see Great Expectations in action on your own data, install it using pip:

pip install great_expectations

Then create a Data Context in Python:

import great_expectations as gx

context = gx.get_context()

(We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks,
or git, you may want to check out
the Supporting Resources, which will teach you how
to get up and running in minutes.)

For full documentation, visit https://docs.greatexpectations.io/.

If you need help, hop into our Slack channel—there are always contributors
and other users there.

Integrations

Great Expectations works with the tools and systems that you're already using with your data, including:

What is GX not?

Great Expectations is not a pipeline execution framework. Instead, it integrates seamlessly with DAG execution tools
like Spark, Airflow, dbt, Prefect, Dagster, Kedro, Flyte, etc. GX carries out your data quality pipeline testing while
these tools execute the pipelines.

Great Expectations is not a database or storage software. It processes your data in place, on your existing systems.
Expectations and Validation Results that GX produces are metadata about your data.

Great Expectations is not a data versioning tool. If you want to bring your data itself under version control, check
out tools like DVC, Quilt, and lakeFS.

Great Expectations is not a language-agnostic platform. Instead, it follows the philosophy of “take the compute to the
data” by using the popular Python language to support native execution of Expectations in pandas, SQL (via SQLAlchemy),
and Spark environments.

Great Expectations is not exclusive to Python programming environments. It can be invoked from the command line
without a Python environment. However, if you’re working in another ecosystem, you may want to explore
ecosystem-specific alternatives such as assertR (for R environments) or TFDV (for TensorFlow environments).

Who maintains Great Expectations?

Great Expectations OSS is under active development by GX Labs and the Great Expectations community.

What's the best way to get in touch with the Great Expectations team?

If you have questions, comments, or just want to have a good old-fashioned chat about data quality, please hop on our
public Slack channel or post in
our Discourse.

Can I contribute to the library?

Absolutely. Yes, please. See Contributing code, Contributing Expectations, Contributing packages, or Contribute to
Great Expectations documentation, and please don't be shy with questions.

How do I stay up to date with Great Expectations?

You can get updates on everything GX with our email
newsletter. Subscribe here!

Main metrics

Overview
Name With Owner: great-expectations/great_expectations
Primary Language: Python
Programming Languages: Python (Language Count: 6)
Platform: Databricks, Linux, Mac, Windows
License: Apache License 2.0

Owner activity
Created At: 2017-09-11 00:18:46
Pushed At: 2025-06-06 21:57:02
Release Count: 337
Last Release Name: 1.5.0 (Posted on 2025-06-05 08:57:13)
First Release Name: 0.1

User engagement
Stargazers Count: 10.5k
Watchers Count: 83
Fork Count: 1.6k
Commits Count: 13.2k
Issues Count: 2008
Issue Open Count: 50
Pull Requests Count: 7728
Pull Requests Open Count: 19
Pull Requests Close Count: 1293