Great Expectations
Always know what to expect from your data.
Important announcements regarding our upcoming 1.0 release
We’re planning a ton of work to take GX OSS to the next level as we officially graduate it to 1.0!
Our biggest goal is to improve the user and contributor experiences by streamlining the API, based on the feedback
we’ve gotten from the community (thank you!) over the years.
Learn more about our plans for 1.0 and how we’ll be making this transition in
our blog post.
As we gear up for the launch of our 1.0 release early next year, we want to share an important update regarding our
current development process.
Temporary hold on PRs
We’re temporarily pausing the acceptance of new pull requests (PRs). We’re going to be updating the API and codebase
frequently and significantly over the next few months—we don’t want contributors to spend time and effort only to find
that we’ve just implemented a breaking change for their work.
Looking forward
We deeply value the contributions and engagement of our community. Please hold onto your fantastic ideas and PRs until
after the 1.0 release, when we will be excited to resume accepting them. We appreciate your understanding and support
as we make this final push toward this exciting milestone. Please watch for updates in our
Slack community, and thank you for being a crucial part of our journey!
What is GX?
Great Expectations (GX) helps data teams build a shared understanding of their data through quality testing,
documentation, and profiling.
Data practitioners know that testing and documentation are essential for managing complex data pipelines. GX makes it
possible for data science and engineering teams to quickly deploy extensible, flexible data quality testing into their
data stacks. Its human-readable documentation makes the results accessible to technical and nontechnical users.
See Down with Pipeline Debt!
for an introduction to our philosophy of pipeline data quality testing.
Key features
Seamless operation
GX fits into your existing tech stack, and can integrate with your CI/CD pipelines to add data quality exactly where you
need it. Connect to and validate your data wherever it already is, so you can focus on honing your Expectation Suites to
perfectly meet your data quality needs.
Start fast
Get useful results quickly even for large data volumes. GX’s Data Assistants provide curated Expectations for different
domains, so you can accelerate your data discovery to rapidly deploy data quality throughout your pipelines.
Auto-generated Data Docs ensure your data quality (DQ) documentation will always be up to date.
Unified understanding
Expectations are GX’s workhorse abstraction: each Expectation declares an expected state of the data. The Expectation
library provides a flexible, extensible vocabulary for data quality—one that’s human-readable, meaningful for technical
and nontechnical users alike. Bundled into Expectation Suites, Expectations are the ideal tool for characterizing
exactly what you expect from your data.
- expect_column_values_to_not_be_null
- expect_column_values_to_match_regex
- expect_column_values_to_be_unique
- expect_column_values_to_match_strftime_format
- expect_table_row_count_to_be_between
- expect_column_median_to_be_between
- ...and many more
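To make the idea of "declaring an expected state" concrete, here is a minimal pure-Python sketch. It is not GX's implementation or API: the functions only borrow three names from the list above, and the row format, the `_result` helper, the result keys, and the null-skipping and regex-search behavior are all assumptions made for illustration.

```python
import re
from typing import Any

Row = dict[str, Any]


def _result(name: str, success: bool, unexpected: list[int]) -> dict[str, Any]:
    """Package an outcome as plain metadata (hypothetical shape, not GX's)."""
    return {"expectation": name, "success": success, "unexpected_rows": unexpected}


def expect_column_values_to_not_be_null(rows: list[Row], column: str) -> dict[str, Any]:
    # Collect the positions of rows whose value is missing.
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return _result("expect_column_values_to_not_be_null", not unexpected, unexpected)


def expect_column_values_to_match_regex(
    rows: list[Row], column: str, regex: str
) -> dict[str, Any]:
    # Assumption for this sketch: missing values are skipped, and a value
    # passes if the pattern is found anywhere in its string form.
    pattern = re.compile(regex)
    unexpected = [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not pattern.search(str(row[column]))
    ]
    return _result("expect_column_values_to_match_regex", not unexpected, unexpected)


def expect_table_row_count_to_be_between(
    rows: list[Row], min_value: int, max_value: int
) -> dict[str, Any]:
    # A table-level check: no individual rows are flagged.
    ok = min_value <= len(rows) <= max_value
    return _result("expect_table_row_count_to_be_between", ok, [])


# Made-up sample data for the sketch.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "not-an-email"},
]

# A "suite" here is simply a list of expectations evaluated together.
suite_results = [
    expect_column_values_to_not_be_null(rows, "email"),
    expect_column_values_to_match_regex(rows, "email", r"^[^@\s]+@[^@\s]+$"),
    expect_table_row_count_to_be_between(rows, min_value=1, max_value=10),
]

for result in suite_results:
    print(result)
```

Each function returns a small dictionary rather than raising, which mirrors the idea that validation output is metadata that can be inspected, stored, or rendered later.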
Secure and transparent
GX doesn’t ask you to exchange security for your insight. It processes your data in place, on your systems, so your
security and governance procedures can maintain control at all times. And because GX’s core is and always will be open
source, its complete transparency is the opposite of a black box.
Data contracts support
Checkpoints are a transparent, central, and automatable mechanism for testing Expectations and evaluating your data
quality. Every Checkpoint run produces human-readable Data Docs reporting the results. You can also configure
Checkpoints to take Actions based on the results of the evaluation, like sending alerts and preventing low-quality data
from moving further in your pipelines.
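The Checkpoint pattern described above (evaluate Expectations, then take Actions on the outcome) can be sketched in plain Python. This is an illustrative model only, not the GX Checkpoint API: `SketchCheckpoint`, `send_alert`, `block_pipeline`, and `LowQualityDataError` are hypothetical names, and the sample rows are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Rows = list[dict[str, Any]]
Expectation = Callable[[Rows], bool]
Action = Callable[[dict[str, Any]], None]


class LowQualityDataError(Exception):
    """Raised by the hypothetical blocking action to halt a pipeline."""


@dataclass
class SketchCheckpoint:
    name: str
    expectations: dict[str, Expectation]
    actions: list[Action] = field(default_factory=list)

    def run(self, rows: Rows) -> dict[str, Any]:
        # Evaluate every expectation, then hand the summary to each action.
        results = {label: bool(check(rows)) for label, check in self.expectations.items()}
        summary = {
            "checkpoint": self.name,
            "success": all(results.values()),
            "results": results,
        }
        for action in self.actions:
            action(summary)
        return summary


alerts: list[str] = []  # stands in for a real alerting channel


def send_alert(summary: dict[str, Any]) -> None:
    # Record a message only when at least one expectation failed.
    if not summary["success"]:
        failed = [label for label, ok in summary["results"].items() if not ok]
        alerts.append(f"{summary['checkpoint']} failed: {', '.join(failed)}")


def block_pipeline(summary: dict[str, Any]) -> None:
    # Stop downstream steps by raising when validation fails.
    if not summary["success"]:
        raise LowQualityDataError(summary["checkpoint"])


checkpoint = SketchCheckpoint(
    name="orders_checkpoint",
    expectations={
        "row_count_at_least_1": lambda rows: len(rows) >= 1,
        "amount_not_null": lambda rows: all(r.get("amount") is not None for r in rows),
    },
    actions=[send_alert, block_pipeline],
)

good_summary = checkpoint.run([{"amount": 10}, {"amount": 5}])

try:
    checkpoint.run([{"amount": 10}, {"amount": None}])
    blocked = False
except LowQualityDataError:
    blocked = True

print(good_summary["success"], blocked, alerts)
```

Ordering the alert before the blocking action means the failure is reported even though the run is then interrupted.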
Readable for collaboration
Everyone stays on the same page about your data quality with GX’s inspectable, shareable, and human-readable Data Docs.
You can publish Data Docs to the locations where you need them in a variety of formats, making it easy to integrate Data
Docs into your existing data catalogs, dashboards, and other reporting and data governance tools.
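As a rough illustration of how validation results can become a shareable, human-readable page, the sketch below renders a list of result dictionaries as a small HTML table. It is not how GX builds Data Docs: `render_report` and the result keys are hypothetical, and the results are hard-coded sample values.

```python
from html import escape
from typing import Any


def render_report(title: str, results: list[dict[str, Any]]) -> str:
    """Render results as a minimal HTML page (hypothetical format)."""
    table_rows = "\n".join(
        f"<tr><td>{escape(r['expectation'])}</td>"
        f"<td>{'PASS' if r['success'] else 'FAIL'}</td></tr>"
        for r in results
    )
    return (
        f"<html><body><h1>{escape(title)}</h1>\n"
        "<table>\n<tr><th>Expectation</th><th>Status</th></tr>\n"
        f"{table_rows}\n</table></body></html>"
    )


# Made-up results for the sketch.
sample_results = [
    {"expectation": "expect_column_values_to_not_be_null", "success": False},
    {"expectation": "expect_table_row_count_to_be_between", "success": True},
]

report_html = render_report("Orders & refunds", sample_results)
print(report_html)
```

Because the output is an ordinary string of HTML, it could be written to a file or published wherever a team already keeps its reports.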
Quick start
To see Great Expectations in action on your own data, install it using pip:
pip install great_expectations
and then run the following in Python:
import great_expectations as gx
context = gx.get_context()
(We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks,
or git, you may want to check out
the Supporting Resources, which will teach you how
to get up and running in minutes.)
For full documentation, visit https://docs.greatexpectations.io/.
If you need help, hop into our Slack channel—there are always contributors
and other users there.
Integrations
Great Expectations works with the tools and systems that you're already using with your data.
What is GX not?
Great Expectations is not a pipeline execution framework. Instead, it integrates seamlessly with DAG execution tools
like Spark, Airflow, dbt, Prefect, Dagster, Kedro, Flyte, etc. GX carries out your data quality pipeline testing while
these tools execute the pipelines.
Great Expectations is not a database or storage software. It processes your data in place, on your existing systems.
Expectations and Validation Results that GX produces are metadata about your data.
Great Expectations is not a data versioning tool. If you want to bring your data itself under version control, check
out tools like DVC, Quilt,
and lakeFS.
Great Expectations is not a language-agnostic platform. Instead, it follows the philosophy of “take the compute to the
data” by using the popular Python language to support native execution of Expectations in pandas, SQL (via SQLAlchemy),
and Spark environments.
Great Expectations is not exclusive to Python programming environments. It can be invoked from the command line
without a Python environment. However, if you’re working in another ecosystem, you may want to explore
ecosystem-specific alternatives such as assertR (for R environments) or TFDV (for TensorFlow environments).
Who maintains Great Expectations?
Great Expectations OSS is under active development by GX Labs and the Great Expectations community.
What's the best way to get in touch with the Great Expectations team?
If you have questions, comments, or just want to have a good old-fashioned chat about data quality, please hop on our
public Slack channel or post in
our Discourse.
Can I contribute to the library?
Absolutely. Yes, please.
See Contributing code, Contributing Expectations, Contributing packages, or Contribute to Great Expectations
documentation, and please don't be shy with questions.
How do I stay up to date with Great Expectations?
You can get updates on everything GX with our email
newsletter. Subscribe here!