Airbyte

Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.

Introduction to Airbyte

A data integration platform for ELT pipelines from APIs, databases, and files to databases, data warehouses, and data lakes.

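We believe that only an open-source data movement solution can cover the long tail of data sources while enabling data engineers to customize existing connectors. Our ultimate vision is to help you move data from any source to any destination. Airbyte already provides more than 300 connectors for popular APIs, databases, data warehouses, and data lakes.

Airbyte connectors can be implemented in any language and take the form of Docker images that follow the Airbyte specification. You can create new connectors very quickly with:

Airbyte has a built-in scheduler and uses Temporal to orchestrate jobs and ensure reliability at scale. Airbyte uses dbt to normalize extracted data and can trigger custom transformations in SQL and dbt. You can also orchestrate Airbyte syncs with Airflow, Prefect, or Dagster.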

Explore our demo app.

Quick start

git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up

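Now visit http://localhost:8000.

Here is a step-by-step guide showing how to load data from an API into a file, all on your own computer.

If you would like to schedule a 20-minute call with our team to help you get set up, please pick a time directly here.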

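Features

  • Built for extensibility. Adapt an existing connector to your needs, or build a new one with ease.
  • Optional normalized schemas. Fully customizable: start with raw data or with suggested normalized data.
  • Full-grade scheduler. Automate your replications at the frequency you need.
  • Real-time monitoring. We log all errors in full detail to help you understand what went wrong.
  • Incremental updates. Automated replications are based on incremental updates to reduce your data transfer costs.
  • Manual full refresh. Sometimes you need to re-sync all your data and start again.
  • Debugging autonomy. Modify and debug pipelines as you see fit, without waiting.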
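
The difference between the incremental updates and the manual full refresh listed above can be illustrated with a small cursor-based sketch. This is not Airbyte's implementation: the function name `incremental_sync`, the `updated_at` cursor field, and the sample rows are all hypothetical, chosen only to show the idea of copying rows newer than a saved cursor.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def incremental_sync(
    source_rows: List[Record],
    cursor_field: str,
    saved_cursor: Optional[str],
) -> Tuple[List[Record], Optional[str]]:
    """Return the rows newer than saved_cursor and the cursor to persist."""
    if saved_cursor is None:
        # No saved state: the first run copies everything, like a full refresh.
        new_rows = list(source_rows)
    else:
        # Later runs copy only rows whose cursor value moved past the saved one.
        new_rows = [r for r in source_rows if r[cursor_field] > saved_cursor]
    # Keep the old cursor when nothing new arrived.
    new_cursor = max((r[cursor_field] for r in new_rows), default=saved_cursor)
    return new_rows, new_cursor


# Hypothetical source table; ISO dates compare correctly as strings.
rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-05"},
    {"id": 3, "updated_at": "2024-01-09"},
]

first_batch, cursor = incremental_sync(rows, "updated_at", None)
second_batch, cursor2 = incremental_sync(rows, "updated_at", "2024-01-05")
print(len(first_batch), cursor)
print([r["id"] for r in second_batch], cursor2)
```

Passing `None` as the saved cursor is what a manual full refresh amounts to in this sketch: the state is discarded and every row is copied again.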

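See more on our website.

Contributing

We love contributions to Airbyte, big or small.

See our contributing guide to get started. Not sure where to start? We have listed some good first issues to begin with. If you have any questions, open a draft PR or visit our Slack channel, where the core team can help answer them.
Note that you can create connectors in whatever language you want, because Airbyte connectors run as Docker containers.
Also, we will never ask you to maintain your connector. The goal is for the Airbyte team and the community to help maintain it; call it crowdsourced maintenance.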
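
Because a connector is just a containerized program, its core can be quite small. The following is a minimal sketch of a source connector's logic, assuming the message shapes of the Airbyte protocol as I understand them (JSON objects with a `type` of `SPEC`, `CONNECTION_STATUS`, or `RECORD`, written one per line to stdout). The `greetings` stream, the `greeting` config key, and the `run` helper are hypothetical; a real connector would also implement `discover`, read its command and `--config` path from the command line, and be packaged as a Docker image.

```python
import json
import time
from typing import Any, Dict, Iterator, List

Message = Dict[str, Any]


def spec_message() -> Message:
    """Describe the configuration this hypothetical connector accepts."""
    return {
        "type": "SPEC",
        "spec": {
            "connectionSpecification": {
                "type": "object",
                "required": ["greeting"],
                "properties": {"greeting": {"type": "string"}},
            }
        },
    }


def check_message(config: Dict[str, Any]) -> Message:
    """Report whether the supplied configuration is usable."""
    status = "SUCCEEDED" if config.get("greeting") else "FAILED"
    return {"type": "CONNECTION_STATUS", "connectionStatus": {"status": status}}


def read_messages(config: Dict[str, Any]) -> Iterator[Message]:
    """Emit two records for a made-up 'greetings' stream."""
    for i in range(2):
        yield {
            "type": "RECORD",
            "record": {
                "stream": "greetings",
                "data": {"id": i, "text": config["greeting"]},
                "emitted_at": int(time.time() * 1000),  # epoch milliseconds
            },
        }


def run(command: str, config: Dict[str, Any]) -> List[str]:
    """Dispatch one command and return the JSON lines to write to stdout."""
    if command == "spec":
        messages = [spec_message()]
    elif command == "check":
        messages = [check_message(config)]
    elif command == "read":
        messages = list(read_messages(config))
    else:
        raise ValueError(f"unsupported command: {command}")
    return [json.dumps(message) for message in messages]


for line in run("read", {"greeting": "hello"}):
    print(line)
```

Keeping each command as a function that returns plain dictionaries makes the connector easy to test without Docker; the container entrypoint then only needs to parse arguments and print the lines.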

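Community support

For general help using Airbyte, please refer to the official Airbyte documentation. For additional help, you can use one of these channels to ask a question:

  • Slack (for live discussion with the community and the Airbyte team)
  • GitHub (bug reports, contributions)
  • Twitter (get the news fast)
  • Weekly office hours (live, informal 30-minute video call sessions with the Airbyte team)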

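Roadmap

Check out our roadmap to see what we are currently working on and what we have in mind for the coming weeks, months, and years.

License

Airbyte is licensed under the MIT license. See the LICENSE file for licensing information.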

Overview

Name With Owner: airbytehq/airbyte
Primary Language: Python
Program language: Shell (Language Count: 15)
Platform: Docker, Linux, Mac, Windows Subsystem for Linux (WSL)
License: Other
Release Count: 458
Last Release Name: v0.55.1 (Posted on )
First Release Name: v0.1.0-alpha (Posted on 2020-09-23 22:27:37)
Created At: 2020-07-27 23:55:54
Pushed At: 2024-03-23 22:21:20
Last Commit At: 2024-03-23 14:19:08
Stargazers Count: 13.6k
Watchers Count: 174
Fork Count: 3.5k
Commits Count: 15k
Has Issues Enabled
Issues Count: 13467
Issue Open Count: 1567
Pull Requests Count: 14509
Pull Requests Open Count: 614
Pull Requests Close Count: 4140
Has Wiki Enabled
Is Archived
Is Fork
Is Locked
Is Mirror
Is Private

We believe that only an open-source solution to data movement can cover the long tail of data sources while empowering data engineers to customize existing connectors. Our ultimate vision is to help you move data from any source to any destination. Airbyte already provides 300+ connectors for popular APIs, databases, data warehouses and data lakes.

Airbyte connectors can be implemented in any language and take the form of a Docker image that follows the Airbyte specification. You can create new connectors very fast with:

Airbyte has a built-in scheduler and uses Temporal to orchestrate jobs and ensure reliability at scale. Airbyte leverages dbt to normalize extracted data and can trigger custom transformations in SQL and dbt. You can also orchestrate Airbyte syncs with Airflow, Prefect or Dagster.

Explore our demo app.

Quick start

Run Airbyte locally

You can run Airbyte locally with Docker. The shell script below will retrieve the requisite docker files from the platform repository and run docker compose for you.

git clone --depth 1 https://github.com/airbytehq/airbyte.git
cd airbyte
./run-ab-platform.sh

Log in to the web app at http://localhost:8000 by entering the default credentials found in your .env file.

BASIC_AUTH_USERNAME=airbyte
BASIC_AUTH_PASSWORD=password
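
The same credentials can be used outside the browser. The sketch below reads the two values from `.env`-style text, builds an HTTP Basic `Authorization` header, and prepares a request to a health endpoint. It rests on assumptions not stated in this README: that the local deployment protects its API with HTTP Basic authentication and that `/api/v1/health` is available on port 8000. The helper names are hypothetical, and nothing is sent until you call `urllib.request.urlopen(request)` against a running instance.

```python
import base64
import urllib.request
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_health_request(base_url: str, auth_header: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET to the assumed health endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/health",  # assumed path
        headers={"Authorization": auth_header},
        method="GET",
    )


env = parse_env("BASIC_AUTH_USERNAME=airbyte\nBASIC_AUTH_PASSWORD=password\n")
auth = basic_auth_header(env["BASIC_AUTH_USERNAME"], env["BASIC_AUTH_PASSWORD"])
request = build_health_request("http://localhost:8000", auth)
print(request.get_method(), request.full_url)
```

Since these defaults are public, change both values in `.env` before exposing the instance beyond your own machine.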

Follow the instructions in the web app UI to set up a source, destination, and connection to replicate data. Connections support the most popular sync modes: full refresh, incremental, and change data capture for databases.

Read the Airbyte docs.

Manage Airbyte configurations with code

You can also programmatically manage sources, destinations, and connections with YAML files, the Octavia CLI, and the API.
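
As a rough illustration of the API route, the sketch below prepares a request that asks Airbyte to run a sync for one connection. The endpoint path `/api/v1/connections/sync` and the `connectionId` body field are assumptions based on my understanding of the open-source configuration API, so check them against the API reference for your version. The `build_sync_request` helper, the all-zero connection ID, and the placeholder auth value are hypothetical.

```python
import json
import urllib.request


def build_sync_request(
    base_url: str, connection_id: str, auth_header: str
) -> urllib.request.Request:
    """Prepare (but do not send) a POST that triggers a sync for one connection."""
    body = json.dumps({"connectionId": connection_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/connections/sync",  # assumed path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": auth_header,
        },
        method="POST",
    )


request = build_sync_request(
    "http://localhost:8000",
    "00000000-0000-0000-0000-000000000000",  # placeholder connection ID
    "Basic <token>",  # placeholder; supply a real header value
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# With a running instance and real values, the request could be sent with:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

An external orchestrator such as Airflow, Prefect, or Dagster can trigger syncs in a similar way, usually through its own Airbyte integration rather than hand-built requests.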

Deploy Airbyte to production

Deployment options: Docker, AWS EC2, Azure, GCP, Kubernetes, Restack, Plural, Oracle Cloud, DigitalOcean...

Use Airbyte Cloud

Airbyte Cloud is the fastest and most reliable way to run Airbyte. It is a cloud-based data integration platform that allows you to collect and consolidate data from various sources into a single, unified system. It provides a user-friendly interface for data integration, transformation, and migration.

With Airbyte Cloud, you can easily connect to various data sources such as databases, APIs, and SaaS applications. It also supports a wide range of popular data sources like Salesforce, Stripe, Hubspot, PostgreSQL, and MySQL, among others.

Airbyte Cloud provides a scalable and secure platform for data integration, making it easier for users to move, transform, and replicate data across different applications and systems. It also offers features like monitoring, alerting, and scheduling to ensure data quality and reliability.

Sign up for Airbyte Cloud and get free credits in minutes.

Contributing

Get started by checking GitHub issues and creating a pull request. An easy way to start contributing is to update an existing connector or create a new connector using the low-code and Python CDKs. You can find the code for existing connectors in the connectors directory. The Airbyte platform is written in Java, and the frontend in React. You can also contribute to our docs and tutorials. Advanced Airbyte users can apply to the Maintainer Program and Writer Program.

If you would like to make a contribution to the platform itself, please refer to the guides in the platform repository.

Read the Contributing guide.

Reporting vulnerabilities

⚠️ Please do not file GitHub issues or post on our public forum for security vulnerabilities as they are public! ⚠️

Airbyte takes security issues very seriously. If you have any concerns about Airbyte or believe you have uncovered a vulnerability, please get in touch via the e-mail address security@airbyte.io. In the message, try to provide a description of the issue and ideally a way of reproducing it. The security team will get back to you as soon as possible.

Note that this security address should be used only for undisclosed vulnerabilities. Fixed issues and general questions on how to use the security features should be handled through the regular user and dev lists. Please report any security problems to us before disclosing them publicly.

License

See the LICENSE file for licensing information, and our FAQ for any questions you may have on that topic.

Resources

  • Weekly office hours for live informal sessions with the Airbyte team
  • Slack for quick discussion with the Community and Airbyte team
  • Discourse for deeper conversations about features, connectors, and problems
  • GitHub for code, issues and pull requests
  • YouTube for videos on data engineering
  • Newsletter for product updates and data news
  • Blog for data insights articles, tutorials and updates
  • Docs for Airbyte features
  • Roadmap for planned features