How to Become a Data Engineer

A list of useful resources to learn Data Engineering from scratch.

Useful articles

Talks

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Courses

Blogs

  • Martin Kleppmann, author of Designing Data-Intensive Applications
  • BaseDS by Vaidehi Joshi, a series about distributed systems

Tools

  • Apache Airflow is a platform to programmatically author, schedule, and monitor workflows in Python.
  • Apache Spark is a unified analytics engine for large-scale data processing.
  • Apache Kafka is a distributed streaming platform.
  • Luigi is a Python package that helps you build complex pipelines of batch jobs.
  • Dagster.io is a system for building modern data applications.
  • Prefect includes everything you need to create and run data applications.
  • Metaflow helps you build and manage real-life data science projects with ease.
  • lakeFS lets you build repeatable, atomic, and versioned data lake operations, from complex ETL jobs to data science and analytics.
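
Several of the tools above (Airflow, Luigi, Dagster, Prefect) are built around the same idea: a pipeline is a set of tasks plus the dependencies between them, and the tool runs the tasks in an order that respects those dependencies. The sketch below illustrates that idea using only the Python standard library. It is not the API of any of the listed tools; the task names (`extract`, `transform`, `load`), the sample rows, and the `run_pipeline` helper are all hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical tasks: each one reads from and writes to a shared context dict.
def extract(ctx):
    # Stand-in for reading from a source system; the rows are made up.
    ctx["rows"] = [
        {"user": "a", "amount": 10},
        {"user": "b", "amount": 5},
        {"user": "a", "amount": 7},
    ]


def transform(ctx):
    # Aggregate the amount per user.
    totals = {}
    for row in ctx["rows"]:
        totals[row["user"]] = totals.get(row["user"], 0) + row["amount"]
    ctx["totals"] = totals


def load(ctx):
    # Stand-in for writing to a target system; here we just store a sorted list.
    ctx["loaded"] = sorted(ctx["totals"].items())


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each task name maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_pipeline(tasks, dependencies):
    """Run every task once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx


order, ctx = run_pipeline(TASKS, DEPENDENCIES)
print(order)          # ['extract', 'transform', 'load']
print(ctx["loaded"])  # [('a', 17), ('b', 5)]
```

Real orchestrators add what this sketch leaves out, such as scheduling, retries, logging, and distributed execution, which is the main reason to reach for one of them instead of a hand-written loop.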

Cloud Platforms

Communities

Data Engineering Jobs

Other

Newsletters & Digests

Main metrics

Overview
Name With Owner: adilkhash/Data-Engineering-HowTo
Language Count: 0

Owner Activity
Created At: 2019-03-28 07:43:26
Pushed At: 2024-06-19 08:49:58
Last Commit At: 2024-01-03 12:35:21
Release Count: 0

User Engagement
Stargazers Count: 3.8k
Watchers Count: 102
Fork Count: 542
Commits Count: 56
Issues Count: 2
Issue Open Count: 0
Pull Requests Count: 13
Pull Requests Open Count: 8
Pull Requests Close Count: 1