How to Become a Data Engineer

A list of useful resources for learning data engineering from scratch.

Useful articles

Talks

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Courses

Blogs

  • Martin Kleppmann, author of Designing Data-Intensive Applications
  • BaseDS by Vaidehi Joshi, a series about distributed systems

Tools

  • Apache Airflow is a platform to programmatically author, schedule, and monitor workflows in Python.
  • Apache Spark is a unified analytics engine for large-scale data processing.
  • Apache Kafka is a distributed streaming platform.
  • Luigi is a Python package that helps you build complex pipelines of batch jobs.
  • Dagster.io is a system for building modern data applications.
  • Prefect includes everything you need to create and run data applications.
  • Metaflow helps you build and manage real-life data science projects with ease.
  • lakeFS lets you build repeatable, atomic, and versioned data lake operations, from complex ETL jobs to data science and analytics.
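Several of the orchestrators above, such as Airflow, Luigi, and Dagster, model a pipeline as a directed acyclic graph (DAG) of tasks in which each task runs only after its upstream tasks have finished. The sketch below illustrates that core idea with Python's standard-library `graphlib` (Python 3.9+). It is not the API of any of these tools; `run_pipeline` and the task names are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 deps: Dict[str, Set[str]]) -> List[str]:
    """Run each task only after all of its upstream tasks; return the run order.

    Hypothetical helper for illustration: `tasks` maps a task name to a
    zero-argument function, and `deps` maps a task name to the names of the
    tasks it depends on. This is not the API of Airflow, Luigi, or Dagster.
    """
    order: List[str] = []
    # static_order() yields every node after all of its predecessors and
    # raises graphlib.CycleError (a subclass of ValueError) if the graph has a cycle.
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()
        order.append(name)
    return order


# Example: extract -> two independent cleaning steps -> join -> load.
# The task names are made up for illustration.
log: List[str] = []
tasks = {
    name: (lambda name=name: log.append(name))
    for name in ["extract", "clean_users", "clean_orders", "join", "load"]
}
deps = {
    "extract": set(),
    "clean_users": {"extract"},
    "clean_orders": {"extract"},
    "join": {"clean_users", "clean_orders"},
    "load": {"join"},
}
print(run_pipeline(tasks, deps))
```

This sketch runs tasks one at a time. Real orchestrators typically add scheduling, retries, and concurrent execution of independent tasks (here `clean_users` and `clean_orders` could run in parallel), which is why the dependency graph, rather than a fixed script order, is the central abstraction.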

Cloud Platforms

Communities

Data Engineering Jobs

Other

Newsletters & Digests

Key Metrics

Overview
Name and owner: adilkhash/Data-Engineering-HowTo
Primary language:
Languages (count: 0)
Platform:
License:
Owner activity
Created at: 2019-03-28 07:43:26
Pushed at: 2024-06-19 08:49:58
Last commit: 2024-01-03 12:35:21
Releases: 0
User engagement
Stars: 3.8k
Watchers: 102
Forks: 542
Commits: 56
Issues enabled?
Issues: 2
Open issues: 0
Pull requests: 13
Open pull requests: 8
Closed pull requests: 1
Project settings
Wiki enabled?
Archived?
Is a fork?
Locked?
Is a mirror?
Private?