How To Become a Data Engineer

A list of useful resources to learn Data Engineering from scratch.

Useful articles

Talks

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Courses

Blogs

  • Martin Kleppmann, author of Designing Data-Intensive Applications
  • BaseDS by Vaidehi Joshi, about distributed systems

Tools

  • Apache Airflow is a platform to programmatically author, schedule, and monitor workflows in Python.
  • Apache Spark is a unified analytics engine for large-scale data processing.
  • Apache Kafka is a distributed streaming platform.
  • Luigi is a Python package that helps you build complex pipelines of batch jobs.
  • Dagster.io is a system for building modern data applications.
  • Prefect includes everything you need to create and run data applications.
  • Metaflow helps you build and manage real-life data science projects with ease.
  • lakeFS lets you build repeatable, atomic, and versioned data lake operations, from complex ETL jobs to data science and analytics.
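
Several of the tools above share one idea: a pipeline is declared in code as a set of tasks plus the dependencies between them, and a scheduler runs the tasks in a valid order. The sketch below illustrates that idea using only the Python standard library (Python 3.9+ for `graphlib`). It is not the API of Airflow, Luigi, Dagster, or Prefect. The task names (`extract`, `transform`, `load`), the `dependencies` mapping, and `run_pipeline` are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: the names and bodies are illustrative assumptions,
# not taken from any of the tools listed above.
def extract():
    return [1, 2, 3]

def transform(rows):
    return [r * 10 for r in rows]

def load(rows):
    return sum(rows)

TASKS = {"extract": extract, "transform": transform, "load": load}

# Each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

def run_pipeline():
    """Run every task once, after all of its upstream tasks have finished."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        # Pass the outputs of upstream tasks as positional arguments.
        upstream = [results[dep] for dep in sorted(dependencies[name])]
        results[name] = TASKS[name](*upstream)
    return results

results = run_pipeline()
print(results["load"])
```

Real workflow tools add scheduling, retries, persistence, and monitoring on top of this ordering step, so treat the sketch as a mental model rather than a substitute for them.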

Cloud Platforms

Communities

Data Engineering Jobs

Other

Newsletters & Digests

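Key metrics

Overview
Name and owner: adilkhash/Data-Engineering-HowTo
Languages: 0
Owner activity
Created: 2019-03-28 07:43:26
Pushed: 2024-06-19 08:49:58
Last commit: 2024-01-03 12:35:21
Releases: 0
User engagement
Stars: 3.8k
Watchers: 102
Forks: 542
Commits: 56
Issues: 2
Open issues: 0
Pull requests: 13
Open pull requests: 8
Closed pull requests: 1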