Introduction

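DataHub is LinkedIn's generalized metadata search and discovery tool. To learn more about DataHub, check out our LinkedIn blog post and Strata presentation. You should also visit DataHub Architecture to better understand how DataHub is implemented, and the DataHub Onboarding Guide to understand how to extend DataHub for your own use case.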

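This repository contains the complete source code for both DataHub's frontend and backend. You can also read about how we sync changes between our internal fork and GitHub.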

Quickstart

Install docker and docker-compose (if using Linux). Make sure to allocate enough hardware resources to the Docker engine. Tested and confirmed configuration: 2 CPUs, 8GB RAM, 2GB swap area.

Open Docker either from the command line or the desktop app and make sure it is up and running.
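Before running the quickstart, it can help to confirm that the Docker daemon is reachable and to compare its resources against the tested configuration above. The Python sketch below is a hypothetical helper, not part of the DataHub repository. It assumes the docker CLI is on PATH and reads the `NCPU` and `MemTotal` fields through `docker info --format`; it interprets "8GB" as 8 GiB and does not check swap. The daemon may report somewhat less memory than the amount configured in Docker Desktop, so treat a small shortfall as informational.

```python
import shutil
import subprocess

# Tested configuration quoted above: 2 CPUs and 8GB RAM (assumed to mean 8 GiB).
MIN_CPUS = 2
MIN_MEM_BYTES = 8 * 1024**3


def docker_resources():
    """Return (cpus, mem_bytes) as reported by the Docker daemon, or None
    if the docker CLI is missing or the daemon cannot be reached."""
    if shutil.which("docker") is None:
        return None
    try:
        result = subprocess.run(
            ["docker", "info", "--format", "{{.NCPU}} {{.MemTotal}}"],
            capture_output=True, text=True, timeout=30,
        )
        if result.returncode != 0:
            return None  # daemon not running or not reachable
        cpus, mem = result.stdout.split()
        return int(cpus), int(mem)
    except (OSError, subprocess.TimeoutExpired, ValueError):
        return None


def check_resources(cpus, mem_bytes):
    """Return human-readable warnings for values below the tested config."""
    warnings = []
    if cpus < MIN_CPUS:
        warnings.append(f"{cpus} CPU(s) allocated; tested with {MIN_CPUS}")
    if mem_bytes < MIN_MEM_BYTES:
        gib = mem_bytes / 1024**3
        warnings.append(f"{gib:.1f} GiB RAM allocated; tested with 8 GiB")
    return warnings


if __name__ == "__main__":
    info = docker_resources()
    if info is None:
        print("Docker is not installed or its daemon is not running.")
    else:
        for line in check_resources(*info) or ["Resources match the tested config."]:
            print(line)
```

Keeping the threshold logic in `check_resources` separate from the `docker` call makes it easy to adjust the limits (for example to the 4-CPU figure quoted elsewhere) without touching the subprocess code.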

Clone this repo and cd into the root directory of the cloned repository.

Run the following command to download and run all Docker containers locally:

```
./docker/quickstart/quickstart.sh
```

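The first run of this step takes a while, and it may be hard to tell from the combined logs whether DataHub is fully up and running. Use this guide to verify that each container is running correctly.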
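One way to check the containers without reading the combined logs is to ask Docker for each container's status. The sketch below is a hypothetical helper, not DataHub tooling: it runs `docker ps -a --format '{{.Names}}|{{.Status}}'` and flags containers whose status does not start with "Up" or whose health check reports `unhealthy` or `health: starting`. It lists every container on the host and does not know which ones DataHub is supposed to start, so compare the names against the guide.

```python
import shutil
import subprocess

# Go template passed to `docker ps --format`; "|" separates the fields
# because neither container names nor status strings contain it.
PS_FORMAT = "{{.Names}}|{{.Status}}"


def parse_ps(output):
    """Turn `docker ps --format PS_FORMAT` output into (name, status) pairs."""
    rows = []
    for line in output.splitlines():
        if line.strip():
            name, _, status = line.partition("|")
            rows.append((name.strip(), status.strip()))
    return rows


def not_ready(rows):
    """Names of containers that are not running, or whose health check is
    still starting or failing."""
    return [
        name
        for name, status in rows
        if not status.startswith("Up")
        or "(unhealthy)" in status
        or "(health: starting)" in status
    ]


def container_status():
    """Query all containers (including stopped ones); None if Docker is unavailable."""
    if shutil.which("docker") is None:
        return None
    try:
        result = subprocess.run(
            ["docker", "ps", "-a", "--format", PS_FORMAT],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return parse_ps(result.stdout)


if __name__ == "__main__":
    rows = container_status()
    if rows is None:
        print("Could not query Docker.")
    else:
        for name, status in rows:
            print(f"{name:40} {status}")
        pending = not_ready(rows)
        print("Not ready:", ", ".join(pending) if pending else "none")
```

Containers that exit or restart repeatedly show up in the "Not ready" list; `docker logs <name>` is then the place to look.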

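At this point, you should be able to start DataHub by opening http://localhost:9001 in your browser. You can sign in using datahub as both the username and the password. However, you will notice that no data has been ingested yet.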
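Instead of refreshing the browser, you could poll the frontend URL until it answers. The following is a minimal sketch using only the Python standard library; the function name `wait_for_http`, the 10-minute default timeout and the 5-second interval are arbitrary choices, not values from DataHub. It treats any response below 500 as "the web server is up", and connection errors or 5xx responses as "not yet".

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=600.0, interval=5.0):
    """Poll `url` until the server answers or `timeout` seconds pass.

    Returns True once a response with status < 500 arrives, False if the
    deadline expires first. (Hypothetical helper, not part of DataHub.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # urlopen follows redirects and raises HTTPError for 4xx/5xx.
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            # A 4xx still means the web server is listening; a 5xx
            # usually means something behind it is not ready yet.
            if err.code < 500:
                return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset or timed out: not up yet.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

For example, `wait_for_http("http://localhost:9001")` returns True once the frontend responds, after which you can sign in as described above. A responding frontend does not guarantee every backend container is healthy, so the container check is still worth running.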

To ingest the provided sample data into DataHub, switch to a new terminal window, cd into the cloned datahub repository, and run the following command:

```
./docker/ingestion/ingestion.sh
```

After running this, you should be able to view and search the sample datasets in DataHub.

If you run into any issues during the quickstart, refer to the debugging guide.

Name and owner	datahub-project/datahub
Primary language	Java
Languages	Java (17 languages)
Platforms	Docker, Linux, Mac
License	Apache License 2.0


Created	2015-11-18 05:47:40
Last pushed	2025-04-22 15:07:56
Last commit	2025-04-22 20:37:55
Releases	111
Latest release	v1.0.0
First release	v0.1.0-alpha


Stars	10.5k
Watchers	255
Forks	3.1k
Commits	11.7k
Issues	2370
Open issues	273
Pull requests	9473
Open pull requests	194
Closed pull requests	1227


DataHub: A Generalized Metadata Search & Discovery Tool

Feb 2020 Update: DataHub v0.3.0 just released!

Introduction

DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our LinkedIn blog post and Strata presentation.
You should also visit DataHub Architecture to get a better understanding of how DataHub is implemented, and the DataHub Onboarding Guide to understand how to extend DataHub for your own use case.

This repository contains the complete source code for both DataHub's frontend & backend.

Quickstart

Install docker and docker-compose. Make sure to configure Docker to allocate enough hardware resources to the Docker engine. Tested & confirmed config: 4 CPUs, 8GB RAM, 2GB Swap area.
Open Docker either from the command line or the Desktop app and ensure it is up and running.
Clone this repo and cd into the root directory for the cloned repository.
Run the following command to download and run all Docker containers locally:
```
cd docker/quickstart && docker-compose pull && docker-compose up --build
```
This step takes a long time, and it might be hard to figure out when DataHub is fully up. You can refer to this guide to verify whether DataHub is up and running.
At this point, you should be able to start DataHub by opening http://localhost:9001 in your browser. You can sign in using datahub as both username and password. However, there is no data just yet.
To ingest the provided sample data into DataHub, switch to a new terminal, cd into the cloned datahub repo, and run the following command:
```
docker build -t ingestion -f docker/ingestion/Dockerfile . && cd docker/ingestion && docker-compose up
```
After running this, you should be able to see sample data in DataHub.

Refer to the debugging guide if you have issues with any of the above steps.

Quicklinks

Releases

See the Releases page for more details.

Contributing

We welcome contributions from the community. Please refer to the guidelines for more details. We also have a contrib directory for incubation.

Roadmap

Check out DataHub's roadmap.

