Analytics Zoo

Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink and Ray.

A unified data analytics and AI platform for distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink and Ray.

What is Analytics Zoo?

Analytics Zoo seamlessly scales TensorFlow, Keras and PyTorch to distributed big data (using Spark, Flink and Ray).

End-to-end pipelines for applying AI models (TensorFlow, PyTorch, OpenVINO, etc.) to distributed big data:

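  • Write TensorFlow or PyTorch inline with Spark code for distributed training and inference.
  • Native deep learning (TensorFlow/Keras/PyTorch/BigDL) support in Spark ML Pipelines.
  • Directly run Ray programs on a big data cluster through RayOnSpark.
  • Plain Java/Python APIs for (TensorFlow/PyTorch/BigDL/OpenVINO) model inference.

High-level ML workflows for automating machine learning tasks:

  • Cluster Serving for automatically distributed (TensorFlow/PyTorch/Caffe/OpenVINO) model inference.
  • Scalable AutoML for time series prediction.

Built-in models for recommendation, time series, computer vision and NLP applications.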

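Why use Analytics Zoo?

You may want to develop your AI solutions using Analytics Zoo if:

  • You want to easily apply AI models (e.g., TensorFlow, Keras, PyTorch, BigDL, OpenVINO, etc.) to distributed big data.
  • You want to transparently scale your AI applications from a single laptop to large clusters with "zero" code changes.
  • You want to deploy your AI pipelines to existing YARN or K8S clusters without any modifications to the clusters.
  • You want to automate the process of applying machine learning (such as feature engineering, hyperparameter tuning, model selection and distributed inference).

How to use Analytics Zoo?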


(The first version translated by vz on 2020.10.25)

Main metrics

Overview
Name With Owner: intel/analytics-zoo
Primary Language: Jupyter Notebook
Program Language: Scala (Language Count: 10)
Platform: Google Cloud Platform, Kubernetes, Linux, Mac, Unix-like, Windows, Databricks
License: Apache License 2.0
Owner Activity
Created At: 2024-03-05 03:41:26
Pushed At: 2025-01-09 01:09:27
Last Commit At: 2025-01-09 09:05:46
Release Count: 15
Last Release Name: v0.11.2
First Release Name: v0.1.0
User Engagement
Stargazers Count: 27
Watchers Count: 7
Fork Count: 6
Commits Count: 3.5k
Has Issues Enabled
Issues Count: 1261
Issue Open Count: 406
Pull Requests Count: 34
Pull Requests Open Count: 0
Pull Requests Close Count: 1
Project Settings
Has Wiki Enabled
Is Archived
Is Fork
Is Locked
Is Mirror
Is Private

A unified Data Analytics and AI platform for distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink and Ray


What is Analytics Zoo?

Analytics Zoo provides a unified data analytics and AI platform that seamlessly unites TensorFlow, Keras, PyTorch, Spark, Flink and Ray programs into an integrated pipeline, which can transparently scale from a laptop to large clusters to process production big data.

  • Integrated Analytics and AI Pipelines for easily prototyping and deploying end-to-end AI applications.

    • Write TensorFlow or PyTorch inline with Spark code for distributed training and inference.
    • Native deep learning (TensorFlow/Keras/PyTorch/BigDL) support in Spark ML Pipelines.
    • Directly run Ray programs on a big data cluster through RayOnSpark.
    • Plain Java/Python APIs for (TensorFlow/PyTorch/BigDL/OpenVINO) Model Inference.
  • High-Level ML Workflow that automates the process of building large-scale machine learning applications.

    • Automatically distributed Cluster Serving (for TensorFlow/PyTorch/Caffe/BigDL/OpenVINO models) with a simple pub/sub API.
    • Scalable AutoML for time series prediction (that automatically generates features, selects models and tunes hyperparameters).
  • Built-in Algorithms and Models for Recommendation, Time Series, Computer Vision and NLP applications.
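
The Cluster Serving item above mentions a simple pub/sub API. As a rough illustration of that pattern only, the sketch below has a client publish inference requests to a queue while a pool of workers consumes them and publishes results that the client looks up by request id. It uses only the Python standard library and runs in a single process. The names `ServingSketch`, `enqueue`, `query` and `double_values` are hypothetical and are not the Analytics Zoo API; the real service distributes this work across a cluster.

```python
import queue
import threading


def double_values(values):
    """Stand-in for a real model: 'inference' just doubles each input value."""
    return [2 * v for v in values]


class ServingSketch:
    """Toy in-process pub/sub serving loop (hypothetical, not the Analytics Zoo API)."""

    def __init__(self, predict_fn, num_workers=2):
        self._predict = predict_fn
        self._requests = queue.Queue()   # published requests wait here
        self._results = {}               # request id -> prediction
        self._done = {}                  # request id -> completion event
        self._lock = threading.Lock()
        self._workers = [
            threading.Thread(target=self._serve, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def enqueue(self, request_id, data):
        """Publish one inference request under a caller-chosen id."""
        with self._lock:
            self._done[request_id] = threading.Event()
        self._requests.put((request_id, data))

    def query(self, request_id, timeout=5.0):
        """Block until the result for request_id is available, then return it."""
        with self._lock:
            event = self._done[request_id]
        if not event.wait(timeout):
            raise TimeoutError(f"no result for {request_id!r} within {timeout}s")
        with self._lock:
            return self._results[request_id]

    def _serve(self):
        """Worker loop: consume requests, run the model, publish results."""
        while True:
            item = self._requests.get()
            if item is None:             # shutdown signal
                break
            request_id, data = item
            prediction = self._predict(data)
            with self._lock:
                self._results[request_id] = prediction
                self._done[request_id].set()

    def shutdown(self):
        """Send one stop signal per worker and wait for all of them to exit."""
        for _ in self._workers:
            self._requests.put(None)
        for worker in self._workers:
            worker.join()


if __name__ == "__main__":
    server = ServingSketch(double_values)
    server.enqueue("req-1", [1, 2, 3])
    print(server.query("req-1"))  # [2, 4, 6]
    server.shutdown()
```

Because the client only ever touches the request and result channels, the number of workers can change without any change to client code, which is the property the pub/sub style is meant to provide.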


Why use Analytics Zoo?

You may want to develop your AI solutions using Analytics Zoo if:

  • You want to easily prototype the entire end-to-end pipeline that applies AI models (e.g., TensorFlow, Keras, PyTorch, BigDL, OpenVINO, etc.) to production big data.
  • You want to transparently scale your AI applications from a laptop to large clusters with "zero" code changes.
  • You want to deploy your AI pipelines to existing YARN or K8S clusters WITHOUT any modifications to the clusters.
  • You want to automate the process of applying machine learning (such as feature engineering, hyperparameter tuning, model selection and distributed inference).
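
To make the last item more concrete, here is a minimal sketch of what automating model selection and hyperparameter tuning can look like for a time series. It is a standard-library toy, not the Analytics Zoo AutoML API: the two candidate forecasters, the lag-window search space and the function names (`moving_average_forecast`, `drift_forecast`, `auto_select`) are all assumptions made for illustration. Each candidate is scored by one-step-ahead error on the last few points, and the best (model, window) pair is returned.

```python
def moving_average_forecast(history, window):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def drift_forecast(history, window):
    """Predict the next value by extending the average step over the last `window` observations."""
    recent = history[-window:]
    if len(recent) < 2:
        return recent[-1]
    step = (recent[-1] - recent[0]) / (len(recent) - 1)
    return recent[-1] + step


# Hypothetical search space: candidate models and lag-window sizes.
CANDIDATE_MODELS = {
    "moving_average": moving_average_forecast,
    "drift": drift_forecast,
}


def validation_mae(series, forecast_fn, window, num_validation):
    """Mean absolute one-step-ahead error over the last `num_validation` points."""
    errors = []
    start = len(series) - num_validation
    for t in range(start, len(series)):
        prediction = forecast_fn(series[:t], window)
        errors.append(abs(prediction - series[t]))
    return sum(errors) / len(errors)


def auto_select(series, windows=(2, 3, 5), num_validation=5):
    """Grid-search every (model, window) pair and return the one with the lowest validation error."""
    best = None
    for name, forecast_fn in CANDIDATE_MODELS.items():
        for window in windows:
            mae = validation_mae(series, forecast_fn, window, num_validation)
            if best is None or mae < best[0]:
                best = (mae, name, window)
    return {"model": best[1], "window": best[2], "mae": best[0]}


if __name__ == "__main__":
    trend = [float(i) for i in range(20)]  # a perfectly linear series
    print(auto_select(trend))
```

On a linearly increasing series the drift model extrapolates exactly, so the search prefers it over the moving average. A production AutoML system would search a much larger space and run the trials in parallel across a cluster.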

How to use Analytics Zoo?