Cortex

A horizontally scalable, highly available, multi-tenant, long-term Prometheus.


Cortex provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.

  • Horizontally scalable: Cortex can run across multiple machines in a cluster, exceeding the throughput and storage of a single machine. This enables you to send the metrics from multiple Prometheus servers to a single Cortex cluster and run "globally aggregated" queries across all of your data in a single place.
  • Highly available: when run in a cluster, Cortex can replicate data between machines. This allows you to survive machine failure without gaps in your graphs.
  • Multi-tenant: Cortex can isolate the data and queries of multiple different, independent Prometheus sources within a single cluster, allowing untrusted parties to share the same cluster.
  • Long-term storage: Cortex supports Amazon DynamoDB, Google Bigtable, Cassandra, S3, GCS, and Microsoft Azure for long-term storage of metric data. This allows you to durably store data for longer than the lifetime of any single machine, and use this data for long-term capacity planning.

Cortex is a CNCF incubating project used in several production systems, including Weave Cloud and Grafana Cloud. Cortex is primarily used as a remote write target for Prometheus, and it exposes a Prometheus-compatible query API.
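
In practice this means pointing each Prometheus server's remote_write configuration at the Cortex push endpoint. A minimal sketch, assuming a recent Cortex version exposing /api/v1/push and a cluster reachable at cortex.example.com (the hostname and tenant ID below are placeholders; the X-Scope-OrgID header is how Cortex identifies a tenant when multi-tenancy is enabled):

# prometheus.yml (sketch; hostname and tenant ID are placeholders)
remote_write:
  - url: http://cortex.example.com/api/v1/push
    headers:
      X-Scope-OrgID: tenant-1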

Documentation

If you are new to the project, read the getting started guide. Before deploying Cortex with a permanent storage backend, you should read:

  1. An overview of Cortex's architecture
  2. The guide to running Cortex
  3. Information about configuring Cortex
  4. Steps for running Cortex with Cassandra

For a guide to contributing to Cortex, see the contributor guidelines.

Further reading

To learn more about Cortex, consult the following documents and talks.

Getting help

If you have any questions about Cortex:

Your feedback is always welcome.

For security issues, see https://github.com/cortexproject/cortex/security/policy

Community meetings

The Cortex community call takes place every three weeks on Thursdays from 03:30 to 04:15 PM UTC. To get a calendar invite, join the Google group.

Meeting notes are kept here.

Hosted Cortex (Prometheus as a service)

There are several commercial services where you can use Cortex on demand:

Weave Cloud

Weave Cloud from Weaveworks lets you deploy, manage, and monitor container-based applications. Sign up at https://cloud.weave.works and follow the instructions there. Additional help can also be found in the Weave Cloud documentation.

Instrumenting your applications: best practices

Grafana Cloud

To use Cortex as part of Grafana Cloud, sign up for Grafana Cloud by clicking "Log In" in the top right and then "Sign Up Now". Cortex is included as part of the Starter and Basic Hosted Grafana plans.



Main metrics

Overview
  Name With Owner: cortexlabs/cortex
  Primary Language: Go
  Program language: Shell (Language Count: 6)
  Platform:
  License: Apache License 2.0

Owner activity
  Created At: 2019-01-24 04:43:14
  Pushed At: 2024-06-12 19:34:23
  Last Commit At: 2023-03-03 21:19:44
  Release Count: 63
  Last Release Name: v0.42.1
  First Release Name: v0.1.0

User engagement
  Stargazers Count: 8k
  Watchers Count: 142
  Fork Count: 606
  Commits Count: 2.3k
  Has Issues: Enabled
  Issues Count: 1101
  Issue Open Count: 114
  Pull Requests Count: 1300
  Pull Requests Open Count: 17
  Pull Requests Close Count: 46

Project settings
  Has Wiki: Enabled
  Is Archived:
  Is Fork:
  Is Locked:
  Is Mirror:
  Is Private:

Deploy machine learning models in production

Cortex is an open source platform for deploying machine learning models as production web services.

install · tutorial · docs · examples · we're hiring · email us · chat with us

Demo

Key features

  • Multi framework: Cortex supports TensorFlow, PyTorch, scikit-learn, XGBoost, and more.
  • Autoscaling: Cortex automatically scales APIs to handle production workloads.
  • CPU / GPU support: Cortex can run inference on CPU or GPU infrastructure.
  • Spot instances: Cortex supports EC2 spot instances.
  • Rolling updates: Cortex updates deployed APIs without any downtime.
  • Log streaming: Cortex streams logs from deployed models to your CLI.
  • Prediction monitoring: Cortex monitors network metrics and tracks predictions.
  • Minimal configuration: Cortex deployments are defined in a single cortex.yaml file.

Spinning up a cluster

Cortex is designed to be self-hosted on any AWS account. You can spin up a cluster with a single command:

# install the CLI on your machine
$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/0.13/get-cli.sh)"

# provision infrastructure on AWS and spin up a cluster
$ cortex cluster up

aws region: us-west-2
aws instance type: g4dn.xlarge
spot instances: yes
min instances: 0
max instances: 5

aws resource                                cost per hour
1 eks cluster                               $0.10
0 - 5 g4dn.xlarge instances for your apis   $0.1578 - $0.526 each (varies based on spot price)
0 - 5 20gb ebs volumes for your apis        $0.003 each
1 t3.medium instance for the operator       $0.0416
1 20gb ebs volume for the operator          $0.003
2 elastic load balancers                    $0.025 each

your cluster will cost $0.19 - $2.84 per hour based on the cluster size and spot instance availability

○ spinning up your cluster ...

your cluster is ready!
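
When you are done experimenting, the same CLI can spin the cluster back down; a sketch, assuming the cluster was created as above:

# delete the cluster and the AWS resources it provisioned
$ cortex cluster down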

Deploying a model

Implement your predictor

# predictor.py

class PythonPredictor:
    def __init__(self, config):
        # load the model once at startup
        self.model = download_model()

    def predict(self, payload):
        # payload is the parsed JSON body of the request
        return self.model.predict(payload["text"])
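
Note that download_model() is left undefined in the snippet above. One possible sketch, assuming a pickled scikit-learn model stored in S3 (the bucket name, object key, and pickle format are all illustrative, not part of the original example):

# hypothetical helper for the predictor above; bucket and key are placeholders
import pickle

import boto3

def download_model():
    # fetch the serialized model from S3 and deserialize it
    s3 = boto3.client("s3")
    s3.download_file("my-bucket", "sentiment/model.pkl", "/tmp/model.pkl")
    with open("/tmp/model.pkl", "rb") as f:
        return pickle.load(f)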

Configure your deployment

# cortex.yaml

- name: sentiment-classifier
  predictor:
    type: python
    path: predictor.py
  tracker:
    model_type: classification
  compute:
    gpu: 1
    mem: 4G

Deploy to AWS

$ cortex deploy

creating sentiment-classifier

Serve real-time predictions

$ curl http://***.amazonaws.com/sentiment-classifier \
    -X POST -H "Content-Type: application/json" \
    -d '{"text": "the movie was amazing!"}'

positive
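
The same request can be made from Python; a minimal sketch using the requests library (the load balancer hostname is elided here, exactly as in the curl example above):

# hypothetical Python client for the deployed API
import requests

resp = requests.post(
    "http://***.amazonaws.com/sentiment-classifier",  # replace with your endpoint
    json={"text": "the movie was amazing!"},
)
print(resp.text)  # e.g. "positive"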

Monitor your deployment

$ cortex get sentiment-classifier --watch

status   up-to-date   requested   last update   avg inference   2XX
live     1            1           8s            24ms            12

class     count
positive  8
negative  4

What is Cortex similar to?

Cortex is an open source alternative to serving models with SageMaker or building your own model deployment platform on top of AWS services like Elastic Kubernetes Service (EKS), Elastic Container Service (ECS), Lambda, Fargate, and Elastic Compute Cloud (EC2) and open source projects like Docker, Kubernetes, and TensorFlow Serving.

How does Cortex work?

The CLI sends configuration and code to the cluster every time you run cortex deploy. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch.
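
This also means iterating is just a matter of re-running the CLI. For example, after editing predictor.py or cortex.yaml, you might redeploy and tail the logs (assuming the logs subcommand available in this version of the CLI):

$ cortex deploy
$ cortex logs sentiment-classifier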

Examples of Cortex deployments