cuML - RAPIDS Machine Learning Library

 cuML - GPU Machine Learning Algorithms

cuML is a suite of libraries that implement machine learning algorithms and mathematical primitive functions, and that share compatible APIs with other RAPIDS projects.

cuML enables data scientists, researchers, and software engineers to run
traditional tabular ML tasks on GPUs without going into the details of CUDA
programming. In most cases, cuML's Python API matches the API from
scikit-learn.
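
As a rough illustration of that API parity, the sketch below imports `KMeans` from cuML when it is available and falls back to scikit-learn otherwise; everything after the import is identical for both backends. The toy data and the choice of `KMeans` with `n_clusters=2` are assumptions made for this example and are not taken from the cuML documentation.

```python
import numpy as np

# Prefer the GPU implementation; fall back to the CPU one with the same API.
try:
    from cuml.cluster import KMeans
    backend = "cuml"
except ImportError:
    from sklearn.cluster import KMeans
    backend = "sklearn"

# Two well-separated groups of points (illustrative data only)
X = np.array(
    [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]],
    dtype=np.float32,
)

# The estimator is constructed, fitted, and queried the same way on either backend
model = KMeans(n_clusters=2, random_state=0)
model.fit(X)
labels = model.predict(X)

print(backend, labels)
```

Which integer is assigned to which cluster may differ between backends; only the grouping of the points is expected to agree.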

For large datasets, these GPU-based implementations can complete 10-50x faster
than their CPU equivalents. For details on performance, see the cuML Benchmarks
Notebook.

As an example, the following Python snippet loads input and computes DBSCAN clusters, all on GPU:

import cudf
from cuml.cluster import DBSCAN

# Create and populate a GPU DataFrame
gdf_float = cudf.DataFrame()
gdf_float['0'] = [1.0, 2.0, 5.0]
gdf_float['1'] = [4.0, 2.0, 1.0]
gdf_float['2'] = [4.0, 2.0, 1.0]

# Setup and fit clusters
dbscan_float = DBSCAN(eps=1.0, min_samples=1)
dbscan_float.fit(gdf_float)

print(dbscan_float.labels_)

Output:

0    0
1    1
2    2
dtype: int32
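
Each row ends up in its own cluster because no two points lie within `eps` of each other, and with `min_samples=1` every point qualifies as a core point by itself. The following CPU-only sketch uses only the Python standard library (it is not cuML code) to check the pairwise Euclidean distances between the three rows:

```python
import math
from itertools import combinations

# Rows of the DataFrame above: (column '0', column '1', column '2')
points = [(1.0, 4.0, 4.0), (2.0, 2.0, 2.0), (5.0, 1.0, 1.0)]
eps = 1.0

# Euclidean distance for every unordered pair of rows
distances = {
    (i, j): math.dist(points[i], points[j])
    for i, j in combinations(range(len(points)), 2)
}

for (i, j), d in distances.items():
    print(f"rows {i} and {j}: distance {d:.3f}, within eps: {d <= eps}")
```

Every distance is at least 3.0, well above `eps=1.0`, so no two rows are merged into the same cluster.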

cuML also features multi-GPU and multi-node-multi-GPU operation, using Dask, for a
growing list of algorithms. The following Python snippet reads input from a CSV file and performs
a NearestNeighbors query across a cluster of Dask workers, using multiple GPUs on a single node:

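# Create a Dask CUDA cluster w/ one worker per device
from dask_cuda import LocalCUDACluster
cluster = LocalCUDACluster()

# Connect a Dask client so the cuML Dask estimators can reach the workers
from dask.distributed import Client
client = Client(cluster)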

# Read CSV file in parallel across workers
import dask_cudf
df = dask_cudf.read_csv("/path/to/csv")

# Fit a NearestNeighbors model and query it
from cuml.dask.neighbors import NearestNeighbors
nn = NearestNeighbors(n_neighbors=10)
nn.fit(df)
neighbors = nn.kneighbors(df)
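
To make the query result easier to interpret, here is a brute-force, CPU-only sketch of what a k-nearest-neighbors query computes, using only the Python standard library. It assumes the scikit-learn convention of returning a pair of (distances, indices) with one row per query point. The helper name `brute_force_kneighbors` and the toy data are hypothetical, and cuML's GPU implementation is not written this way.

```python
import math


def brute_force_kneighbors(data, queries, n_neighbors):
    """Return (distances, indices) of the n_neighbors closest rows of data per query."""
    all_distances, all_indices = [], []
    for q in queries:
        # Rank every data row by its Euclidean distance to the query point
        ranked = sorted((math.dist(q, row), idx) for idx, row in enumerate(data))
        nearest = ranked[:n_neighbors]
        all_distances.append([d for d, _ in nearest])
        all_indices.append([idx for _, idx in nearest])
    return all_distances, all_indices


# Illustrative 2-D points; the same set is used for fitting and querying
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
distances, indices = brute_force_kneighbors(data, data, n_neighbors=2)

print(indices)  # [[0, 1], [1, 0], [2, 0], [3, 2]]
```

Because the query points are the fitted points themselves, as in the Dask snippet above, the first neighbor of each point is the point itself at distance 0.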

For additional examples, browse our complete API documentation, or check out our
introductory walkthrough notebooks. Finally, you can find complete end-to-end
examples in the notebooks-contrib repo.

Supported Algorithms

| Category | Algorithm | Notes |
| --- | --- | --- |
| Clustering | Density-Based Spatial Clustering of Applications with Noise (DBSCAN) | |
| | K-Means | Multi-node multi-GPU via Dask |
| Dimensionality Reduction | Principal Components Analysis (PCA) | Multi-node multi-GPU via Dask |
| | Truncated Singular Value Decomposition (tSVD) | Multi-node multi-GPU via Dask |
| | Uniform Manifold Approximation and Projection (UMAP) | |
| | Random Projection | |
| | t-Distributed Stochastic Neighbor Embedding (TSNE) | |
| Linear Models for Regression or Classification | Linear Regression (OLS) | |
| | Linear Regression with Lasso or Ridge Regularization | |
| | ElasticNet Regression | |
| | Logistic Regression | |
| | Stochastic Gradient Descent (SGD), Coordinate Descent (CD), and Quasi-Newton (QN) (including L-BFGS and OWL-QN) solvers for linear models | |
| Nonlinear Models for Regression or Classification | Random Forest (RF) Classification | Experimental multi-node multi-GPU via Dask |
| | Random Forest (RF) Regression | Experimental multi-node multi-GPU via Dask |
| | Inference for decision tree-based models | Forest Inference Library (FIL) |
| | K-Nearest Neighbors (KNN) | Multi-node multi-GPU via Dask, uses Faiss for Nearest Neighbors Query. |
| | K-Nearest Neighbors (KNN) Classification | |
| | K-Nearest Neighbors (KNN) Regression | |
| | Support Vector Machine Classifier (SVC) | |
| | Epsilon-Support Vector Regression (SVR) | |
| Time Series | Linear Kalman Filter | |
| | Holt-Winters Exponential Smoothing | |
| | Auto-regressive Integrated Moving Average (ARIMA) | Supports seasonality (SARIMA) |

Installation

See the RAPIDS Release Selector for the command line to install either nightly
or official release cuML packages via Conda or Docker.

Build/Install from Source

See the build guide.

Contributing

Please see our guide for contributing to cuML.

Contact

Find out more details on the RAPIDS site.

Open GPU Data Science

The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

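Key Metrics

Overview
Name and owner: rapidsai/cuml
Primary language: C++
Languages: CMake (language count: 10)
License: Apache License 2.0

Owner activity
Created: 2018-10-11 15:45:35
Last pushed: 2025-05-08 12:51:23
Releases: 99
Latest release: v25.08.00a (published 2025-04-30 15:13:00)
First release: v0.2.0

User engagement
Stars: 4.7k
Watchers: 77
Forks: 569
Commits: 15.9k
Issues: 2631
Open issues: 912
Pull requests: 3329
Open pull requests: 74
Closed pull requests: 608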