Polars

Rust 和 Python 中的快速多线程 DataFrame 库。「Fast multi-threaded DataFrame library in Rust and Python」

Github星跟蹤圖

Polars

rust docs
Build and test

PyPI Latest Release
NPM Latest Release

Blazingly fast DataFrames in Rust, Python & Node.js

Polars is a blazingly fast DataFrames library implemented in Rust using
Apache Arrow Columnar Format as memory model.

  • Lazy | eager execution
  • Multi-threaded
  • SIMD
  • Query optimization
  • Powerful expression API
  • Rust | Python | ...

To learn more, read the User Guide.

>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...     }
... )

# embarrassingly parallel execution
# very expressive query language
>>> (
...     df
...     .sort("fruits")
...     .select(
...         [
...             "fruits",
...             "cars",
...             pl.lit("fruits").alias("literal_string_fruits"),
...             pl.col("B").filter(pl.col("cars") == "beetle").sum(),
...             pl.col("A").filter(pl.col("B") > 2).sum().over("cars").alias("sum_A_by_cars"),     # groups by "cars"
...             pl.col("A").sum().over("fruits").alias("sum_A_by_fruits"),                         # groups by "fruits"
...             pl.col("A").reverse().over("fruits").alias("rev_A_by_fruits"),                     # groups by "fruits
...             pl.col("A").sort_by("B").over("fruits").alias("sort_A_by_B_by_fruits"),            # groups by "fruits"
...         ]
...     )
... )
shape: (5, 8)
┌──────────┬──────────┬──────────────┬─────┬─────────────┬─────────────┬─────────────┬─────────────┐
│ fruits   ┆ cars     ┆ literal_stri ┆ B   ┆ sum_A_by_ca ┆ sum_A_by_fr ┆ rev_A_by_fr ┆ sort_A_by_B │
│ ---      ┆ ---      ┆ ng_fruits    ┆ --- ┆ rs          ┆ uits        ┆ uits        ┆ _by_fruits  │
│ str      ┆ str      ┆ ---          ┆ i64 ┆ ---         ┆ ---         ┆ ---         ┆ ---         │
│          ┆          ┆ str          ┆     ┆ i64         ┆ i64         ┆ i64         ┆ i64         │
╞══════════╪══════════╪══════════════╪═════╪═════════════╪═════════════╪═════════════╪═════════════╡
│ "apple"  ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 7           ┆ 4           ┆ 4           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "apple"  ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 7           ┆ 3           ┆ 3           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana" ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 8           ┆ 5           ┆ 5           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana" ┆ "audi"   ┆ "fruits"     ┆ 11  ┆ 2           ┆ 8           ┆ 2           ┆ 2           │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "banana" ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 8           ┆ 1           ┆ 1           │
└──────────┴──────────┴──────────────┴─────┴─────────────┴─────────────┴─────────────┴─────────────┘

Performance 🚀🚀

Polars is very fast, and in fact is one of the best performing solutions available.
See the results in h2oai's db-benchmark.

Python setup

Install the latest polars version with:

$ pip3 install -U polars

Releases happen quite often (weekly / every few days) at the moment, so updating polars regularly to get the latest bugfixes / features might not be a bad idea.

Rust setup

You can take latest release from crates.io, or if you want to use the latest features / performance improvements
point to the master branch of this repo.

polars = { git = "https://github.com/pola-rs/polars", rev = "<optional git tag>" }

Rust version

Required Rust version >=1.58

Documentation

Want to know about all the features Polars supports? Read the docs!

Python

Rust

Node

Contribution

Want to contribute? Read our contribution guideline.

[Python]: compile polars from source

If you want a bleeding edge release or maximal performance you should compile polars from source.

This can be done by going through the following steps in sequence:

  1. Install the latest Rust compiler
  2. Install maturin: $ pip3 install maturin
  3. Choose any of:
    • Fastest binary, very long compile times:
      $ cd py-polars && maturin develop --rustc-extra-args="-C target-cpu=native" --release
      
    • Fast binary, Shorter compile times:
      $ cd py-polars && maturin develop --rustc-extra-args="-C codegen-units=16 -C lto=thin -C target-cpu=native" --release
      

Note that the Rust crate implementing the Python bindings is called py-polars to distinguish from the wrapped
Rust crate polars itself. However, both the Python package and the Python module are named polars, so you
can pip install polars and import polars.

Arrow2

Polars has transitioned to arrow2.
Arrow2 is a faster and safer implementation of the Apache Arrow Columnar Format.
Arrow2 also has a more granular code base, helping to reduce the compiler bloat.

Use custom Rust function in python?

See this example.

Going big...

Do you expect more than 2^32 ~4,2 billion rows? Compile polars with the bigidx feature flag.

Or for python users install $ pip install -U polars-u64-idx.

Don't use this unless you hit the row boundary as the default polars is faster and consumes less memory.

Acknowledgements

Development of Polars is proudly powered by

Xomnia

Sponsors

概覽

名稱與所有者pola-rs/polars
主編程語言Rust
編程語言Rust (語言數: 4)
平台Linux, Mac, Windows
許可證Other
發布數462
最新版本名稱py-0.20.23 (發布於 )
第一版名稱rs-0.4.0 (發布於 )
創建於2020-05-13 19:45:33
推送於2024-05-04 22:12:11
最后一次提交
星數26.4k
關注者數148
派生數1.6k
提交數9.6k
已啟用問題?
問題數7295
打開的問題數1521
拉請求數7921
打開的拉請求數93
關閉的拉請求數668
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?
去到頂部