pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.

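pandas: powerful Python data analysis toolkit

What is it?

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with relational or labeled data both easy and intuitive. It aims to be the fundamental high-level building block for practical, real-world data analysis in Python. Its broader goal is to become the most powerful and flexible open-source data analysis and manipulation tool available in any language, and it is already well on its way toward that goal.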

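Main Features

Here are a few of the things that pandas does well:

  • Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data;
  • Size mutability: columns can be inserted into and deleted from DataFrame and higher-dimensional objects;
  • Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. align the data automatically in computations;
  • Powerful, flexible group-by functionality for split-apply-combine operations on data sets, for both aggregating and transforming data;
  • Easy conversion of ragged, differently indexed data in other Python and NumPy data structures into DataFrame objects;
  • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
  • Intuitive merging and joining of data sets;
  • Flexible reshaping and pivoting of data sets;
  • Hierarchical labeling of axes (multiple labels per tick are possible);
  • Robust I/O tools for loading data from flat files (CSV and delimited), Excel files, and databases, and for saving and loading data in the ultrafast HDF5 format;
  • Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regression, date shifting and lagging, and more.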
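
The missing-data and alignment items above can be illustrated with a minimal sketch. The Series names and values below are made up for illustration; only standard pandas calls (Series arithmetic, isna, fillna, dropna, reindex) are used.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0], index=["y", "z", "w"])

# Automatic alignment: arithmetic matches values by label, not by position.
# Labels present in only one operand produce NaN in the result.
total = a + b

# Missing data can be detected, filled, or dropped.
missing_mask = total.isna()
filled = total.fillna(0.0)
dropped = total.dropna()

# Explicit alignment: conform a Series to a chosen set of labels.
# The label "q" does not exist in `a`, so it becomes NaN.
reindexed = a.reindex(["z", "y", "q"])

print(total)
print(reindexed)
```

Here only "y" and "z" appear in both operands, so those are the only labels with a numeric sum; "x" and "w" come out as NaN.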

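Where to get it

The source code is currently hosted on GitHub at: https://github.com/pandas-dev/pandas

Binary installers for the latest released version are available from the Python Package Index (PyPI) and on conda.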

# conda
conda install pandas
# or PyPI
pip install pandas

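Dependencies

See the full installation instructions for minimum supported versions of required, recommended, and optional dependencies.

Installation from sources

To install pandas from source, you need Cython in addition to the normal dependencies above. Cython can be installed from PyPI: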

pip install cython

In the pandas directory (the same one where you found this file after cloning the git repo), execute:

python setup.py install

or, to install in development mode:

python -m pip install -e . --no-build-isolation --no-use-pep517

If you have make, you can also use make develop to run the same command.

or alternatively

python setup.py develop

See the full instructions for installing from source.

License

BSD 3

Documentation

The official documentation is hosted on PyData.org: https://pandas.pydata.org/pandas-docs/stable

Background

Work on pandas began at AQR (a quantitative hedge fund) in 2008, and it has been under active development ever since.

Getting Help

For usage questions, the best place to go is StackOverflow. General questions and discussions can also take place on the pydata mailing list.

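Discussion and Development

Most development discussion takes place on GitHub in this repository. In addition, the pandas-dev mailing list can be used for specialized discussions or design issues, and a Gitter channel is available for quick development-related questions.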

(The second edition revised by vz on 2020.07.11)

Overview

Name With Owner: pandas-dev/pandas
Primary Language: Python
Program language: Makefile (Language Count: 10)
Platform: Linux, Mac, Windows
License: BSD 3-Clause "New" or "Revised" License
Release Count: 176
Last Release Name: v2.2.2 (Posted on 2024-04-10 13:43:07)
First Release Name: 0.3.0
Created At: 2010-08-24 01:37:33
Pushed At: 2024-04-21 18:24:30
Stargazers Count: 41.9k
Watchers Count: 1.1k
Fork Count: 17.3k
Commits Count: 34.8k
Issues Count: 26305
Issue Open Count: 3629
Pull Requests Count: 25148
Pull Requests Open Count: 139
Pull Requests Close Count: 6722

pandas: powerful Python data analysis toolkit

What is it?

pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python. Additionally, it has
the broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language. It is already well on
its way towards this goal.

Main Features

Here are just a few of the things that pandas does well:

  • Easy handling of missing data (represented as NaN) in floating point
    as well as non-floating point data
  • Size mutability: columns can be inserted and deleted from DataFrame
    and higher dimensional objects
  • Automatic and explicit data alignment: objects can be explicitly
    aligned to a set of labels, or the user can simply ignore the labels
    and let Series, DataFrame, etc. automatically align the data for you
    in computations
  • Powerful, flexible group by functionality to perform
    split-apply-combine operations on data sets, for both aggregating
    and transforming data
  • Make it easy to convert ragged, differently-indexed data in other
    Python and NumPy data structures into DataFrame objects
  • Intelligent label-based slicing, fancy indexing, and subsetting of
    large data sets
  • Intuitive merging and joining data sets
  • Flexible reshaping and pivoting of data sets
  • Hierarchical labeling of axes (possible to have multiple labels per
    tick)
  • Robust IO tools for loading data from flat files (CSV and delimited),
    Excel files, databases, and saving/loading data from the ultrafast
    HDF5 format
  • Time series-specific functionality: date range generation and
    frequency conversion, moving window statistics, date shifting and
    lagging.
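
Several of the features listed above (group by, merging, pivoting, and time series windows) can be sketched in a few lines. The table contents, column names, and manager names below are invented for illustration; the calls themselves (groupby, transform, merge, pivot, date_range, rolling, shift) are standard pandas methods.

```python
import pandas as pd

# Illustrative sales table (all values are made up).
sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "revenue": [100, 150, 80, 120],
    }
)

# Split-apply-combine, aggregating: total revenue per region.
per_region = sales.groupby("region")["revenue"].sum()

# Split-apply-combine, transforming: each row's share of its region's total.
region_total = sales.groupby("region")["revenue"].transform("sum")
sales["share"] = sales["revenue"] / region_total

# Merging: attach a (hypothetical) manager to each row by joining on "region".
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Ada", "Lin"]})
merged = sales.merge(managers, on="region", how="left")

# Reshaping: one row per region, one column per quarter.
wide = sales.pivot(index="region", columns="quarter", values="revenue")

# Time series: a daily date range, a 3-day moving average, and a one-day lag.
days = pd.date_range("2024-01-01", periods=6, freq="D")
daily = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=days)
rolling_mean = daily.rolling(window=3).mean()
lagged = daily.shift(1)

print(per_region)
print(wide)
```

The first two entries of rolling_mean are NaN because a full 3-day window is not yet available, which ties back to the missing-data handling listed first.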

Where to get it

The source code is currently hosted on GitHub at:
https://github.com/pandas-dev/pandas

Binary installers for the latest released version are available at the Python
Package Index (PyPI) and on conda.

# conda
conda install pandas
# or PyPI
pip install pandas
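
After either command finishes, one quick way to confirm that the package is importable is to print its version from Python. This is only a sanity check, not part of the official instructions:

```python
import pandas as pd

# Prints the version string of the pandas build that Python actually imports.
installed_version = pd.__version__
print(installed_version)
```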

Dependencies

See the full installation instructions for minimum supported versions of required, recommended and optional dependencies.

Installation from sources

To install pandas from source you need Cython in addition to the normal
dependencies above. Cython can be installed from PyPI:

pip install cython

In the pandas directory (same one where you found this file after
cloning the git repo), execute:

python setup.py install

or for installing in development mode:

python -m pip install -e . --no-build-isolation --no-use-pep517

If you have make, you can also use make develop to run the same command.

or alternatively

python setup.py develop

See the full instructions for installing from source.

License

BSD 3

Documentation

The official documentation is hosted on PyData.org: https://pandas.pydata.org/pandas-docs/stable

Background

Work on pandas started at AQR (a quantitative hedge fund) in 2008 and
has been under active development since then.

Getting Help

For usage questions, the best place to go is StackOverflow.
Further, general questions and discussions can also take place on the pydata mailing list.

Discussion and Development

Most development discussion takes place on GitHub in this repo. Further, the pandas-dev mailing list can also be used for specialized discussions or design issues, and a Gitter channel is available for quick development-related questions.

Contributing to pandas

All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.

A detailed overview on how to contribute can be found in the contributing guide. There is also an overview on GitHub.

If you are simply looking to start working with the pandas codebase, navigate to the GitHub "issues" tab and start looking through interesting issues. There are a number of issues listed under Docs and good first issue where you could start out.

You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to subscribe to pandas on CodeTriage.

Or maybe, through using pandas, you have an idea of your own, or you are looking for something in the documentation and thinking "this can be improved". You can do something about it!

Feel free to ask questions on the mailing list or on Gitter.

As contributors and maintainers to this project, you are expected to abide by pandas' code of conduct. More information can be found at: Contributor Code of Conduct
