impyla

Python client for distributed query engines that implement the HiveServer2
protocol (e.g., Impala, Hive).

For higher-level Impala functionality, including a Pandas-like interface over
distributed data sets, see the Ibis project.

Features

HiveServer2 compliant; works with Impala and Hive, including nested data
Fully DB API 2.0 (PEP 249)-compliant Python client (similar to sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+
Works with Kerberos, LDAP, SSL
SQLAlchemy connector
Converter to pandas DataFrame, allowing easy integration into the Python data stack (including scikit-learn and matplotlib); but see the Ibis project for a richer experience
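
As a rough illustration of the Kerberos, LDAP, and SSL support listed above, the sketch below assembles keyword arguments for impala.dbapi.connect. The helper names build_connect_kwargs and open_connection are hypothetical. The parameter names (auth_mechanism, use_ssl, user, password, kerberos_service_name) are assumed to match your installed impyla version, so check them against its connect signature before relying on this.

```python
def build_connect_kwargs(host, port=21050, auth=None, user=None,
                         password=None, use_ssl=False):
    """Assemble keyword arguments for impala.dbapi.connect.

    Hypothetical helper: parameter names are assumed to match impyla's
    connect() signature and should be verified for your version.
    """
    kwargs = {'host': host, 'port': port, 'use_ssl': use_ssl}
    if auth == 'kerberos':
        # GSSAPI is the SASL mechanism used for Kerberos.
        kwargs['auth_mechanism'] = 'GSSAPI'
        kwargs['kerberos_service_name'] = 'impala'
    elif auth == 'ldap':
        kwargs['auth_mechanism'] = 'LDAP'
        kwargs['user'] = user
        kwargs['password'] = password
    elif auth is not None:
        raise ValueError('unsupported auth option: %r' % (auth,))
    return kwargs


def open_connection(**kwargs):
    """Open a connection; impyla is imported lazily, only when called."""
    from impala.dbapi import connect
    return connect(**kwargs)
```

For example, open_connection(**build_connect_kwargs('my.host.com', auth='ldap', user='alice', password='...', use_ssl=True)) would attempt an LDAP-authenticated connection over SSL; the host and credentials here are placeholders.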

Dependencies

Required:

Python 2.6+ or 3.3+
six, bitarray
thrift

Optional:

thrift_sasl==0.2.1 for Hive and/or Kerberos support
pandas for conversion to DataFrame objects; but see the Ibis project instead
sqlalchemy for the SQLAlchemy engine
pytest for running tests; unittest2 for testing on Python 2.6
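
To see which of the optional packages are present in an environment, something like the following sketch may help. It only checks whether each module can be found, not its version, and the function name optional_status is hypothetical.

```python
import importlib.util

# Optional modules named in the list above, with the feature each enables.
OPTIONAL_MODULES = {
    'thrift_sasl': 'Hive and/or Kerberos support',
    'pandas': 'conversion to DataFrame objects',
    'sqlalchemy': 'the SQLAlchemy engine',
    'pytest': 'running tests',
}


def optional_status(modules=OPTIONAL_MODULES):
    """Return {module_name: True/False} depending on whether it is importable."""
    return {name: importlib.util.find_spec(name) is not None
            for name in modules}


if __name__ == '__main__':
    for name, available in sorted(optional_status().items()):
        state = 'found' if available else 'missing'
        print('%-12s %-8s (%s)' % (name, state, OPTIONAL_MODULES[name]))
```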

Installation

Install the latest release with pip:

pip install impyla

For the latest (dev) version, install directly from the repo:

pip install git+https://github.com/cloudera/impyla.git

or clone the repo:

git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install

Running the tests

impyla uses the pytest toolchain, and depends on the following
environment variables:

export IMPYLA_TEST_HOST=your.impalad.com
export IMPYLA_TEST_PORT=21050
export IMPYLA_TEST_AUTH_MECH=NOSASL

To run the maximal set of tests, run:

cd path/to/impyla
py.test --connect impala

Leave out the --connect option to skip tests for DB API compliance.
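
For orientation, here is a minimal sketch of how a test helper might read the three environment variables above. The function name load_test_settings and the fallback defaults are assumptions made for illustration, not necessarily what impyla's own test suite does.

```python
import os


def load_test_settings(environ=None):
    """Read connection settings for tests from environment variables.

    The defaults below are illustrative assumptions.
    """
    environ = os.environ if environ is None else environ
    return {
        'host': environ.get('IMPYLA_TEST_HOST', 'localhost'),
        # Environment values are strings, so the port is converted to int.
        'port': int(environ.get('IMPYLA_TEST_PORT', '21050')),
        'auth_mechanism': environ.get('IMPYLA_TEST_AUTH_MECH', 'NOSASL'),
    }
```

Passing a plain dict as environ makes the helper easy to exercise without touching the real environment.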

Usage

Impyla implements the Python DB API v2.0 (PEP 249) database interface
(refer to it for API details):

from impala.dbapi import connect
conn = connect(host='my.host.com', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
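
Because the interface is plain PEP 249, the same pattern can be tried without a cluster. The sketch below uses the standard-library sqlite3 module as a stand-in connection (an assumption made purely so the example runs anywhere) and a hypothetical helper, fetch_with_schema, that relies only on cursor(), execute(), description, and fetchall(). A connection returned by impala.dbapi.connect should be usable in its place.

```python
import sqlite3
from contextlib import closing


def fetch_with_schema(conn, sql):
    """Run sql and return (column_names, rows) using only PEP 249 calls."""
    # closing() ensures cursor.close() is called even if execute() raises.
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql)
        # The first element of each description entry is the column name.
        names = [col[0] for col in cursor.description]
        return names, cursor.fetchall()


# Stand-in for an impyla connection: an in-memory SQLite database.
conn = sqlite3.connect(':memory:')
setup = conn.cursor()
setup.execute('CREATE TABLE mytable (id INTEGER, name TEXT)')
setup.executemany('INSERT INTO mytable VALUES (?, ?)', [(1, 'a'), (2, 'b')])
setup.close()

names, rows = fetch_with_schema(
    conn, 'SELECT id, name FROM mytable ORDER BY id LIMIT 100')
print(names)
print(rows)
```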

The Cursor object also exposes the iterator interface, which is buffered
(controlled by cursor.arraysize):

cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    process(row)
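
When rows should be handled in explicit batches rather than one at a time, PEP 249's fetchmany() can be combined with cursor.arraysize. The following is a sketch under the assumption that the cursor follows PEP 249, where fetchmany() without arguments returns up to arraysize rows. It uses sqlite3 as a runnable stand-in, and iter_batches is a hypothetical helper name.

```python
import sqlite3


def iter_batches(cursor, batch_size=None):
    """Yield lists of rows, each at most cursor.arraysize long."""
    if batch_size is not None:
        cursor.arraysize = batch_size
    while True:
        batch = cursor.fetchmany()  # defaults to cursor.arraysize rows
        if not batch:
            break
        yield batch


# Stand-in cursor: five rows in an in-memory SQLite table.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE mytable (n INTEGER)')
cursor.executemany('INSERT INTO mytable VALUES (?)', [(i,) for i in range(5)])
cursor.execute('SELECT n FROM mytable ORDER BY n')

batch_sizes = [len(batch) for batch in iter_batches(cursor, batch_size=2)]
print(batch_sizes)
```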

The Cursor object also describes the columns returned by a query through
cursor.description. This is useful when exporting data to a CSV file:

import csv

cursor.execute('SELECT * FROM mytable LIMIT 100')
columns = [datum[0] for datum in cursor.description]
targetfile = '/tmp/foo.csv'

with open(targetfile, 'w', newline='') as outcsv:
    writer = csv.writer(outcsv, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')
    writer.writerow(columns)
    for row in cursor:
        writer.writerow(row)

You can also get back a pandas DataFrame object:

from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example

How do I contribute code?

You must first sign and return an ICLA and a CCLA before we can accept and
redistribute your contribution. Submit these to CLA@cloudera.com. Once they are
submitted, you are free to start contributing to impyla.

Find

We use GitHub issues to track bugs for this project. Find an issue that you would like to
work on (or file one if you have discovered a new issue!). If no one is working on it,
assign it to yourself only if you intend to work on it shortly.

It's a good idea to discuss your intended approach on the issue. You are much more
likely to have your patch reviewed and committed if you've already got buy-in from the
impyla community before you start.

Fix

Now start coding! As you are writing your patch, please keep the following things in mind:

First, please include tests with your patch. If your patch adds a feature or fixes a bug
and does not include tests, it will generally not be accepted. If you are unsure how to
write tests for a particular component, please ask on the issue for guidance.

Second, please keep your patch narrowly targeted to the problem described by the issue.
It's better for everyone if we maintain discipline about the scope of each patch. In
general, if you find a bug while working on a specific feature, file an issue for the bug,
check if you can assign it to yourself and fix it independently of the feature. This helps
us to differentiate between bug fixes and features and allows us to build stable
maintenance releases.

Finally, please write a good, clear commit message, with a short, descriptive title and
a message that is exactly long enough to explain what the problem was, and how it was
fixed.

Please create a pull request on GitHub with your patch.

Name and owner	cloudera/impyla
Primary language	Python
Languages	Python (number of languages: 3)
License	Apache License 2.0

Created	2014-04-14 23:52:07
Last pushed	2025-07-31 08:40:47
Last commit	2025-07-31 10:27:29
Releases	57
Latest release	v0.22.0 (published 2025-07-31 10:40:01)
First release	v0.8.0 (published 2014-04-25 12:42:32)

Stars	740
Watchers	49
Forks	250
Commits	482
Issues	345
Open issues	162
Pull requests	205
Open pull requests	7
Closed pull requests	35
