exporters

Exporters is an extensible export pipeline library that supports filtering and transforming data, with several sources and destinations.

  • Owner: scrapinghub/exporters
  • License: BSD 3-Clause "New" or "Revised" License


.. _Github repository: https://github.com/scrapinghub/exporters/

Exporters project documentation


Exporters provide a flexible way to export data from multiple sources to
multiple destinations, allowing filtering and transforming the data.

This `Github repository`_ is used as a central repository.

Full documentation is available at http://exporters.readthedocs.io/en/latest/


Getting Started
===============

Install exporters
-----------------

First of all, we recommend creating a virtualenv::

    virtualenv exporters
    source exporters/bin/activate

..

Installing::

    pip install exporters

..



Creating a configuration
------------------------

Then, we can create our first configuration object and store it in a file called ``config.json``.
This configuration reads records from an S3 bucket and writes them to the local filesystem,
exporting only the records whose ``country`` field matches ``United States``:

.. code-block:: javascript

   {
        "reader": {
            "name": "exporters.readers.s3_reader.S3Reader",
            "options": {
                "bucket": "YOUR_BUCKET",
                "aws_access_key_id": "YOUR_ACCESS_KEY",
                "aws_secret_access_key": "YOUR_SECRET_KEY",
                "prefix": "exporters-tutorial/sample-dataset"
            }
        },
        "filter": {
            "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
            "options": {
                "keys": [
                    {"name": "country", "value": "United States"}
                ]
            }
        },
        "writer":{
            "name": "exporters.writers.fs_writer.FSWriter",
            "options": {
                "filebase": "/tmp/output/"
            }
        }
   }
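If the configuration has to be produced by a program rather than written by hand, something like the following minimal sketch could generate the same ``config.json``. The helper names ``build_config`` and ``write_config`` are hypothetical and not part of exporters, and reading the credentials from the ``AWS_ACCESS_KEY_ID`` and ``AWS_SECRET_ACCESS_KEY`` environment variables is an assumption made here to keep secrets out of source code. The class paths and option names are copied from the example above.

```python
import json
import os


def build_config(bucket, prefix, country, output_dir):
    """Return a dict shaped like the config.json example (hypothetical helper)."""
    return {
        "reader": {
            "name": "exporters.readers.s3_reader.S3Reader",
            "options": {
                "bucket": bucket,
                # Assumption: credentials come from the environment.
                # The placeholders are kept when the variables are unset.
                "aws_access_key_id": os.environ.get(
                    "AWS_ACCESS_KEY_ID", "YOUR_ACCESS_KEY"),
                "aws_secret_access_key": os.environ.get(
                    "AWS_SECRET_ACCESS_KEY", "YOUR_SECRET_KEY"),
                "prefix": prefix,
            },
        },
        "filter": {
            "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
            "options": {
                "keys": [{"name": "country", "value": country}],
            },
        },
        "writer": {
            "name": "exporters.writers.fs_writer.FSWriter",
            "options": {"filebase": output_dir},
        },
    }


def write_config(config, path):
    """Serialize the configuration dict to a JSON file."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=4)
```

Calling ``write_config(build_config("YOUR_BUCKET", "exporters-tutorial/sample-dataset", "United States", "/tmp/output/"), "config.json")`` should produce a file equivalent to the one shown above.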


Export with script
------------------

We can use the provided script to run this export:

.. code-block:: shell

    python bin/export.py --config config.json


Use it as a library
-------------------

The export can be run using exporters as a library:

.. code-block:: python

    from exporters import BasicExporter

    exporter = BasicExporter.from_file_configuration('config.json')
    exporter.export()
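When the export is embedded in a larger program, it can help to check the configuration file before handing it to the exporter. The following is a minimal sketch of that idea, not something the library provides: ``missing_sections`` and ``run_export`` are hypothetical helpers, and treating ``reader`` and ``writer`` as the required sections is an assumption based on the example configuration. Only ``BasicExporter.from_file_configuration`` and ``export`` come from the snippet above.

```python
import json

# Assumption: an export needs at least a reader and a writer section.
REQUIRED_SECTIONS = ("reader", "writer")


def missing_sections(path):
    """Return the required top-level sections absent from the config file."""
    with open(path) as handle:
        config = json.load(handle)
    return [name for name in REQUIRED_SECTIONS if name not in config]


def run_export(path):
    """Check the config file, then run the export with exporters."""
    missing = missing_sections(path)
    if missing:
        raise ValueError("config is missing sections: " + ", ".join(missing))
    # Imported here so the check above works without exporters installed.
    from exporters import BasicExporter
    exporter = BasicExporter.from_file_configuration(path)
    exporter.export()
```

With this in place, ``run_export('config.json')`` fails early with a ``ValueError`` on an incomplete file instead of starting the export.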


Resuming an export job
----------------------

Suppose we have a pickle file left by a previously failed export job. To resume that job,
we run the export script with the ``--resume`` option:

.. code-block:: shell

    python bin/export.py --resume pickle://pickle-file.pickle
