exporters

Exporters is an extensible export pipeline library that supports filtering, transforming, and multiple sources and destinations.

  • Owner: scrapinghub/exporters
  • License: BSD 3-Clause "New" or "Revised" License

.. _Github repository: https://github.com/scrapinghub/exporters/

Exporters project documentation


Exporters provides a flexible way to export data from multiple sources to
multiple destinations, with optional filtering and transformation of the data.

This `Github repository`_ is used as a central repository.

Full documentation is available at http://exporters.readthedocs.io/en/latest/


Getting Started
===============

Install exporters
-----------------

First, we recommend creating a virtualenv::

    virtualenv exporters
    source exporters/bin/activate

..

Installing::

    pip install exporters

..



Creating a configuration
------------------------

Next, create a configuration object and store it in a file called config.json.
This configuration reads from an S3 bucket and writes to the local filesystem, exporting only
the records whose ``country`` field matches United States:

.. code-block:: javascript

   {
        "reader": {
            "name": "exporters.readers.s3_reader.S3Reader",
            "options": {
                "bucket": "YOUR_BUCKET",
                "aws_access_key_id": "YOUR_ACCESS_KEY",
                "aws_secret_access_key": "YOUR_SECRET_KEY",
                "prefix": "exporters-tutorial/sample-dataset"
            }
        },
        "filter": {
            "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
            "options": {
                "keys": [
                    {"name": "country", "value": "United States"}
                ]
            }
        },
        "writer":{
            "name": "exporters.writers.fs_writer.FSWriter",
            "options": {
                "filebase": "/tmp/output/"
            }
        }
   }
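
When the same export is set up for several buckets or countries, the configuration
can be generated rather than edited by hand. The following is a minimal sketch using
only the standard library. The helper names ``build_config`` and ``write_config`` and
the use of the ``AWS_ACCESS_KEY_ID`` / ``AWS_SECRET_ACCESS_KEY`` environment variables
are assumptions made for illustration, not part of exporters:

.. code-block:: python

    import json
    import os


    def build_config(bucket, prefix, country, output_dir):
        """Return a configuration dict with the same shape as the JSON above."""
        return {
            "reader": {
                "name": "exporters.readers.s3_reader.S3Reader",
                "options": {
                    "bucket": bucket,
                    # Assumption: credentials come from the environment,
                    # falling back to the tutorial placeholders.
                    "aws_access_key_id": os.environ.get(
                        "AWS_ACCESS_KEY_ID", "YOUR_ACCESS_KEY"),
                    "aws_secret_access_key": os.environ.get(
                        "AWS_SECRET_ACCESS_KEY", "YOUR_SECRET_KEY"),
                    "prefix": prefix,
                },
            },
            "filter": {
                "name": "exporters.filters.key_value_regex_filter.KeyValueRegexFilter",
                "options": {
                    "keys": [{"name": "country", "value": country}],
                },
            },
            "writer": {
                "name": "exporters.writers.fs_writer.FSWriter",
                "options": {"filebase": output_dir},
            },
        }


    def write_config(config, path="config.json"):
        """Serialize the configuration dict to a JSON file."""
        with open(path, "w") as f:
            json.dump(config, f, indent=4)

Calling ``write_config(build_config("YOUR_BUCKET", "exporters-tutorial/sample-dataset",
"United States", "/tmp/output/"))`` writes a config.json with the same structure as the
one shown above. Note that the credentials still end up in the file, so it should be
kept out of version control.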


Export with script
------------------

We can use the provided script to run this export:

.. code-block:: shell

    python bin/export.py --config config.json


Use it as a library
-------------------

The export can be run using exporters as a library:

.. code-block:: python

    from exporters import BasicExporter

    exporter = BasicExporter.from_file_configuration('config.json')
    exporter.export()
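
When the library call is embedded in a larger program, it can help to check the
configuration file before starting the export. The sketch below wraps the two calls
shown above. The helpers ``missing_sections`` and ``run_export`` are hypothetical, and
treating ``reader`` and ``writer`` as the required sections is an assumption based on
the tutorial configuration, not a statement about the library's own validation:

.. code-block:: python

    import json

    # Assumption: the minimal set of sections used in this tutorial.
    REQUIRED_SECTIONS = ("reader", "writer")


    def missing_sections(config_path):
        """Return the required top-level sections absent from the config file."""
        with open(config_path) as f:
            config = json.load(f)
        return [name for name in REQUIRED_SECTIONS if name not in config]


    def run_export(config_path):
        """Check the config file, then run the export as shown above."""
        missing = missing_sections(config_path)
        if missing:
            raise ValueError("config is missing sections: " + ", ".join(missing))
        # Imported lazily so the check above works without exporters installed.
        from exporters import BasicExporter
        exporter = BasicExporter.from_file_configuration(config_path)
        exporter.export()

With this in place, ``run_export('config.json')`` fails early with a ``ValueError`` on
an incomplete file instead of partway through setting up the export.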


Resuming an export job
----------------------

Suppose we have a pickle file from a previously failed export job. To resume that job,
run the export script with the ``--resume`` option:

.. code-block:: shell

    python bin/export.py --resume pickle://pickle-file.pickle
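
If resuming is scripted, the command above can be assembled programmatically. This
sketch assumes the script lives at ``bin/export.py`` relative to the working directory,
as in the command above. ``resume_command`` is a hypothetical helper, not an exporters
API:

.. code-block:: python

    import os


    def resume_command(pickle_path, script="bin/export.py"):
        """Build the argument list that resumes an export from a pickle file."""
        if not os.path.isfile(pickle_path):
            # Fail before launching anything if the pickle file is missing.
            raise IOError("pickle file not found: " + pickle_path)
        return ["python", script, "--resume", "pickle://" + pickle_path]

The returned list can be passed to ``subprocess.check_call`` to launch the resume and
raise an error if the script exits with a non-zero status.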

Main metrics

Overview

  • Name with owner: scrapinghub/exporters
  • Primary language: Python
  • Program language: Makefile (language count: 2)
  • License: BSD 3-Clause "New" or "Revised" License

Owner activity

  • Created at: 2015-09-08 21:59:12
  • Pushed at: 2024-05-21 08:44:05
  • Last commit at: 2019-06-04 09:21:21
  • Release count: 70
  • Last release: 0.6.18 (posted on 2017-05-25 17:52:52)
  • First release: 0.1

User engagement

  • Stargazers: 40
  • Watchers: 97
  • Forks: 10
  • Commits: 1.5k
  • Issues: 20
  • Open issues: 5
  • Pull requests: 312
  • Open pull requests: 7
  • Closed pull requests: 25