
Restrict crawl and scraping scope using matchers.

  • Owner: scrapinghub/scrapy-mosquitera
  • License: BSD 3-Clause "New" or "Revised" License


scrapy-mosquitera - tools for filtered scraping

.. epigraph::

   How can I scrape items off a site from the last five days?

   -- Scrapy User

That question started the development of scrapy-mosquitera, a tool to help
you restrict crawling and scraping scope using matchers.

Matchers are simple Python functions that report whether an element is valid
under given restrictions.

The first goal in the project was date matching, but you can create your own
matcher for your own crawling and scraping needs.
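
As a rough sketch of what a custom matcher could look like, the function below applies the same idea to prices instead of dates. The name ``price_matches`` and its ``below`` and ``above`` parameters are hypothetical and not part of scrapy-mosquitera; the only assumption is that a matcher is a plain function returning ``True`` or ``False``.

```python
def price_matches(data, below=None, above=None):
    """Hypothetical matcher: True when ``data`` satisfies every given bound."""
    if data is None:
        # A missing value cannot be validated, so treat it as not matching.
        return False
    if below is not None and not data < below:
        return False
    if above is not None and not data > above:
        return False
    return True


# Keep only the prices strictly between 10 and 100.
prices = [5.0, 19.99, 42.0, 120.0]
in_range = [p for p in prices if price_matches(p, above=10, below=100)]
```

Keeping every restriction optional lets the same function serve as a one-sided or two-sided filter, depending on which keyword arguments the caller passes.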

How it works

When the dates are available in the URLs, you can use the matcher function
directly in your code::

    from scrapy import Request
    from scrapy_mosquitera.matchers import date_matches

    # Inside a spider callback. scrape_date_from_url stands for your own helper.
    date = scrape_date_from_url(url)

    if date_matches(data=date, after='5 days ago'):
        yield Request(url=url, callback=self.parse_item)
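
The ``scrape_date_from_url`` helper used above is not defined in this README. One possible version is sketched below, assuming the site embeds dates in its URLs as ``/YYYY/MM/DD/``. The URL layout, the regular expression and the return type are all assumptions for illustration; check the documentation for the value types ``date_matches`` accepts.

```python
import re
from datetime import date

# Assumed URL layout: https://example.com/2016/06/08/some-post
DATE_IN_URL = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def scrape_date_from_url(url):
    """Return the date embedded in ``url`` as /YYYY/MM/DD/, or None."""
    match = DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a real calendar date.
        return None
```

Returning ``None`` for URLs without a usable date leaves the decision to the caller, who can skip such links or follow them unconditionally.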

When the date only becomes available while scraping the items,
scrapy-mosquitera provides a ``PaginationMixin`` to control the crawl
according to the dates scraped.
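
The idea of letting scraped dates drive pagination can be illustrated without the library. The sketch below is not the ``PaginationMixin`` API. It is a plain-Python illustration that assumes a newest-first listing, where a page with no valid items means later pages cannot contain any either. The names ``crawl_pages``, ``listing`` and ``visited`` are hypothetical.

```python
from datetime import date, timedelta


def crawl_pages(pages, is_valid):
    """Yield valid items page by page; stop after a page with none."""
    for items in pages:
        valid = [item for item in items if is_valid(item)]
        for item in valid:
            yield item
        if not valid:
            # Newest-first assumption: older pages cannot match either.
            break


# Fixed "today" so the example is reproducible.
today = date(2016, 6, 8)
cutoff = today - timedelta(days=5)

pages = [
    [{"id": 1, "date": date(2016, 6, 8)}, {"id": 2, "date": date(2016, 6, 6)}],
    [{"id": 3, "date": date(2016, 6, 4)}, {"id": 4, "date": date(2016, 6, 1)}],
    [{"id": 5, "date": date(2016, 5, 30)}],
    [{"id": 6, "date": date(2016, 5, 20)}],
]

visited = []


def listing():
    """Simulate fetching pages lazily, recording which ones were requested."""
    for number, items in enumerate(pages, start=1):
        visited.append(number)
        yield items


scraped = [
    item["id"]
    for item in crawl_pages(listing(), lambda item: item["date"] >= cutoff)
]
```

In this run the fourth page is never requested, because the third page contains no item from the last five days.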

Head over to the rest of the documentation_ for more details.

.. _documentation:


The quick way::

    pip install scrapy-mosquitera


  • Name With Owner: scrapinghub/scrapy-mosquitera
  • Primary Language: Python
  • Program language: Makefile (Language Count: 3)
  • License: BSD 3-Clause "New" or "Revised" License
  • Release Count: 2
  • Last Release Name: v0.1.1
  • First Release Name: 0.1.0
  • Created At: 2016-05-10 12:27:29
  • Pushed At: 2016-06-08 20:59:24
  • Last Commit At: 2016-06-08 16:59:12
  • Stargazers Count: 25
  • Watchers Count: 6
  • Fork Count: 6
  • Commits Count: 34
  • Issues Count: 0
  • Issue Open Count: 0
  • Pull Requests Count: 0
  • Pull Requests Open Count: 0
  • Pull Requests Close Count: 0