aduana

Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).

  • Owner: scrapinghub/aduana
  • License: BSD 3-Clause "New" or "Revised" License


Description

A library to guide a web crawl using PageRank, HITS, or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).
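To illustrate the idea behind such a backend, here is a minimal, self-contained sketch of PageRank computed by power iteration over a toy link graph, with the resulting scores used to order a crawl frontier. This is only an illustration of the ranking technique named above, not aduana's actual implementation (which is written in C); the graph, function, and variable names here are made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> score, summing to 1.
    """
    pages = set(links) | {t for outs in links.values() for t in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation term, spread uniformly.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                # Each page shares its damped rank among its out-links.
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Toy web graph: a -> b, c; b -> c; c -> a.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
scores = pagerank(graph)

# A crawl scheduler can then visit pages in descending score order.
frontier = sorted(scores, key=scores.get, reverse=True)
```

In this graph, "c" collects the most rank (it is linked from both "a" and "b"), so it would be scheduled first; a frontier backend applies the same principle to the partial web graph discovered so far.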

Warning: I test regularly only under Linux, my development platform. From time to time I also test on OS X and on Windows 8 using MinGW64.

Installation

pip install aduana

Documentation

Available at Read the Docs.

I have started documenting plans/ideas at the
wiki.

Example

Single spider example:

cd example
pip install -r requirements.txt
scrapy crawl example

To run the distributed crawler, see the docs.

Main metrics

Overview
  • Name With Owner: scrapinghub/aduana
  • Primary Language: C
  • Program Languages: CMake (Language Count: 3)
  • License: BSD 3-Clause "New" or "Revised" License

Owner Activity
  • Created At: 2015-05-11 22:47:26
  • Pushed At: 2024-05-21 08:44:12
  • Last Commit At: 2015-11-16 10:39:42
  • Release Count: 0

User Engagement
  • Stargazers Count: 55
  • Watchers Count: 114
  • Fork Count: 9
  • Commits Count: 262
  • Has Issues Enabled
  • Issues Count: 19
  • Issue Open Count: 9
  • Pull Requests Count: 5
  • Pull Requests Open Count: 2
  • Pull Requests Close Count: 2

Project Settings
  • Has Wiki Enabled
  • Is Archived
  • Is Fork
  • Is Locked
  • Is Mirror
  • Is Private