aduana

A Frontera backend to guide a crawl using PageRank, HITS, or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).

  • Owner: scrapinghub/aduana
  • License: BSD 3-Clause "New" or "Revised" License

Description

A library to guide a web crawl using PageRank, HITS, or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).
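To illustrate the link-based ranking the library relies on, here is a minimal power-iteration PageRank sketch in plain Python. This is only an illustration of the algorithm, not aduana's C implementation; the toy graph, damping factor, and tolerance are illustrative choices.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {d for outs in links.values() for d in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Every page gets the "teleport" share, then inherits rank from in-links.
        new = {p: (1.0 - damping) / n for p in pages}
        for src, outs in links.items():
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for p in pages:
                    new[p] += damping * rank[src] / n
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank

# Tiny web graph: three pages link to "hub", which links back to "a".
graph = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(graph)
```

Running this on the toy graph ranks "hub" highest, since it collects rank from every other page; a crawl frontier ordered by such scores fetches well-linked pages first.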

Warning: I test regularly only under Linux, my development platform. From time to time I also test on OS X and on Windows 8 using MinGW64.

Installation

pip install aduana

Documentation

Available at readthedocs.

I have started documenting plans/ideas at the wiki.

Example

Single spider example:

cd example
pip install -r requirements.txt
scrapy crawl example
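The example directory wires a Scrapy spider to Frontera, which delegates request ordering to aduana. The settings below are a hypothetical sketch of that wiring: the `SCHEDULER` path is Frontera's standard Scrapy scheduler, but the `BACKEND` class path `aduana.frontera.Backend` and the module names are assumptions; check the docs for the exact values.

```python
# Hypothetical configuration sketch -- verify names against the aduana docs.

# Scrapy settings.py: hand request scheduling over to Frontera.
SCHEDULER = 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler'
FRONTERA_SETTINGS = 'example.frontera_settings'  # assumed module name

# frontera_settings.py: point Frontera at the aduana backend (assumed path).
BACKEND = 'aduana.frontera.Backend'
MAX_NEXT_REQUESTS = 256  # requests fetched per backend call; illustrative value
```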

To run the distributed crawler, see the docs.

Key Metrics

Overview

  • Name and owner: scrapinghub/aduana
  • Main programming language: C
  • Programming languages: CMake (3 languages)
  • License: BSD 3-Clause "New" or "Revised" License

Owner Activity

  • Created: 2015-05-11 22:47:26
  • Last pushed: 2024-05-21 08:44:12
  • Last commit: 2015-11-16 10:39:42
  • Releases: 0

User Engagement

  • Stars: 55
  • Watchers: 114
  • Forks: 9
  • Commits: 262
  • Issues: 19
  • Open issues: 9
  • Pull requests: 5
  • Open pull requests: 2
  • Closed pull requests: 2