aduana

A Frontera backend to guide a crawl using PageRank, HITS, or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).

  • Owner: scrapinghub/aduana
  • License: BSD 3-Clause "New" or "Revised" License


Description

A library to guide a web crawl using PageRank, HITS, or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).
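The README does not show how link-based ranking works internally, and Aduana's own implementation is in C. As an illustration only, here is a minimal sketch of PageRank power iteration over an adjacency-list link graph (the `pagerank` function and the sample graph are hypothetical, not part of Aduana's API):

```python
# Minimal PageRank power iteration on an adjacency-list link graph.
# Illustrative sketch of the kind of ranking Aduana uses to prioritize
# a crawl frontier; not Aduana's own (C) implementation.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the teleportation share up front.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
# A crawler would fetch the highest-ranked frontier pages first.
order = sorted(ranks, key=ranks.get, reverse=True)
```

In a crawler, the ranks are recomputed (or updated incrementally) as new links are discovered, and the frontier is reordered accordingly.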

Warning: I only test regularly under Linux, my development platform. From time to time I also test on OS X and on Windows 8 using MinGW64.

Installation

pip install aduana

Documentation

Available at readthedocs

I have started documenting plans/ideas at the wiki.

Example

Single spider example:

cd example
pip install -r requirements.txt
scrapy crawl example

To run the distributed crawler, see the docs.

Key Metrics

Overview
  • Name and owner: scrapinghub/aduana
  • Main programming language: C
  • Programming languages: CMake (3 languages)
  • Platform:
  • License: BSD 3-Clause "New" or "Revised" License

Owner activity
  • Created: 2015-05-11 22:47:26
  • Last pushed: 2024-05-21 08:44:12
  • Last commit: 2015-11-16 10:39:42
  • Releases: 0

User engagement
  • Stars: 55
  • Watchers: 114
  • Forks: 9
  • Commits: 262
  • Issues enabled?
  • Issues: 19
  • Open issues: 9
  • Pull requests: 5
  • Open pull requests: 2
  • Closed pull requests: 2

Project settings
  • Wiki enabled?
  • Archived?
  • Is a fork?
  • Locked?
  • Is a mirror?
  • Is private?