aduana

A Frontera backend to guide a crawl using PageRank, HITS, or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).

  • Owner: scrapinghub/aduana
  • License: BSD 3-Clause "New" or "Revised" License

Description

A library to guide a web crawl using PageRank, HITS, or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).
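To illustrate the link-based ranking the library relies on, here is a minimal power-iteration PageRank sketch in plain Python. This is only an illustration of the algorithm, not aduana's C implementation; the toy graph, damping factor, and tolerance are illustrative choices.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {d for outs in links.values() for d in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Every page gets the "teleport" share, then inherits rank from in-links.
        new = {p: (1.0 - damping) / n for p in pages}
        for src, outs in links.items():
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for p in pages:
                    new[p] += damping * rank[src] / n
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank

# Tiny web graph: three pages link to "hub", which links back to "a".
graph = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(graph)
```

Running this on the toy graph ranks "hub" highest, since it collects rank from every other page; a crawl frontier ordered by such scores fetches well-linked pages first.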

Warning: I test regularly only under Linux, my development platform. From time to time I also test on OS X and on Windows 8 using MinGW64.

Installation

pip install aduana

Documentation

Available at readthedocs.

I have started documenting plans/ideas at the wiki.

Example

Single spider example:

cd example
pip install -r requirements.txt
scrapy crawl example
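The example directory wires a Scrapy spider to Frontera, which delegates request ordering to aduana. The settings below are a hypothetical sketch of that wiring: the `SCHEDULER` path is Frontera's standard Scrapy scheduler, but the `BACKEND` class path `aduana.frontera.Backend` and the module names are assumptions; check the docs for the exact values.

```python
# Hypothetical configuration sketch -- verify names against the aduana docs.

# Scrapy settings.py: hand request scheduling over to Frontera.
SCHEDULER = 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler'
FRONTERA_SETTINGS = 'example.frontera_settings'  # assumed module name

# frontera_settings.py: point Frontera at the aduana backend (assumed path).
BACKEND = 'aduana.frontera.Backend'
MAX_NEXT_REQUESTS = 256  # requests fetched per backend call; illustrative value
```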

To run the distributed crawler, see the docs.

Key Metrics

Overview

  • Name and owner: scrapinghub/aduana
  • Main programming language: C
  • Programming languages: CMake (3 languages)
  • License: BSD 3-Clause "New" or "Revised" License

Owner Activity

  • Created: 2015-05-11 22:47:26
  • Last pushed: 2024-05-21 08:44:12
  • Last commit: 2015-11-16 10:39:42
  • Releases: 0

User Engagement

  • Stars: 55
  • Watchers: 114
  • Forks: 9
  • Commits: 262
  • Issues: 19
  • Open issues: 9
  • Pull requests: 5
  • Open pull requests: 2
  • Closed pull requests: 2