pspider

A parallel web crawler written in pure PHP

  • Owner: hightman/pspider

PHP spider framework

This is a parallel crawling (spider) framework recently written in pure PHP, built on the hightman\httpclient component.

You must have composer installed first. Then run the following command inside the project to download the components:

composer install

Using pspider

URL table management requires the MySQLi extension. For the table structure and the customizable parts, see the custom file.

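  1. Copy custom/skel.inc.php to custom/your.inc.php
  2. Modify custom/your.inc.php following the instructions in it
  3. Create the MySQL URL table as described in the comments in custom/your.inc.php
  4. Run spider.php -u http://... to start the crawl loop
  5. The UrlTable implementation is deliberately simple and serves only as an example; reimplement it as needed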
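
As a rough sketch, steps 1 to 4 above could be strung together in a small shell script. The `run` helper and the `DRY_RUN` and `START_URL` variables are hypothetical names introduced for illustration, and invoking the script through `php` from the root of a pspider checkout is an assumption; none of this is part of pspider itself. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of setup steps 1-4. Assumes the current directory is the root of a
# pspider checkout and that `composer` and `php` are on PATH (assumptions).
# Commands are only printed by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# The README elides the start URL as "http://...", so it stays a placeholder
# here. Supply a real one through the (hypothetical) START_URL variable.
START_URL="${START_URL:-http://...}"

# Hypothetical helper: echo a command, and run it unless in dry-run mode.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

run composer install                              # download the components
run cp custom/skel.inc.php custom/your.inc.php    # step 1

# Steps 2 and 3 are manual: edit custom/your.inc.php, then create the MySQL
# URL table as described in that file's comments. When executing for real,
# run the last command only after both are done.
run php spider.php -u "$START_URL"                # step 4
```

Printing first makes it easy to check the paths before anything is copied or started; the table schema is intentionally not reproduced here because it lives in the comments of the custom file.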

Main metrics

Overview
Name with owner: hightman/pspider
Primary language: PHP
Programming languages: PHP (language count: 1)

Owner activity
Created at: 2013-03-08 08:47:47
Pushed at: 2015-09-16 09:21:38
Last commit at: 2015-09-15 18:13:19
Release count: 0

User engagement
Stargazers count: 265
Watchers count: 40
Fork count: 110
Commits count: 32
Issues count: 5
Open issues count: 1
Pull requests count: 1
Open pull requests count: 0
Closed pull requests count: 0