pyspider

A Powerful Spider (Web Crawler) System in Python.

  • Owner: binux/pyspider
  • License: Apache License 2.0

Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases

Sample Code

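from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    # Project-wide defaults applied to every self.crawl() call.
    crawl_config = {
    }

    # Entry point: scheduled once every 24 hours.
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Treat a fetched page as still valid for 10 days (age is in seconds).
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Follow every link whose href starts with "http".
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is stored as the result for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }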
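
The two decorator arguments in the sample are plain arithmetic. The sketch below spells them out with the standard library only. It assumes, as the decorator's own keyword suggests, that `every` takes minutes, and that `age` is expressed in seconds. The variable names are illustrative and are not part of pyspider.

```python
from datetime import timedelta

# @every(minutes=24 * 60): the interval between two runs of on_start.
every_interval = timedelta(minutes=24 * 60)

# @config(age=10 * 24 * 60 * 60): how long a fetched page counts as fresh,
# assuming the value is given in seconds.
page_age = timedelta(seconds=10 * 24 * 60 * 60)

print(every_interval)  # 1 day, 0:00:00
print(page_age)        # 10 days, 0:00:00
```

In other words, the handler re-seeds the crawl once a day, while index pages fetched within the last ten days are not fetched again.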

Demo

Installation

WARNING: The WebUI is open to the public by default and can be used to execute any command, which may harm your system. Run it on an internal network or enable need-auth for the WebUI.
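
One possible way to enable need-auth is through a JSON config file. The sketch below builds such a file in Python. The `webui` section and its `username`, `password`, and `need-auth` keys follow the pyspider deployment documentation as best recalled, so verify them against the docs for your version. The credentials and the `write_config` helper are hypothetical placeholders.

```python
import json

# Hypothetical credentials: replace both values before use.
config = {
    "webui": {
        "username": "admin",
        "password": "change-me",
        "need-auth": True,  # ask for the credentials above before serving the WebUI
    }
}


def write_config(path="config.json"):
    """Write the config dict to `path` as JSON and return the path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    return path
```

Assuming the command-line interface accepts a config file through `-c`, the file would then be passed at startup as `pyspider -c config.json`.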

Quickstart: http://docs.pyspider.org/en/latest/Quickstart/

Contribute

TODO

v0.4.0

  • A visual scraping interface like Portia

License

Licensed under the Apache License, Version 2.0

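Key Metrics

Overview

  • Name and owner: binux/pyspider
  • Primary language: Python
  • Languages: Python (6 languages in total)
  • License: Apache License 2.0

Owner activity

  • Created: 2014-02-21 19:18:47
  • Last push: 2024-04-30 19:43:29
  • Last commit: 2020-08-02 10:34:20
  • Releases: 13
  • Latest release: v0.3.10
  • First release: v0.1.0

User engagement

  • Stars: 16.7k
  • Watchers: 891
  • Forks: 3.7k
  • Commits: 1.2k
  • Issues: 823
  • Open issues: 274
  • Pull requests: 90
  • Open pull requests: 27
  • Closed pull requests: 55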