crawl-anywhere

Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.

  • Owner: bejean/crawl-anywhere
  • License: Apache License 2.0


Crawl-Anywhere

April 2013: Starting with version 4.0, Crawl-Anywhere became an open-source project. The current version is 4.0.0.

Stable version 3.0.x is still available at http://www.crawl-anywhere.com/

Introduction

Crawl-Anywhere is primarily a web crawler, but it also includes all the components needed to build a vertical search engine.

Crawl Anywhere includes :

Project home page : http://www.crawl-anywhere.com/

A web crawler is a program that discovers and reads all the pages and documents (HTML, PDF, Office, ...) on a web site, for example in order to index their content and build a search engine (like Google). Wikipedia provides a good description of what a web crawler is: http://en.wikipedia.org/wiki/Web_crawler.
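To make the idea of "discovering and reading all pages on a web site" concrete, here is a minimal sketch of a breadth-first crawl in Python. It is not Crawl-Anywhere's implementation, only an illustration of the general technique. The `fetch` callable, the `SITE` dictionary, and the `example.test` URLs are all hypothetical: the sketch crawls an in-memory toy site so that it runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl restricted to the host of start_url.

    `fetch` is a hypothetical callable mapping a URL to its HTML text,
    or to None when the page does not exist.
    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html  # a real crawler would hand this to an indexer
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            target, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same site and never queue a URL twice.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Toy in-memory "web site" (assumption, for illustration only).
SITE = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="b.html#top">B</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="http://other.test/x">ext</a>',
    "http://example.test/b.html": '<a href="/a.html">A</a>',
}

pages = crawl("http://example.test/", SITE.get)
print(sorted(pages))
```

In a real deployment, `fetch` would perform an HTTP request, and each page would be passed on to a document processing and indexing stage rather than stored in a dictionary.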

Support

Build distribution

Prerequisites:

  • Maven 3.0.0 or later
  • Oracle Java 7 or later

Steps :

Installation

Prerequisites:

  • Oracle Java 7 or later
  • Apache 2.0 or later
  • PHP 5.2.x, 5.3.x, or 5.4.x
  • MongoDB 2.2 or later (64-bit)
  • Solr 4.3.0 or later (configuration files are provided for Solr 4.3.0 and 4.10.0)

Steps :

Getting Started

See the User Manual at http://www.crawl-anywhere.com/getting-started/

History

  • release 4.0.0-alpha-1: April 28, 2013
  • release 4.0.0-alpha-2: May 22, 2013
  • release 4.0.0-alpha-3: June 21, 2013
  • release 4.0.0-alpha-4: June 23, 2013
  • release 4.0.0-beta-1: August 6, 2013
  • release 4.0.0-release-candidate: October 20, 2013
  • release 4.0.0 final: December 1, 2014

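Key Metrics

Overview

  • Name and owner: bejean/crawl-anywhere
  • Primary programming language: PHP
  • Programming languages: Shell (number of languages: 6)
  • License: Apache License 2.0

Owner activity

  • Created: 2013-01-28 10:21:11
  • Last push: 2017-07-01 17:59:18
  • Last commit: 2015-01-28 16:18:56
  • Number of releases: 7
  • Latest release: 4.0.0 (released 2014-11-30 22:59:54)
  • First release: 4.0.0-alpha-1 (released 2013-04-27 19:11:09)

User engagement

  • Stars: 95
  • Watchers: 23
  • Forks: 37
  • Commits: 218
  • Issues: 90
  • Open issues: 36
  • Pull requests: 0
  • Open pull requests: 2
  • Closed pull requests: 1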