Colly

Elegant Scraper and Crawler Framework for Golang

Lightning Fast and Elegant Scraping Framework for Gophers

Colly provides a clean interface to write any kind of crawler/scraper/spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Features

  • Clean API
  • Fast (>1k requests/sec on a single core)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping
  • Caching
  • Automatic encoding of non-unicode responses
  • Robots.txt support
  • Distributed scraping
  • Configuration via environment variables
  • Extensions

Example

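package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Log every request before it is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	// Start scraping from the project site
	c.Visit("http://go-colly.org/")
}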

See examples folder for more detailed examples.

Installation

go get -u github.com/gocolly/colly/v2/...

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on freenode.

Other Projects Using Colly

Below is a list of public, open source projects that use Colly:

If you are using Colly in a project please send a pull request to add it to the list.

Contributors

This project exists thanks to all the people who contribute. See CONTRIBUTING.md.

Backers

Thank you to all our backers! [Become a backer]

Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

License

Colly is released under the Apache License 2.0.

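Key Metrics

Overview
  • Name and owner: gocolly/colly
  • Primary language: Go
  • Languages: Go (language count: 2)
  • Platforms: Linux, Mac, Windows
  • License: Apache License 2.0

Owner activity
  • Created: 2017-09-29 14:08:49
  • Pushed: 2025-06-17 07:44:45
  • Last commit: 2025-06-17 09:44:45
  • Releases: 7
  • Latest release: v2.2.0 (published 2025-03-27 11:42:17)
  • First release: v1.0.0 (published 2018-05-13 00:44:45)

User engagement
  • Stars: 24.3k
  • Watchers: 325
  • Forks: 1.8k
  • Commits: 724
  • Issues enabled?
  • Issues: 560
  • Open issues: 148
  • Pull requests: 182
  • Open pull requests: 44
  • Closed pull requests: 67

Project settings
  • Wiki enabled?
  • Archived?
  • Fork?
  • Locked?
  • Mirror?
  • Private?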