ArchiveBot

ArchiveBot, an IRC bot for archiving websites.

  1. ArchiveBot

    Coders, I have a question.
    Or, a request, etc.
    I spent some time with xmc discussing something we could
    do to make things easier around here.
    What we came up with is a trigger for a bot, which can
    be triggered by people with ops.
    You tell it a website. It crawls it. WARC. Uploads it to
    archive.org. Boom.
    I can supply machine as needed.
    Obviously there's some sanitation issues, and it is root
    all the way down or nothing.
    I think that would help a lot for smaller sites
    Sites where it's 100 pages or 1000 pages even, pretty
    simple.
    And just being able to go "bot, get a sanity dump"

  2. More info

ArchiveBot has two major backend components: the control node, which
runs the IRC interface and bookkeeping programs, and the crawlers, which
do all the Web crawling. ArchiveBot users communicate with ArchiveBot
by issuing commands in an IRC channel.
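To make this division of labor concrete, here is a minimal Python sketch of the control-node side. It takes a line typed in the IRC channel, checks that the sender has ops (as the original request above proposes), validates the URL, and queues a job for a crawler to claim. Everything here is hypothetical and not taken from ArchiveBot's code: the `!archive` command name, the `ControlNode` and `Job` classes, and the in-memory `deque` are assumptions for illustration. A real deployment would need a queue shared between the control node and crawler machines.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Job:
    """One archiving request: a seed URL and the nick that asked for it."""
    url: str
    requested_by: str


@dataclass
class ControlNode:
    """Hypothetical control node: turns IRC commands into queued jobs."""
    ops: set[str]  # nicks allowed to start jobs (channel operators)
    queue: deque[Job] = field(default_factory=deque)  # in-memory stand-in

    def handle_message(self, nick: str, text: str) -> str | None:
        """Return a reply for the channel, or None if the line is not a command."""
        parts = text.split(maxsplit=1)
        # "!archive" is an assumed command name for this sketch.
        if not parts or parts[0] != "!archive":
            return None
        if nick not in self.ops:
            return f"{nick}: only channel operators can start jobs"
        if len(parts) < 2:
            return f"{nick}: usage: !archive URL"
        url = parts[1].strip()
        parsed = urlparse(url)
        # Only accept absolute http(s) URLs as crawl seeds.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return f"{nick}: not a valid http(s) URL: {url}"
        self.queue.append(Job(url=url, requested_by=nick))
        return f"{nick}: queued {url}"

    def next_job(self) -> Job | None:
        """Called by a crawler to claim the oldest pending job, if any."""
        return self.queue.popleft() if self.queue else None


if __name__ == "__main__":
    node = ControlNode(ops={"alice"})  # "alice" is a made-up nick
    print(node.handle_message("alice", "!archive https://example.com/"))
    # prints "alice: queued https://example.com/"
    print(node.next_job())
    # prints "Job(url='https://example.com/', requested_by='alice')"
```

Keeping the permission check and URL validation on the control node means crawlers only ever receive vetted jobs. In this sketch a crawler would loop on `next_job()`, crawl the site, write WARC files, and upload them; none of that crawler logic is shown here.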

User's guide: http://archivebot.readthedocs.org/en/latest/
Control node installation guide: INSTALL.backend
Crawler installation guide: INSTALL.pipeline

  3. Local use

ArchiveBot was originally written as a set of separate programs for
deployment on a server. This means it has a poor distribution story.
However, Ivan Kozik (@ivan) has taken the ArchiveBot pipeline,
dashboard, ignores, and control system and created a package intended for
personal use. You can find it at https://github.com/ArchiveTeam/grab-site.

  4. License

Copyright 2013 David Yip; made available under the MIT license. See
LICENSE for details.

  5. Acknowledgments

Thanks to Alard (@alard), who added WARC generation and Lua scripting to
GNU Wget. Wget+lua was the first web crawler used by ArchiveBot.

Thanks to Christopher Foo (@chfoo) for wpull, ArchiveBot's current web
crawler.

Thanks to Ivan Kozik (@ivan) for maintaining ignore patterns and
tracking down performance problems at scale.

Other thanks go to the following projects:

  6. Special thanks

Dragonette, Barnaby Bright, Vienna Teng, NONONO.

The memory hole of the Web has gone too far.
Don't look down, never look away; ArchiveBot's like the wind.

vim:ts=2:sw=2:tw=72:et
