pywb

Core Python Web Archiving Toolkit for replay and recording of web archives


Webrecorder pywb 2.7
====================

.. image:: https://raw.githubusercontent.com/webrecorder/pywb/main/pywb/static/pywb-logo.png

.. image:: https://github.com/webrecorder/pywb/workflows/CI/badge.svg
      :target: https://github.com/webrecorder/pywb/actions
.. image:: https://codecov.io/gh/webrecorder/pywb/branch/main/graph/badge.svg
      :target: https://codecov.io/gh/webrecorder/pywb

Web Archiving Tools for All

`View the full pywb documentation <https://pywb.readthedocs.org>`_

pywb is a Python (2 and 3) web archiving toolkit for replaying web archives large and small as accurately as possible.
The toolkit now also includes new features for creating high-fidelity web archives.

This toolset forms the foundation of the Webrecorder project, but also provides a generic web archiving toolkit
that is used by other web archives, including the traditional "Wayback Machine" functionality.

New Features
^^^^^^^^^^^^

The 2.x release included a major overhaul of pywb and introduced many new features, including the following:

  • Dynamic multi-collection configuration system with no-restart updates.

  • New recording capability to create new web archives from the live web or other archives.

  • Componentized architecture with standalone Warcserver, Recorder and Rewriter components.

  • Support for Memento API aggregation and fallback chains for querying multiple remote and local archival sources.

  • HTTP/S Proxy Mode with customizable certificate authority for proxy mode recording and replay.

  • Flexible rewriting system with pluggable rewriters for different content-types.

  • Standalone, modular `client-side rewriting system (wombat.js) <https://github.com/webrecorder/wombat>`_ to handle most modern web sites.

  • Improved 'calendar' query UI with incremental loading, grouping results by year and month, and updated replay banner.

  • Extensible UI customizations system for modifying all aspects of the UI.

  • Robust access control system for blocking or excluding URLs, by prefix or by exact match.

  • New in 2.6: Access Control embargo and http-header control access settings.

  • New in 2.6: Support for localization and multi-language deployment.

  • New in 2.7: New banner/calendar UI written in `Vue <https://vuejs.org/>`_, with interactive timeline and easier theming of colors and logo via ``config.yaml``.

Please see the `full documentation <https://pywb.readthedocs.org>`_ for more detailed info on all these features.
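
To make the "by prefix or by exact match" distinction in the access control item above concrete, here is a minimal Python sketch. It is an illustration only and is not pywb's implementation or rule file format: the ``check_access`` function, the rule tuples, and the precedence policy (exact matches outrank prefixes, longer prefixes outrank shorter ones) are all assumptions made for this example.

.. code-block:: python

    def check_access(url, rules, default="allow"):
        """Return the access setting of the most specific matching rule.

        ``rules`` is an iterable of (pattern, match_type, access) tuples,
        where match_type is "exact" or "prefix". This structure is
        hypothetical and exists only for this illustration.
        """
        best = None  # holds (rank, access) for the best match so far
        for pattern, match_type, access in rules:
            if match_type == "exact":
                matched = url == pattern
            else:
                matched = url.startswith(pattern)
            if not matched:
                continue
            # Assumed policy: exact beats prefix, longer pattern beats shorter.
            rank = (match_type == "exact", len(pattern))
            if best is None or rank > best[0]:
                best = (rank, access)
        return default if best is None else best[1]


    # Hypothetical rules: block a whole site, re-allow one directory,
    # and exclude a single page inside that directory.
    rules = [
        ("http://example.com/", "prefix", "block"),
        ("http://example.com/public/", "prefix", "allow"),
        ("http://example.com/public/secret.html", "exact", "exclude"),
    ]

    print(check_access("http://example.com/page", rules))
    print(check_access("http://example.com/public/index.html", rules))
    print(check_access("http://example.com/public/secret.html", rules))

The Embargo and Access Control Guide describes the actual rule files and how pywb evaluates them.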

Installation for Deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^

To install pywb for usage, you can use:

``pip install pywb``

Note: depending on your Python installation, you may have to use ``pip3`` instead of ``pip``.
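
One way to confirm that the install landed in the Python environment you expect is to query the package metadata. The sketch below uses only the standard library ``importlib.metadata`` module (Python 3.8+); the helper name ``installed_version`` is an assumption for this example and is not part of pywb.

.. code-block:: python

    from importlib import metadata


    def installed_version(dist_name="pywb"):
        """Return the installed version of a distribution, or None if absent."""
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None


    if __name__ == "__main__":
        version = installed_version()
        if version is None:
            print("pywb is not installed in this environment")
        else:
            print("pywb " + version)

If the script reports that pywb is missing right after a successful install, ``pip`` and the interpreter running the script probably belong to different Python installations.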

Installation from local copy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``git clone https://github.com/webrecorder/pywb``

To install from a locally cloned copy, run ``pip install -e .`` or ``python setup.py install``.

To run tests, we recommend installing the test tools with ``pip install tox tox-current-env`` and then running ``tox --current-env`` to test in your current Python environment.

To build the docs locally, run ``cd docs; make html``. (The docs will be built in ``./_build/html/index.html``.)

Running
^^^^^^^

After installation, you can run ``pywb`` or ``wayback``.

Consult the local or `online docs <https://pywb.readthedocs.org>`_ for latest usage and configuration details.
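
Once the server is running, archived pages are addressed by collection, optional timestamp, and original URL. The Python sketch below builds such a replay URL. It assumes the server listens on ``localhost:8080`` and that a collection named ``my-web-archive`` exists; both are example values, and the ``replay_url`` helper is hypothetical and not part of pywb.

.. code-block:: python

    def replay_url(collection, url, timestamp="", host="localhost", port=8080):
        """Build a replay URL of the form /<collection>/<timestamp>/<url>.

        The timestamp is a digit string such as "20140101000000"; when it
        is omitted, the URL has the form /<collection>/<url>.
        """
        prefix = "http://{}:{}/{}/".format(host, port, collection)
        if timestamp:
            prefix += timestamp + "/"
        return prefix + url


    # Example values only: the collection name and capture time are assumptions.
    print(replay_url("my-web-archive", "http://example.com/", "20140101000000"))
    print(replay_url("my-web-archive", "http://example.com/"))

Check the Getting Started Guide for the host, port, and collection names that apply to your own setup.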

Documentation
^^^^^^^^^^^^^

The pywb documentation is extensive. The following links point to a few key guides:

  • `Getting Started Guide <https://pywb.readthedocs.io/en/latest/manual/usage.html#getting-started>`_

  • `Embargo and Access Control Guide <https://pywb.readthedocs.io/en/latest/manual/access-control.html>`_

  • `Localization and Multi-Language Guide <https://pywb.readthedocs.io/en/latest/manual/localization.html>`_

  • `Deployment Guide <https://pywb.readthedocs.io/en/latest/manual/usage.html#deployment>`_

  • `OpenWayback Transition Guide <https://pywb.readthedocs.io/en/latest/manual/owb-transition.html>`_

Contributions & Bug Reports
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Users are encouraged to fork and contribute to this project to keep improving web archiving tools. Please consult the `contributing guide <CONTRIBUTING.md>`_ for information on how to contribute to pywb.
