!! Important: This repository is deprecated. Please see HTTPArchive/httparchive.org for the latest development !!
The HTTP Archive tracks how the Web is built
This repo contains the source code powering the HTTP
Archive data collection.
What is the HTTP Archive?
Successful societies and institutions recognize the need to record their
history. Doing so provides a way to review the past, find explanations for
current behavior, and spot emerging trends. In 1996, Brewster Kahle
realized the cultural significance of the Internet and the need to
record its history. As a result, he founded the Internet Archive, which
collects and permanently stores the Web's digitized content.
In addition to the content of web pages, it's important to record how this digitized content is constructed and served.
The HTTP Archive provides this record. It is a permanent repository of
web performance information such as page sizes, failed requests, and
technologies used. This performance information allows us to see
trends in how the Web is built and provides a common data set from which
to conduct web performance research.