freeze-dry

Snapshots a web page to get it as a static, self-contained HTML document.

  • Owner: WebMemex/freeze-dry
  • License: The Unlicense

Freeze-dry: web page conservation

Freeze-dry stores a web page as it is shown in the browser. It takes the DOM and returns it as an
HTML string, after inlining external resources such as images and stylesheets (as data:
URLs).

It also ensures the snapshot is static and completely offline: all scripts are removed, and any
attempt at internet connectivity is blocked by adding a content security policy. The resulting HTML
document is a static, self-contained snapshot of the page.
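
To illustrate the inlining step described above, here is a minimal sketch of turning a fetched
resource's bytes into a data: URL. The names toBase64 and toDataUrl are hypothetical and not part
of freeze-dry's API, and base64 encoding is an assumption; freeze-dry's actual implementation may
differ (see src/Readme.md).

```typescript
// Hypothetical helpers for illustration only; not freeze-dry's own code.
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Encode raw bytes as base64 (assumed encoding for the data: URL payload).
function toBase64(bytes: Uint8Array): string {
  let out = ''
  for (let i = 0; i < bytes.length; i += 3) {
    // Pack up to three bytes into one 24-bit group; missing bytes count as 0.
    const group =
      ((bytes[i] ?? 0) << 16) | ((bytes[i + 1] ?? 0) << 8) | (bytes[i + 2] ?? 0)
    out += ALPHABET.charAt((group >> 18) & 63)
    out += ALPHABET.charAt((group >> 12) & 63)
    // Pad with '=' where the input ran out of bytes.
    out += i + 1 < bytes.length ? ALPHABET.charAt((group >> 6) & 63) : '='
    out += i + 2 < bytes.length ? ALPHABET.charAt(group & 63) : '='
  }
  return out
}

// Build a data: URL that can replace the original URL of a subresource.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${toBase64(bytes)}`
}

// The three bytes of the ASCII text "Man" become a self-contained URL.
const exampleUrl = toDataUrl(new Uint8Array([77, 97, 110]), 'text/plain')
// exampleUrl === 'data:text/plain;base64,TWFu'
```

In a browser, FileReader.readAsDataURL would typically do the same job for a Blob; the manual
encoder here only keeps the sketch free of platform dependencies.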

For more details about exactly how this works, see src/Readme.md.

Usage

const html = await freezeDry(document, options)

The options object is optional, and even document can be omitted, in which case it will default
to window.document. Possible options are:

  • timeout (number): Maximum time (in milliseconds) spent on fetching the page's subresources. The
    resulting HTML will have only successfully fetched subresources inlined.

  • docUrl (string): Overrides the document's URL. This will influence the expansion of relative
    URLs, and is useful for cases where the document was constructed dynamically (e.g. using
    DOMParser).

  • addMetadata (boolean): If true (the default), a meta and a link tag will be added to the
    returned HTML, noting the document's URL and time of snapshotting (that is, the current time).

    The metadata mimics the HTTP headers defined for the Memento protocol. The added tags look
    like this:

    <meta http-equiv="Memento-Datetime" content="Sat, 18 Aug 2018 18:02:20 GMT">
    <link rel="original" href="https://example.com/main/page.html">
    
  • keepOriginalAttributes (boolean): If true (the default), preserves the original value of an
    element attribute if its URLs are inlined, by noting it as a new data-original-... attribute.
    For example, <img src="bg.png"> would become <img src="data:..." data-original-src="bg.png">.
    Note this is an unstandardised workaround to keep URLs of subresources available; unfortunately
    URLs inside stylesheets are still lost.

  • now (Date): Overrides the snapshot time (only relevant when addMetadata is true). Mainly
    intended for testing purposes.

  • fetchResource (function): Custom function for fetching resources; should be API-compatible
    with the global fetch(), but may also return an object { blob, url } instead of a Response.
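
As a rough illustration of what the docUrl, addMetadata and now options above affect, the sketch
below expands a relative URL against a document URL and builds the two Memento-style tags. The
helper names (expandUrl, mementoTags, escapeAttribute) are hypothetical and not part of
freeze-dry's API; the attribute escaping is likewise an assumption about how one might do it.

```typescript
// Hypothetical helpers for illustration only; not freeze-dry's own code.

// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, '&amp;').replace(/"/g, '&quot;')
}

// Resolve a relative URL against the document's URL (what docUrl overrides).
function expandUrl(relativeUrl: string, docUrl: string): string {
  return new URL(relativeUrl, docUrl).href
}

// Build the meta and link tags that note the snapshot time and original URL.
function mementoTags(docUrl: string, now: Date): string[] {
  return [
    // Date.prototype.toUTCString() yields the HTTP-date style used by Memento.
    `<meta http-equiv="Memento-Datetime" content="${now.toUTCString()}">`,
    `<link rel="original" href="${escapeAttribute(docUrl)}">`,
  ]
}

const docUrl = 'https://example.com/main/page.html'
const snapshotTime = new Date(Date.UTC(2018, 7, 18, 18, 2, 20)) // month 7 is August

const imageUrl = expandUrl('bg.png', docUrl)
// imageUrl === 'https://example.com/main/bg.png'

const tags = mementoTags(docUrl, snapshotTime)
// tags[0] === '<meta http-equiv="Memento-Datetime" content="Sat, 18 Aug 2018 18:02:20 GMT">'
// tags[1] === '<link rel="original" href="https://example.com/main/page.html">'
```

Passing a fixed Date, as the now option allows, is what makes output like this reproducible in
tests.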

Note that the resulting string can easily be several megabytes when pages contain images, videos,
fonts, and so on.
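
To get a feel for the size, the sketch below estimates how long an inlined resource becomes,
assuming it is base64-encoded inside its data: URL (an assumption; other encodings give different
numbers). The function names are hypothetical. Base64 emits four characters for every three input
bytes, so inlining adds roughly a third on top of the original size.

```typescript
// Hypothetical helpers for illustration only; not freeze-dry's own code.

// Number of base64 characters needed for byteCount bytes, including padding.
function base64Length(byteCount: number): number {
  return Math.ceil(byteCount / 3) * 4
}

// Total length of a base64 data: URL for a resource of the given size.
function dataUrlLength(byteCount: number, mimeType: string): number {
  return `data:${mimeType};base64,`.length + base64Length(byteCount)
}

// A 1.5 MB PNG grows to about 2 million characters once inlined.
const inlinedLength = dataUrlLength(1500000, 'image/png')
// inlinedLength === 2000022
```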

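Key metrics

Overview

  • Name and owner: WebMemex/freeze-dry
  • Primary language: TypeScript
  • Languages: HTML (4 languages in total)
  • License: The Unlicense

Owner activity

  • Created: 2017-07-13 23:31:40
  • Last push: 2022-09-18 15:22:13
  • Releases: 12
  • Latest release: v1.0.0
  • First release: v0.1.0

User engagement

  • Stars: 292
  • Watchers: 10
  • Forks: 19
  • Commits: 269
  • Issues: 48
  • Open issues: 21
  • Pull requests: 8
  • Open pull requests: 0
  • Closed pull requests: 5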