nokogiri

Nokogiri (鋸) is a Rubygem providing HTML, XML, SAX, and Reader parsers with XPath and CSS selector support.

Nokogiri

Description

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among
Nokogiri's many features is the ability to search documents via XPath
or CSS3 selectors.

Features

  • XML/HTML DOM parser which handles broken HTML
  • XML/HTML SAX parser
  • XML/HTML Push parser
  • XPath 1.0 support for document searching
  • CSS3 selector support for document searching
  • XML/HTML builder
  • XSLT transformer

Nokogiri parses and searches XML/HTML using native libraries (either C
or Java, depending on your Ruby), which means it's fast and
standards-compliant.

Installation

If this doesn't work:

gem install nokogiri

then please start troubleshooting here:

https://nokogiri.org/tutorials/installing_nokogiri.html

There are currently 1,237 Stack Overflow questions about Nokogiri
installation. The vast majority of them are out of date and therefore
incorrect. Please do not use Stack Overflow.

Instead, tell us when the above instructions don't work for you. This
allows us to both help you directly and improve the documentation.

Binary packages

Binary packages are available for some distributions.

Support

All official documentation is posted at https://nokogiri.org (the source for which is at https://github.com/sparklemotion/nokogiri.org/, and we welcome contributions).

Consider subscribing to Tidelift, which provides license assurances and timely security notifications for your open source dependencies, including Nokogiri. Tidelift subscriptions also help the Nokogiri maintainers fund our automated testing, which in turn allows us to ship releases, bugfixes, and security updates more often.

Security and Vulnerability Reporting

Please report vulnerabilities at https://hackerone.com/nokogiri

A full description of our security policy is in SECURITY.md.

Synopsis

Nokogiri is a large library, but here is an example of parsing and examining a document:

#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document.
# URI.open needs Ruby 2.5+; Kernel#open no longer accepts URLs as of Ruby 3.0.
doc = Nokogiri::HTML(URI.open('https://nokogiri.org/tutorials/installing_nokogiri.html'))

puts "### Search for nodes by css"
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end

puts "### Search for nodes by xpath"
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

puts "### Or mix and match."
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end

Requirements

Ruby 2.4.0 or higher, including any development packages necessary to compile native extensions.

In Nokogiri 1.6.0 and later, libxml2 and libxslt are bundled with the gem. If you want to use the system versions instead:

  • First, check out the long list
    of fixes and changes between releases before deciding to use any
    version older than is bundled with Nokogiri.

  • At install time, set the environment variable
    NOKOGIRI_USE_SYSTEM_LIBRARIES or else use the
    --use-system-libraries argument. (See
    https://nokogiri.org/tutorials/installing_nokogiri.html#install-with-system-libraries
    for specifics.)

  • libxml2 >=2.6.21 with iconv support (libxml2-dev/-devel is also required)

  • libxslt, built with and supported by the given libxml2 (libxslt-dev/-devel is also required)

Encoding

Strings are always stored as UTF-8 internally. Methods that return
text values will always return UTF-8 encoded strings. Methods that
return a string containing markup (like to_xml, to_html and
inner_html) will return a string encoded like the source document.

WARNING

Some documents declare one encoding, but actually use a different
one. In these cases, which encoding should the parser choose?

Data is just a stream of bytes. Humans add meaning to that stream. Any
particular set of bytes could be valid characters in multiple
encodings, so detecting encoding with 100% accuracy is not
possible. libxml2 does its best, but it can't be right all the time.

If you want Nokogiri to handle the document encoding properly, your
best bet is to explicitly set the encoding. Here is an example of
explicitly setting the encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')

Development

  bundle install
  bundle exec rake compile test

Code of Conduct

We've adopted the Contributor Covenant code of conduct, which you can read in full in CODE_OF_CONDUCT.md.

Semantic Versioning

Nokogiri follows Semantic Versioning. See CHANGELOG.md for more details.

License

This project is licensed under the terms of the MIT license.

See this license at LICENSE.md.
