html-query

A fluent and functional approach to querying HTML

html-query is a Go package that provides a fluent and functional interface for
querying the HTML DOM. It is based on golang.org/x/net/html.

Examples

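  1. A simple example (under "examples" directory)

    // get and checkError are assumed to be helpers defined by the example
    // program; Id and Class are checker builders from html-query/expr.
    r := get(`http://blog.golang.org/index`)
    defer r.Close()
    root, err := query.Parse(r)
    checkError(err)
    // For each "blogtitle" child of <div id="content">, read link, date and tags.
    root.Div(Id("content")).Children(Class("blogtitle")).For(func(item *query.Node) {
        href := item.Ahref().Href()
        date := item.Span(Class("date")).Text()
        tags := item.Span(Class("tags")).Text()
        // ......
    })

  2. Generator of html-query (under "gen" directory)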

    A large part of html-query is automatically generated from the HTML spec.
    The spec is in HTML format, so the generator parses it using html-query
    itself.

Design

Here is a simple explanation of the design of html-query.

Functional query expressions

All functional definitions live in the html-query/expr package.

  1. Checker and checker composition

    A checker is a function that accepts a *html.Node and conditionally
    returns a *html.Node.

    type Checker func(*html.Node) *html.Node

Here are some checker examples:

    Id("id1")
    Class("c1")
    Div
    Abbr
    H1
    H2

Checkers can be combined as boolean expressions:

    And(Id("id1"), Class("c1"))
    Or(Class("c1"), Class("c2"))
    And(Class("c1"), Not(Class("c2")))

  2. Checker builder

    A checker builder is a function that returns a checker. "Id", "Class", "And",
    "Or" and "Not" shown above are all checker builders. Where needed, html-query
    also defines checker builder builders (functions that return a checker
    builder).
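
The three layers described above (checker, checker builder, checker builder
builder) can be sketched without any dependency. The sketch below is not the
code of html-query/expr: Node is a hypothetical stand-in for *html.Node, attrEq
is a made-up name, and returning nil on a failed check is an assumption about
what "conditionally returns" means.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical stand-in for *html.Node: a tag name plus attributes.
type Node struct {
	Tag  string
	Attr map[string]string
}

// Checker returns the node when it passes the check and nil otherwise
// (assumed behavior, see the lead-in).
type Checker func(*Node) *Node

// attrEq is a checker builder builder: given an attribute key, it returns
// a checker builder for exact matches on that attribute.
func attrEq(key string) func(string) Checker {
	return func(val string) Checker {
		return func(n *Node) *Node {
			if n != nil && n.Attr[key] == val {
				return n
			}
			return nil
		}
	}
}

// Id is a checker builder derived from attrEq.
var Id = attrEq("id")

// Class is a checker builder matching one whitespace-separated class name.
func Class(name string) Checker {
	return func(n *Node) *Node {
		if n == nil {
			return nil
		}
		for _, c := range strings.Fields(n.Attr["class"]) {
			if c == name {
				return n
			}
		}
		return nil
	}
}

// And passes the node through every checker and fails on the first nil.
func And(cs ...Checker) Checker {
	return func(n *Node) *Node {
		for _, c := range cs {
			if n = c(n); n == nil {
				return nil
			}
		}
		return n
	}
}

// Or returns the result of the first checker that accepts the node.
func Or(cs ...Checker) Checker {
	return func(n *Node) *Node {
		for _, c := range cs {
			if m := c(n); m != nil {
				return m
			}
		}
		return nil
	}
}

// Not accepts the node exactly when the wrapped checker rejects it.
func Not(c Checker) Checker {
	return func(n *Node) *Node {
		if n != nil && c(n) == nil {
			return n
		}
		return nil
	}
}

func main() {
	n := &Node{Tag: "div", Attr: map[string]string{"id": "id1", "class": "c1 c3"}}
	fmt.Println(And(Id("id1"), Class("c1"))(n) != nil)        // true
	fmt.Println(Or(Class("c1"), Class("c2"))(n) != nil)       // true
	fmt.Println(And(Class("c1"), Not(Class("c3")))(n) != nil) // false
}
```

Because every combinator returns a Checker again, composed expressions can be
nested to any depth and passed wherever a single checker is expected.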

Fluent interface

The fluent interface (http://en.wikipedia.org/wiki/Fluent_interface) is defined
in the html-query package.

  1. Root node

    Function Parse returns the root node of an html document.

  2. Node finder

    Method Node.Find performs a breadth-first search (BFS) for a node, e.g.

    node.Find(Div, Class("id1"))

But usually you can write the short form:

    node.Div(Class("id1"))

  3. Attribute getter

    Method Node.Attr can be used to get the value of an attribute of a node
    (or a regular expression submatch of that value), e.g.

    node.Attr("Id")
    node.Attr("href", "(.*)")

But usually you can write the short form:

    node.Id()
    node.Href("(.*)")

  4. Node iterator

    Methods Node.Children and Node.Descendants each return a node iterator
    (NodeIter). Method NodeIter.For can be used to loop through these nodes.
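
To make the pieces of the fluent interface concrete, here is a dependency-free
sketch of a breadth-first Find, an attribute getter with an optional regular
expression submatch, and a node iterator with For. It is an illustration, not
the package's implementation: the Node fields, the Tag builder, the matchAll
helper, the string return type of Attr, and the nil-safe method receivers are
all assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// Node is a hypothetical stand-in for the package's node type.
type Node struct {
	Tag   string
	Attrs map[string]string
	Kids  []*Node
}

// Checker returns the node when it matches and nil otherwise (assumed).
type Checker func(*Node) *Node

// Tag is a hypothetical checker builder for element names.
func Tag(name string) Checker {
	return func(n *Node) *Node {
		if n != nil && n.Tag == name {
			return n
		}
		return nil
	}
}

// matchAll reports whether every checker accepts the node.
func matchAll(n *Node, cs []Checker) bool {
	for _, c := range cs {
		if c(n) == nil {
			return false
		}
	}
	return true
}

// Find visits descendants level by level (BFS) and returns the first one
// accepted by all checkers, or nil when none matches.
func (n *Node) Find(cs ...Checker) *Node {
	if n == nil {
		return nil
	}
	queue := append([]*Node(nil), n.Kids...)
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if matchAll(cur, cs) {
			return cur
		}
		queue = append(queue, cur.Kids...)
	}
	return nil
}

// Attr returns the attribute value, or the first submatch of the value
// when a pattern is supplied ("" if the pattern does not match).
func (n *Node) Attr(key string, pattern ...string) string {
	if n == nil {
		return ""
	}
	val := n.Attrs[key]
	if len(pattern) == 0 {
		return val
	}
	m := regexp.MustCompile(pattern[0]).FindStringSubmatch(val)
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

// NodeIter is a hypothetical iterator over an already collected node list.
type NodeIter struct{ nodes []*Node }

// Children collects the direct children accepted by all checkers.
func (n *Node) Children(cs ...Checker) NodeIter {
	var it NodeIter
	if n == nil {
		return it
	}
	for _, kid := range n.Kids {
		if matchAll(kid, cs) {
			it.nodes = append(it.nodes, kid)
		}
	}
	return it
}

// For calls visit once per node, in order.
func (it NodeIter) For(visit func(*Node)) {
	for _, n := range it.nodes {
		visit(n)
	}
}

func main() {
	root := &Node{Tag: "body", Kids: []*Node{
		{Tag: "p"},
		{Tag: "div", Attrs: map[string]string{"id": "content"}, Kids: []*Node{
			{Tag: "a", Attrs: map[string]string{"href": "/post/1"}},
			{Tag: "a", Attrs: map[string]string{"href": "/post/2"}},
		}},
	}}
	root.Find(Tag("div")).Children(Tag("a")).For(func(a *Node) {
		fmt.Println(a.Attr("href"), a.Attr("href", `/post/(\d+)`))
	})
	// prints "/post/1 1" then "/post/2 2"
}
```

Making the methods tolerate a nil receiver is what lets a chain such as
Find(...).Children(...).For(...) run to completion even when an early step
finds nothing; whether html-query does the same is not stated here.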

Alternative

If you prefer a jQuery-like DSL to a functional style, you might want to try
goquery: https://github.com/PuerkitoBio/goquery.

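Key metrics

Overview
Name and owner: h12w/html-query
Primary language: Go
Languages: Go (1 language)
License: BSD 2-Clause "Simplified" License

Owner activity
Created: 2014-01-20 06:55:09
Pushed: 2018-05-05 16:18:47
Last commit: 2018-05-06 00:18:40
Releases: 0

User engagement
Stars: 49
Watchers: 5
Forks: 10
Commits: 27
Issues: 2
Open issues: 0
Pull requests: 2
Open pull requests: 0
Closed pull requests: 1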