xsoup

When jsoup meets XPath.

  • Owner: code4craft/xsoup
  • License: MIT License

Xsoup

XPath selector based on Jsoup.

Get started:

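    import java.util.List;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.junit.Assert;
    import org.junit.Test;
    import us.codecraft.xsoup.Xsoup;

    public class XsoupTest {

        @Test
        public void testSelect() {
            String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
                    "<table><tr><td>a</td><td>b</td></tr></table></html>";

            // Parse the HTML with Jsoup, then query the document with XPath.
            Document document = Jsoup.parse(html);

            // get() returns a single matched value as a String.
            String result = Xsoup.compile("//a/@href").evaluate(document).get();
            Assert.assertEquals("https://github.com", result);

            // list() returns all matched values as Strings.
            List<String> list = Xsoup.compile("//tr/td/text()").evaluate(document).list();
            Assert.assertEquals("a", list.get(0));
            Assert.assertEquals("b", list.get(1));
        }
    }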

Performance:

Xsoup uses Jsoup as its HTML parser.

Compared with HtmlCleaner, another widely used XPath selector for HTML, Xsoup is much faster:

Normal HTML, size 44 KB
XPath: "//a"
Run 2000 times

Environment: MacBook Air MD231CH/A
CPU: 1.8 GHz Intel Core i5
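
A measurement like the one described above can be approximated with a small timing loop. This is only a minimal sketch, not the benchmark behind the comparison: the class name `RepeatTimer`, the method `timeRuns`, and the warm-up count of 200 are hypothetical, and the workload is a placeholder so that the sketch runs with the JDK alone.

```java
public class RepeatTimer {

    /**
     * Runs the task `warmup` times untimed, then `runs` times timed,
     * and returns the elapsed time of the timed runs in milliseconds.
     */
    static long timeRuns(int warmup, int runs, Runnable task) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int[] counter = {0};
        // Placeholder workload (assumption). With jsoup and Xsoup on the classpath
        // and a parsed Document named `document`, the task could instead be:
        //   () -> Xsoup.compile("//a").evaluate(document).list()
        Runnable task = () -> counter[0]++;
        long elapsed = timeRuns(200, 2000, task);
        System.out.println("2000 runs took " + elapsed + " ms");
    }
}
```

The untimed warm-up runs give the JVM a chance to compile the hot path before timing starts, which matters when comparing two libraries over only a few thousand iterations.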

Syntax supported:

XPath 1.0:

Functions supported:

Xsoup provides some functions that may not be part of standard XPath 1.0:

Extended syntax supported:

These XPath syntax extensions are specific to Xsoup (added for convenience when extracting data from HTML; see the Jsoup CSS selector syntax):

License

MIT License; see the LICENSE file.

Main metrics

Overview
  • Name With Owner: code4craft/xsoup
  • Primary Language: Java
  • Program language: Java (Language Count: 1)
  • Platform
  • License: MIT License

Owner Activity
  • Created At: 2013-08-31 11:37:03
  • Pushed At: 2023-07-10 05:22:27
  • Last Commit At: 2023-06-13 00:33:47
  • Release Count: 13
  • Last Release Name: xsoup-0.3.7 (Posted on 2023-06-13 00:33:47)
  • First Release Name: xsoup-0.1.0 (Posted on 2013-09-04 07:20:59)

User Engagement
  • Stargazers Count: 469
  • Watchers Count: 43
  • Fork Count: 152
  • Commits Count: 161
  • Has Issues Enabled
  • Issues Count: 46
  • Issue Open Count: 28
  • Pull Requests Count: 11
  • Pull Requests Open Count: 0
  • Pull Requests Close Count: 5

Project Settings
  • Has Wiki Enabled
  • Is Archived
  • Is Fork
  • Is Locked
  • Is Mirror
  • Is Private