Xsoup
XPath selector based on Jsoup.
Get started:
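import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.junit.Assert;
import org.junit.Test;
import us.codecraft.xsoup.Xsoup;

public class XsoupTest {
    @Test
    public void testSelect() {
        String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
                "<table><tr><td>a</td><td>b</td></tr></table></html>";
        Document document = Jsoup.parse(html);
        // get() returns the first result: the href attribute of the <a> element
        String result = Xsoup.compile("//a/@href").evaluate(document).get();
        Assert.assertEquals("https://github.com", result);
        // list() returns all results: the text of each <td> in the row
        List<String> list = Xsoup.compile("//tr/td/text()").evaluate(document).list();
        Assert.assertEquals("a", list.get(0));
        Assert.assertEquals("b", list.get(1));
    }
}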
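The evaluator returned by `Xsoup.compile` is not tied to any one document, so it can likely be compiled once and applied to many pages, which avoids re-parsing the XPath string for each document. The sketch below assumes Java 10+ (for `var`) and uses a hypothetical helper, `extractLinks`, and made-up sample pages; neither is part of Xsoup.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import us.codecraft.xsoup.Xsoup;

class XsoupReuseExample {

    // Hypothetical helper: collects every href from each page,
    // compiling the XPath expression only once.
    static List<String> extractLinks(List<String> pages) {
        // compile() parses the XPath string; the result is reused for every page
        var hrefs = Xsoup.compile("//a/@href");
        List<String> links = new ArrayList<>();
        for (String page : pages) {
            Document document = Jsoup.parse(page);
            // list() returns every match in this document, in document order
            links.addAll(hrefs.evaluate(document).list());
        }
        return links;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
                "<html><div><a href='https://github.com'>github.com</a></div></html>",
                "<html><p><a href='https://jsoup.org'>jsoup</a></p></html>");
        System.out.println(extractLinks(pages));
    }
}
```

Running `main` prints the links from both pages, first page first.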
Performance:
Xsoup uses Jsoup as its HTML parser.
Compared with HtmlCleaner, another widely used XPath selector for HTML, Xsoup is much faster in the following benchmark:
HTML: a normal page, 44KB
XPath: "//a"
Runs: 2000
Environment: MacBook Air MD231CH/A
CPU: 1.8GHz Intel Core i5
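A timing loop in the spirit of this benchmark might look like the sketch below. It is not the harness behind the comparison: the generated page (`samplePage`), its link count, and the class and method names are assumptions. Parse time and select time are accumulated separately so the cost of Jsoup parsing and of XPath evaluation can be compared on their own.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import us.codecraft.xsoup.Xsoup;

class XsoupTimingSketch {

    // Hypothetical stand-in for the test page: one <div><a> pair per link.
    // With 1000 links the string is about 43KB, close to the 44KB page above.
    static String samplePage(int links) {
        StringBuilder html = new StringBuilder("<html><body>");
        for (int i = 0; i < links; i++) {
            html.append("<div><a href='/page/").append(i).append("'>link ")
                .append(i).append("</a></div>");
        }
        return html.append("</body></html>").toString();
    }

    // Parses and queries the page `runs` times.
    // Returns {total parse nanos, total select nanos, total matches}.
    static long[] run(String html, int runs) {
        var anchors = Xsoup.compile("//a"); // compiled once, reused every run
        long parseNanos = 0;
        long selectNanos = 0;
        long matches = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            Document document = Jsoup.parse(html);
            long parsed = System.nanoTime();
            // Accumulating the match count keeps the work observable,
            // so it is not optimized away
            matches += anchors.evaluate(document).list().size();
            long selected = System.nanoTime();
            parseNanos += parsed - start;
            selectNanos += selected - parsed;
        }
        return new long[] {parseNanos, selectNanos, matches};
    }

    public static void main(String[] args) {
        long[] result = run(samplePage(1000), 2000);
        System.out.printf("parse: %d ms, select: %d ms, matches: %d%n",
                result[0] / 1_000_000, result[1] / 1_000_000, result[2]);
    }
}
```

This loop does no JIT warm-up, so early iterations run slower than later ones; for numbers meant for publication, a dedicated harness such as JMH is a safer choice.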
Supported syntax:
XPath 1.0:
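To illustrate a few basic XPath 1.0 steps, the sketch below runs path expressions against a small made-up page. The page, the class name, and the helper `select` are hypothetical. The child step (`/`), descendant step (`//`), attribute step (`@href`) and attribute predicate (`[@id='main']`) are standard XPath 1.0, and the sketch assumes Xsoup evaluates them in the usual way.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import us.codecraft.xsoup.Xsoup;

class XPathBasicsExample {

    // Hypothetical page: two links directly inside a nav div,
    // one link nested inside a <p> in the main div
    static final String HTML =
            "<html><body>"
            + "<div id='nav'><a href='/home'>Home</a><a href='/about'>About</a></div>"
            + "<div id='main'><p><a href='/post/1'>First post</a></p></div>"
            + "</body></html>";

    // Evaluates an XPath expression against HTML and returns all string results
    static List<String> select(String xpath) {
        Document document = Jsoup.parse(HTML);
        return Xsoup.compile(xpath).evaluate(document).list();
    }

    public static void main(String[] args) {
        // "//" descendant step: every href in the page
        System.out.println(select("//a/@href"));
        // "/" child step: only links that are direct children of a div
        System.out.println(select("//div/a/@href"));
        // attribute predicate: links anywhere under the div whose id is "main"
        System.out.println(select("//div[@id='main']//a/@href"));
    }
}
```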
Supported functions:
Xsoup provides some functions that may not be part of standard XPath 1.0:
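As an illustration, the sketch below applies three functions to the same element. `text()` is the function used in the Get started example; `allText()` and `html()` are assumed here to be among Xsoup's extra functions, so their names are worth checking against the Xsoup version in use. The page, the class name, and the helper `eval` are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import us.codecraft.xsoup.Xsoup;

class XsoupFunctionsExample {

    // Hypothetical page: the div's text is split between itself and a child <b>
    static final String HTML = "<html><div>Hello <b>world</b></div></html>";

    // Returns the first result of the expression
    static String eval(String xpath) {
        Document document = Jsoup.parse(HTML);
        return Xsoup.compile(xpath).evaluate(document).get();
    }

    public static void main(String[] args) {
        // text(): the div's own (direct) text; text inside <b> is not included
        System.out.println(eval("//div/text()"));
        // allText() (assumed name): text of the div including its children
        System.out.println(eval("//div/allText()"));
        // html() (assumed name): inner HTML of the div, markup included
        System.out.println(eval("//div/html()"));
    }
}
```

The difference between `text()` and `allText()` matters whenever the text you want is wrapped in inline tags such as `<b>` or `<span>`.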
Supported extended syntax:
The following XPath syntax extensions exist only in Xsoup. They make HTML extraction more convenient; see Jsoup's CSS selector syntax for reference:
License
MIT License, see file LICENSE