======
Parsel
======

.. image:: https://img.shields.io/travis/scrapy/parsel/master.svg
   :target: https://travis-ci.org/scrapy/parsel
   :alt: Build Status

.. image:: https://img.shields.io/pypi/v/parsel.svg
   :target: https://pypi.python.org/pypi/parsel
   :alt: PyPI Version

.. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg
   :target: http://codecov.io/github/scrapy/parsel?branch=master
   :alt: Coverage report

Parsel is a library for extracting data from HTML and XML documents using XPath and CSS selectors.

Features:

* Extract text using CSS or XPath selectors
* Remove elements using CSS or XPath selectors
* Regular expression helper methods

Example (`open online demo`_)::

    >>> from parsel import Selector
    >>> sel = Selector(text=u"""<html>
            <body>
                <h1>Hello, Parsel!</h1>
                <ul>
                    <li><a href="http://example.com">Link 1</a></li>
                    <li><a href="http://scrapy.org">Link 2</a></li>
                </ul>
            </body>
            </html>""")
    >>>
    >>> sel.css('h1::text').get()
    'Hello, Parsel!'
    >>>
    >>> sel.css('h1::text').re(r'\w+')
    ['Hello', 'Parsel']
    >>>
    >>> for e in sel.css('ul > li'):
    ...     print(e.xpath('.//a/@href').get())
    http://example.com
    http://scrapy.org

.. _open online demo: https://colab.research.google.com/drive/149VFa6Px3wg7S3SEnUqk--TyBrKplxCN#forceEdit=true&sandboxMode=true

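Key metrics

Overview

:Name and owner: scrapy/parsel
:Primary programming language: Python
:Programming languages: Python (number of languages: 1)
:Platform:
:License: BSD 3-Clause "New" or "Revised" License

Owner activity

:Created: 2015-04-24 15:53:36
:Last push: 2025-05-12 06:07:25
:Last commit: 2025-05-12 10:06:42
:Releases: 26
:Latest release: v1.10.0 (released 2024-12-16 16:54:23)
:First release: v0.9.0 (release date not given)

User engagement

:Stars: 1.2k
:Watchers: 35
:Forks: 150
:Commits: 814
:Issues enabled?:
:Issues: 122
:Open issues: 32
:Pull requests: 154
:Open pull requests: 12
:Closed pull requests: 31

Project settings

:Wiki enabled?:
:Archived?:
:Is a fork?:
:Locked?:
:Is a mirror?:
:Private?: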