webstruct

NER toolkit for HTML data

Webstruct

.. image:: https://img.shields.io/pypi/v/webstruct.svg
   :target: https://pypi.python.org/pypi/webstruct
   :alt: PyPI Version

.. image:: https://travis-ci.org/scrapinghub/webstruct.svg?branch=master
   :target: https://travis-ci.org/scrapinghub/webstruct
   :alt: Build Status

.. image:: https://codecov.io/gh/scrapinghub/webstruct/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/scrapinghub/webstruct
   :alt: Code Coverage

.. image:: https://readthedocs.org/projects/webstruct/badge/?version=latest
   :target: http://webstruct.readthedocs.io/en/latest/
   :alt: Documentation

Webstruct is a library for creating statistical NER_ systems that work
on HTML data, i.e. a library for building tools that extract named
entities (addresses, organization names, open hours, etc.) from webpages.

Unlike most NER systems, webstruct works on HTML data, not only
on text data. This makes it possible to define features that use the HTML
structure, and to embed annotation results back into the HTML.
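
To make the idea of HTML-aware features concrete, here is a minimal sketch
that uses only the Python standard library. It is not webstruct's API: the
names ``HtmlTokenFeatures`` and ``html_token_features`` and the chosen feature
keys are hypothetical and serve only as illustration. Each text token is
paired with the tag that encloses it, which is information a plain-text NER
system would not see.

```python
from html.parser import HTMLParser


class HtmlTokenFeatures(HTMLParser):
    """Collect text tokens together with the tags that enclose them.

    Hypothetical helper for illustration; not part of webstruct.
    """

    # Void elements have no closing tag, so they are never pushed on the stack.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.tokens = []  # one feature dict per text token

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self.stack[-1] if self.stack else None
        for word in data.split():
            self.tokens.append({
                "token": word,
                "parent_tag": parent,             # structural feature
                "inside_link": "a" in self.stack,  # structural feature
                "is_title_case": word.istitle(),   # ordinary text feature
            })


def html_token_features(html):
    """Return a list of per-token feature dicts for an HTML string."""
    parser = HtmlTokenFeatures()
    parser.feed(html)
    parser.close()
    return parser.tokens


features = html_token_features(
    '<div><h1>Acme Corp</h1>'
    '<p>Open 9am-5pm, <a href="/contact">contact us</a></p></div>'
)
for feature in features:
    print(feature)
```

In this sketch the tokens ``Acme`` and ``Corp`` carry ``parent_tag`` equal to
``h1``, a hint that they may belong to an organization name. A statistical
sequence model could combine such structural features with ordinary text
features.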

Read the docs_ for more info.

License is MIT.

.. _docs: http://webstruct.readthedocs.io/en/latest/
.. _NER: http://en.wikipedia.org/wiki/Named-entity_recognition

Contributing

To run the tests, make sure tox_ is installed, then run
``tox`` from the source root.

.. _tox: https://tox.readthedocs.io/en/latest/

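Overview

:Name and owner: scrapinghub/webstruct
:Primary language: HTML
:Languages: Python (2 languages in total)
:Releases: 6
:Latest release: 0.6 (released 2017-12-29 22:39:26)
:First release: 0.2 (released 2014-04-22 04:26:25)
:Created: 2013-07-22 10:05:49
:Last push: 2020-10-01 09:41:57
:Last commit: 2018-08-28 14:55:03
:Stars: 254
:Watchers: 137
:Forks: 59
:Commits: 452
:Issues: 20
:Open issues: 13
:Pull requests: 32
:Open pull requests: 9
:Closed pull requests: 8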