webstruct

NER toolkit for HTML data

=========
Webstruct
=========

.. image:: https://img.shields.io/pypi/v/webstruct.svg
   :target: https://pypi.python.org/pypi/webstruct
   :alt: PyPI Version

.. image:: https://travis-ci.org/scrapinghub/webstruct.svg?branch=master
   :target: https://travis-ci.org/scrapinghub/webstruct
   :alt: Build Status

.. image:: https://codecov.io/gh/scrapinghub/webstruct/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/scrapinghub/webstruct
   :alt: Code Coverage

.. image:: https://readthedocs.org/projects/webstruct/badge/?version=latest
   :target: http://webstruct.readthedocs.io/en/latest/
   :alt: Documentation

Webstruct is a library for creating statistical NER_ systems that work
on HTML data, i.e. a library for building tools that extract named
entities (addresses, organization names, opening hours, etc.) from webpages.
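
A statistical NER system of this kind typically assigns one tag to each token, and a common encoding is the IOB scheme (``B-`` begins an entity, ``I-`` continues it, ``O`` is outside any entity). The sketch below assumes that scheme and uses hypothetical ``ORG`` and ``ADDR`` labels to show how per-token tags turn into extracted entities. It illustrates the idea only; ``iob_to_entities`` is not part of webstruct's API.

```python
from typing import List, Optional, Tuple


def iob_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity_type, text) pairs (hypothetical helper)."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts; a stray I- tag is treated as a start too.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O": the token is outside any entity, so close the open one.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


tokens = ["Visit", "Acme", "Corp", "at", "12", "Main", "St", "today"]
tags = ["O", "B-ORG", "I-ORG", "O", "B-ADDR", "I-ADDR", "I-ADDR", "O"]
print(iob_to_entities(tokens, tags))
# [('ORG', 'Acme Corp'), ('ADDR', '12 Main St')]
```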

Unlike most NER systems, webstruct works on HTML data rather than only
on plain text. This makes it possible to define features that use HTML
structure and to embed annotation results back into HTML.
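
A minimal standard-library sketch of both ideas follows. It walks HTML with Python's ``html.parser``, records structural features for each text token (enclosing tag, whether the token sits inside a link, nesting depth), and can re-emit the HTML with labelled tokens wrapped in ``<span>`` elements. This is not how webstruct implements either feature; the class ``HtmlTokenWalker``, the function ``walk``, the feature names and the ``<span class="...">`` markup are hypothetical choices made for illustration.

```python
import re
from html import escape
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class HtmlTokenWalker(HTMLParser):
    """Hypothetical helper: tokenize HTML text, record structural features
    per token and re-emit the HTML with labelled tokens wrapped in <span>.
    Comments and doctypes are not re-emitted in this sketch."""

    def __init__(self, labels: Optional[List[str]] = None) -> None:
        super().__init__(convert_charrefs=True)
        self.labels = labels                    # one label per token ("O" = none)
        self.open_tags: List[str] = []          # currently open non-void tags
        self.features: List[Dict[str, object]] = []
        self.html_out: List[str] = []           # fragments of the re-emitted HTML

    def handle_starttag(self, tag, attrs):
        self.html_out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: copy them verbatim, nothing to push.
        self.html_out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.html_out.append(f"</{tag}>")
        if tag in self.open_tags:  # ignore stray end tags
            # Pop up to the matching tag, implicitly closing unclosed children.
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Keep whitespace runs so the re-emitted HTML keeps its spacing.
        for piece in re.split(r"(\s+)", data):
            if not piece or piece.isspace():
                self.html_out.append(piece)
                continue
            index = len(self.features)
            self.features.append({
                "token": piece,
                "parent_tag": self.open_tags[-1] if self.open_tags else None,
                "in_link": "a" in self.open_tags,
                "depth": len(self.open_tags),
            })
            label = self.labels[index] if self.labels else "O"
            text = escape(piece)
            if label == "O":
                self.html_out.append(text)
            else:
                self.html_out.append(f'<span class="{escape(label)}">{text}</span>')


def walk(html_doc: str, labels: Optional[List[str]] = None
         ) -> Tuple[List[Dict[str, object]], str]:
    """Return per-token features and the (optionally annotated) HTML."""
    walker = HtmlTokenWalker(labels)
    walker.feed(html_doc)
    walker.close()
    return walker.features, "".join(walker.html_out)


doc = '<div><p>Call <a href="/acme">Acme Corp</a><br/>at 12 Main St</p></div>'
features, _ = walk(doc)
print([(f["token"], f["parent_tag"], f["in_link"]) for f in features])

labels = ["O", "ORG", "ORG", "O", "ADDR", "ADDR", "ADDR"]
_, annotated = walk(doc, labels)
print(annotated)
```

Wrapping each token separately keeps the sketch short; merging adjacent tokens that share a label into one element, and special handling of ``<script>`` or ``<style>`` contents, are left out.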

Read the docs_ for more info.

License is MIT.

.. _docs: http://webstruct.readthedocs.io/en/latest/
.. _NER: http://en.wikipedia.org/wiki/Named-entity_recognition

Contributing
============

To run tests, make sure tox_ is installed, then run
``tox`` from the source root.

.. _tox: https://tox.readthedocs.io/en/latest/

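Key metrics
===========

Overview
--------

:Name and owner: scrapinghub/webstruct
:Main programming language: HTML
:Programming languages: Python (number of languages: 2)
:Created: 2013-07-22 10:05:49
:Last pushed: 2024-05-03 19:37:19
:Last commit: 2018-08-28 14:55:03
:Releases: 6
:Latest release: 0.6 (released 2017-12-29 22:39:26)
:First release: 0.2 (released 2014-04-22 04:26:25)
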
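
User engagement
---------------

:Stars: 259
:Watchers: 130
:Forks: 59
:Commits: 452
:Issues: 20
:Open issues: 13
:Pull requests: 32
:Open pull requests: 10
:Closed pull requests: 8
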