webstruct

NER toolkit for HTML data


Webstruct
=========

.. image:: https://img.shields.io/pypi/v/webstruct.svg
   :target: https://pypi.python.org/pypi/webstruct
   :alt: PyPI Version

.. image:: https://travis-ci.org/scrapinghub/webstruct.svg?branch=master
   :target: https://travis-ci.org/scrapinghub/webstruct
   :alt: Build Status

.. image:: https://codecov.io/gh/scrapinghub/webstruct/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/scrapinghub/webstruct
   :alt: Code Coverage

.. image:: https://readthedocs.org/projects/webstruct/badge/?version=latest
   :target: http://webstruct.readthedocs.io/en/latest/
   :alt: Documentation

Webstruct is a library for creating statistical NER_ systems that work
on HTML data, i.e. a library for building tools that extract named
entities (addresses, organization names, opening hours, etc.) from webpages.
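
Statistical NER systems commonly label each token with an IOB tag (``B-``
for the first token of an entity, ``I-`` for a continuation, ``O`` for
everything else) and then group the tagged tokens into entities. The
following is a minimal, library-independent sketch of that grouping step;
the function ``iob_to_entities`` and the entity types ``ORG`` and ``ADDR``
are hypothetical examples, not part of webstruct's API.

```python
from typing import List, Optional, Tuple


def iob_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity_type, text) pairs.

    Tags are "B-<TYPE>" (first token of an entity), "I-<TYPE>"
    (continuation) or "O" (not part of any entity).
    Hypothetical helper for illustration; not webstruct's API.
    """
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the entity collected so far (if any) and reset the state.
        nonlocal current_type, current_tokens
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            # Continuation of the entity that is already open.
            current_tokens.append(token)
        elif tag.startswith(("B-", "I-")):
            # A new entity starts; a stray "I-" tag is treated like "B-".
            flush()
            current_type, current_tokens = tag[2:], [token]
        else:
            # "O" tag: close any open entity.
            flush()
    flush()
    return entities


tokens = ["Contact", "Acme", "Corp", "at", "12", "Main", "St", "."]
tags = ["O", "B-ORG", "I-ORG", "O", "B-ADDR", "I-ADDR", "I-ADDR", "O"]
print(iob_to_entities(tokens, tags))
# [('ORG', 'Acme Corp'), ('ADDR', '12 Main St')]
```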

Unlike most NER systems, webstruct works on HTML data, not only
on plain text. This makes it possible to define features that use the
HTML structure, and to embed annotation results back into the HTML.
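
As a rough illustration of what features that use HTML structure can
mean, the sketch below uses only Python's standard-library ``html.parser``
to split the text of a page into tokens and attach a few features taken
from the tag tree (the innermost enclosing tag, whether the token sits
inside a link, and its nesting depth). The class ``TokenFeatureCollector``,
the function ``html_token_features`` and the feature names are
hypothetical and are not webstruct's API; see the docs_ for the tokenizer
and feature functions webstruct actually provides.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class TokenFeatureCollector(HTMLParser):
    """Hypothetical helper: tokenize HTML text and attach tag-tree features."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []  # currently open tags, outermost first
        self.tokens: List[Dict[str, object]] = []

    def handle_starttag(self, tag: str, attrs) -> None:
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag: str) -> None:
        # Pop up to and including the matching open tag; this tolerates
        # inner tags that were never closed. Unmatched end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data: str) -> None:
        parent: Optional[str] = self.stack[-1] if self.stack else None
        for word in data.split():
            self.tokens.append({
                "token": word,
                "parent_tag": parent,              # innermost enclosing tag
                "inside_link": "a" in self.stack,  # any <a> ancestor
                "depth": len(self.stack),          # nesting level in the tree
            })


def html_token_features(html: str) -> List[Dict[str, object]]:
    """Return one feature dict per whitespace-separated text token."""
    parser = TokenFeatureCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens


doc = '<div><a href="/about">Acme Corp</a><br>Open <i>9am-5pm</i></div>'
for features in html_token_features(doc):
    print(features)
```

A sequence model could then be trained on such per-token feature dicts
together with IOB labels; that training step is outside the scope of this
sketch.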

Read the docs_ for more info.

License is MIT.

.. _docs: http://webstruct.readthedocs.io/en/latest/
.. _NER: http://en.wikipedia.org/wiki/Named-entity_recognition

Contributing
------------

To run the tests, make sure tox_ is installed, then run
``tox`` from the source root.

.. _tox: https://tox.readthedocs.io/en/latest/

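Key metrics
-----------

Overview
~~~~~~~~

:Name and owner: scrapinghub/webstruct
:Primary language: HTML
:Languages: Python (number of languages: 2)
:Platform:
:License:

Owner activity
~~~~~~~~~~~~~~

:Created: 2013-07-22 10:05:49
:Last push: 2024-05-03 19:37:19
:Last commit: 2018-08-28 14:55:03
:Releases: 6
:Latest release: 0.6 (released 2017-12-29 22:39:26)
:First release: 0.2 (released 2014-04-22 04:26:25)

User engagement
~~~~~~~~~~~~~~~

:Stars: 259
:Watchers: 130
:Forks: 59
:Commits: 452
:Issues enabled?:
:Issues: 20
:Open issues: 13
:Pull requests: 32
:Open pull requests: 10
:Closed pull requests: 8

Project settings
~~~~~~~~~~~~~~~~

:Wiki enabled?:
:Archived?:
:Is a fork?:
:Locked?:
:Is a mirror?:
:Private?: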