webstruct

NER toolkit for HTML data

Webstruct

.. image:: https://img.shields.io/pypi/v/webstruct.svg
   :target: https://pypi.python.org/pypi/webstruct
   :alt: PyPI Version

.. image:: https://travis-ci.org/scrapinghub/webstruct.svg?branch=master
   :target: https://travis-ci.org/scrapinghub/webstruct
   :alt: Build Status

.. image:: https://codecov.io/gh/scrapinghub/webstruct/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/scrapinghub/webstruct
   :alt: Code Coverage

.. image:: https://readthedocs.org/projects/webstruct/badge/?version=latest
   :target: http://webstruct.readthedocs.io/en/latest/
   :alt: Documentation

Webstruct is a library for creating statistical NER_ systems that work
on HTML data, i.e. a library for building tools that extract named
entities (addresses, organization names, open hours, etc.) from webpages.

Unlike most NER systems, webstruct works on HTML data, not only
on text data. This makes it possible to define features that use
the HTML structure and to embed annotation results back into the HTML.
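To make the idea of HTML-aware features concrete, here is a minimal sketch that uses only the Python standard library. It is not webstruct's API: the class ``TokenFeatureExtractor``, the function ``extract_token_features`` and the feature names are hypothetical, chosen for illustration. The sketch shows how each text token can carry information about the markup around it, which a plain-text NER system would not see.

```python
from html.parser import HTMLParser


class TokenFeatureExtractor(HTMLParser):
    """Collect text tokens together with simple HTML-structure features.

    Illustrative only; this is not how webstruct itself is implemented.
    """

    # Elements that never have a closing tag, so they are not pushed
    # onto the stack of open elements.
    VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open element names
        self.tokens = []     # list of (token, features) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the most recent matching open tag, if any.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        parent = self.open_tags[-1] if self.open_tags else None
        for token in data.split():
            features = {
                "lower": token.lower(),
                "is_title": token.istitle(),
                "parent_tag": parent,                  # structural feature
                "inside_link": "a" in self.open_tags,  # structural feature
            }
            self.tokens.append((token, features))


def extract_token_features(html):
    """Return a list of (token, features) pairs for an HTML string."""
    parser = TokenFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens


sample = (
    '<div><h1>Acme Corp</h1>'
    '<p>Open <b>9am-5pm</b>, see <a href="/map">map</a></p></div>'
)
for token, features in extract_token_features(sample):
    print(token, features)
```

Feature dictionaries of this kind are the usual input to a sequence labelling model such as a CRF. Whitespace splitting is used here only to keep the sketch short; a real system would need a proper tokenizer.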

Read the docs_ for more info.

License is MIT.

.. _docs: http://webstruct.readthedocs.io/en/latest/
.. _NER: http://en.wikipedia.org/wiki/Named-entity_recognition

Contributing

To run the tests, make sure tox_ is installed, then run
``tox`` from the source root.

.. _tox: https://tox.readthedocs.io/en/latest/

Overview

===========================  ====================================
Field                        Value
===========================  ====================================
Name With Owner              scrapinghub/webstruct
Primary Language             HTML
Program language             Python (Language Count: 2)
Platform
License                      MIT
Release Count                6
Last Release Name            0.6 (Posted on 2017-12-29 22:39:26)
First Release Name           0.2 (Posted on 2014-04-22 04:26:25)
Created At                   2013-07-22 10:05:49
Pushed At                    2020-10-01 09:41:57
Last Commit At               2018-08-28 14:55:03
Stargazers Count             254
Watchers Count               137
Fork Count                   59
Commits Count                452
Has Issues Enabled
Issues Count                 20
Issue Open Count             13
Pull Requests Count          32
Pull Requests Open Count     9
Pull Requests Close Count    8
Has Wiki Enabled
Is Archived
Is Fork
Is Locked
Is Mirror
Is Private
===========================  ====================================