pytextrank

Python implementation of TextRank for text document NLP parsing and summarization


Python impl for TextRank

Python implementation of TextRank, based on the
Mihalcea 2004 <http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf>_
paper.

Modifications to the original algorithm by
Rada Mihalcea <https://web.eecs.umich.edu/~mihalcea/>_, et al.
include:

  • fixed bug; see Java impl, 2008 <https://github.com/ceteri/textrank>_
  • use of lemmatization instead of stemming
  • verbs included in the graph (but not in the resulting keyphrases)
  • named entity recognition
  • normalized keyphrase ranks used in summarization
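For orientation, the core TextRank idea (link words that co-occur within a small window, then rank the graph with PageRank) can be sketched in a few lines. This is a minimal illustration, not PyTextRank's own code: it takes an already-tokenized list of words rather than running spaCy lemmatization, uses a hand-written power iteration rather than NetworkX, and the function names ``build_graph`` and ``pagerank`` are hypothetical.

```python
from collections import defaultdict


def build_graph(tokens, window=2):
    """Link each token to the tokens that follow it within `window` positions.

    Illustrative assumption: `tokens` is already cleaned (lemmatized,
    stopwords removed); PyTextRank itself relies on spaCy for that step.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                # Undirected edge: record it in both directions.
                graph[word].add(other)
                graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an undirected graph."""
    n = len(graph)
    ranks = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_ranks = {}
        for node in graph:
            # Each neighbor shares its rank equally among its own neighbors.
            incoming = sum(ranks[nb] / len(graph[nb]) for nb in graph[node])
            new_ranks[node] = (1.0 - damping) / n + damping * incoming
        ranks = new_ranks
    return ranks


tokens = ["a", "hub", "b", "hub", "c", "hub", "d"]
ranks = pagerank(build_graph(tokens))
for word, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.3f}")
```

In this toy input the word ``hub`` is adjacent to every other word, so it ends up with the highest rank. Because every node has at least one neighbor, the ranks sum to 1.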

The results produced by this implementation are intended more for use
as feature vectors in machine learning than as academic paper
summaries.

Inspired by Williams 2016 <http://mike.place/2016/summarization/>_
talk on text summarization.

Example Usage

See PyTextRank wiki <https://github.com/ceteri/pytextrank/wiki/Examples>_

Dependencies and Installation

This code has dependencies on several other Python projects:

  • spaCy <https://spacy.io/docs/usage/>_
  • NetworkX <http://networkx.readthedocs.io/>_
  • datasketch <https://github.com/ekzhu/datasketch>_
  • graphviz <https://pypi.python.org/pypi/graphviz>_

To install from PyPI <https://pypi.python.org/pypi/pytextrank>_:

::

    pip install pytextrank

To install from this Git repo:

::

    pip install -r requirements.txt

After installation you need to download a language model:

::

    python -m spacy download en

Also, the runtime depends on a local file called stop.txt which
contains a list of stopwords. You can override this in the
normalize_key_phrases() call.
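As a rough sketch of what a stopword file involves, the snippet below reads a word list and filters tokens against it. It assumes one stopword per line, which may not match the actual layout of stop.txt, and the helpers ``load_stopwords`` and ``drop_stopwords`` are hypothetical names, not part of PyTextRank. How a custom list is passed to normalize_key_phrases() is not shown here; check the library source for the exact parameter.

```python
from pathlib import Path


def load_stopwords(path):
    """Read a stopword file into a set.

    Assumption: one stopword per line; blank lines are ignored and
    words are lowercased for case-insensitive matching.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def drop_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [tok for tok in tokens if tok.lower() not in stopwords]


if __name__ == "__main__":
    stop_file = Path("stop.txt")
    if stop_file.exists():
        words = load_stopwords(stop_file)
        print(drop_stopwords("The rank of the graph".split(), words))
```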

License

PyTextRank has an Apache 2.0 <https://github.com/ceteri/pytextrank/blob/master/LICENSE>_
license, so you can use it for commercial applications.
Please let us know if you find this useful, and tell us about use cases,
what else you'd like to see integrated, etc.

Here's a BibTeX entry if you ever need to cite PyTextRank in a research paper:

::

    @Misc{PyTextRank,
      author =   {Nathan, Paco},
      title =    {PyTextRank, a Python implementation of TextRank for text document NLP parsing and summarization},
      howpublished = {\url{https://github.com/ceteri/pytextrank/}},
      year = {2016}
    }

Kudos

@htmartin <https://github.com/htmartin>_
@williamsmj <https://github.com/williamsmj/>_
@eugenep <https://github.com/eugenep/>_
@mattkohl <https://github.com/mattkohl>_
@vanita5 <https://github.com/vanita5>_
@HarshGrandeur <https://github.com/HarshGrandeur>_
@mnowotka <https://github.com/mnowotka>_
@kjam <https://github.com/kjam>_
@dvsrepo <https://github.com/dvsrepo>_
