pytextrank

Python implementation of TextRank for text document NLP parsing and summarization


Python impl for TextRank

Python implementation of TextRank, based on the
Mihalcea 2004 <http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf>_
paper.

Modifications to the original algorithm by
Rada Mihalcea <https://web.eecs.umich.edu/~mihalcea/>_, et al.
include:

  • fixed bug; see Java impl, 2008 <https://github.com/ceteri/textrank>_
  • use of lemmatization instead of stemming
  • verbs included in the graph (but not in the resulting keyphrases)
  • named entity recognition
  • normalized keyphrase ranks used in summarization
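
To make the list above concrete, here is a minimal sketch of the general TextRank idea with two of these modifications: verbs take part in the graph but are dropped from the output, and the remaining ranks are normalized. It is not the code in this package. It uses only the standard library, assumes the input is already lemmatized and POS-tagged, and the function name ``textrank``, the ``window`` size, and the tag set are illustrative assumptions.

```python
from collections import defaultdict

GRAPH_POS = {"NOUN", "ADJ", "VERB"}  # assumed tag set; verbs help connect the graph


def textrank(tagged_lemmas, window=3, damping=0.85, iterations=50):
    """Return {lemma: normalized rank} for the non-verb lemmas.

    tagged_lemmas is a list of (lemma, pos) pairs, assumed to be
    lemmatized and POS-tagged already (for example by spaCy).
    """
    nodes_in_order = [t for t in tagged_lemmas if t[1] in GRAPH_POS]

    # Link each node to the nodes that follow it within the window.
    neighbors = defaultdict(set)
    for i, node in enumerate(nodes_in_order):
        for other in nodes_in_order[i + 1 : i + window]:
            if other != node:
                neighbors[node].add(other)
                neighbors[other].add(node)

    if not neighbors:
        return {}

    # Power iteration for PageRank on the undirected co-occurrence graph.
    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[m] / len(neighbors[m]) for m in neighbors[node])
            for node in neighbors
        }

    # Verbs stay out of the results; the remaining ranks are rescaled to sum to 1.
    scores = defaultdict(float)
    for (lemma, pos), value in rank.items():
        if pos != "VERB":
            scores[lemma] += value
    total = sum(scores.values())
    return {lemma: value / total for lemma, value in scores.items()} if total else {}


if __name__ == "__main__":
    tokens = [("the", "DET"), ("graph", "NOUN"), ("rank", "VERB"),
              ("keyword", "NOUN"), ("graph", "NOUN"), ("model", "NOUN")]
    for lemma, score in sorted(textrank(tokens).items(), key=lambda kv: -kv[1]):
        print(lemma, round(score, 3))
```

Keeping the verb in the graph lets it pass rank between the nouns around it, even though it never appears as a keyphrase itself.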

The results produced by this implementation are intended for use
as feature vectors in machine learning rather than as academic paper
summaries.

Inspired by Williams 2016 <http://mike.place/2016/summarization/>_
talk on text summarization.

Example Usage

See PyTextRank wiki <https://github.com/ceteri/pytextrank/wiki/Examples>_

Dependencies and Installation

This code has dependencies on several other Python projects:

  • spaCy <https://spacy.io/docs/usage/>_
  • NetworkX <http://networkx.readthedocs.io/>_
  • datasketch <https://github.com/ekzhu/datasketch>_
  • graphviz <https://pypi.python.org/pypi/graphviz>_
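
A quick way to confirm that these dependencies are importable is sketched below. It relies only on the standard library; the helper name ``find_missing`` is hypothetical, and the module names are assumed to match the package names listed above.

```python
from importlib.util import find_spec

# Import names assumed to match the projects listed above.
REQUIRED = ["spacy", "networkx", "datasketch", "graphviz"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```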

To install from PyPI <https://pypi.python.org/pypi/pytextrank>_:

::

    pip install pytextrank

To install from this Git repo:

::

    pip install -r requirements.txt

After installation you need to download a language model:

::

    python -m spacy download en

At runtime, the code also reads a local file named stop.txt, which
contains a list of stopwords. You can override this list in the
normalize_key_phrases() call.
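
If you want to supply your own stopwords, one option is to load them from a file first. The sketch below assumes a format of one stopword per line, which may not match the stop.txt shipped with this project. The helper name ``load_stopwords`` is hypothetical, and how the result is passed to normalize_key_phrases() depends on that function's signature, which is not shown here.

```python
from pathlib import Path


def load_stopwords(path="stop.txt"):
    """Read stopwords from a text file, assuming one word per line.

    Blank lines are skipped and words are lowercased.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}
```

Returning a set keeps membership checks fast when filtering many tokens.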

License

PyTextRank has an Apache 2.0 <https://github.com/ceteri/pytextrank/blob/master/LICENSE>_
license, so you can use it for commercial applications.
Please let us know if you find this useful, and tell us about use cases,
what else you'd like to see integrated, etc.

Here's a BibTeX entry if you ever need to cite PyTextRank in a research paper:

::

    @Misc{PyTextRank,
      author = {Nathan, Paco},
      title = {PyTextRank, a Python implementation of TextRank for text document NLP parsing and summarization},
      howpublished = {\url{https://github.com/ceteri/pytextrank/}},
      year = {2016}
    }

Kudos

@htmartin <https://github.com/htmartin>_
@williamsmj <https://github.com/williamsmj/>_
@eugenep <https://github.com/eugenep/>_
@mattkohl <https://github.com/mattkohl>_
@vanita5 <https://github.com/vanita5>_
@HarshGrandeur <https://github.com/HarshGrandeur>_
@mnowotka <https://github.com/mnowotka>_
@kjam <https://github.com/kjam>_
@dvsrepo <https://github.com/dvsrepo>_
