pytextrank

Python implementation of TextRank for text document NLP parsing and summarization


Python impl for TextRank

Python implementation of TextRank, based on the
Mihalcea 2004 <http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf>_
paper.

Modifications to the original algorithm by
Rada Mihalcea <https://web.eecs.umich.edu/~mihalcea/>_, et al.
include:

  • fixed bug; see Java impl, 2008 <https://github.com/ceteri/textrank>_
  • use of lemmatization instead of stemming
  • verbs included in the graph (but not in the resulting keyphrases)
  • named entity recognition
  • normalized keyphrase ranks used in summarization
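
As a rough illustration of the graph-based ranking these modifications build on, here is a minimal sketch in plain Python. It is not the code used by this package: the function names ``build_graph`` and ``text_rank``, the window size, and the damping factor are assumptions chosen for the example, and the input is assumed to be already lemmatized. Words that share a sliding window are linked, PageRank is run by power iteration, and the resulting ranks sum to 1.

```python
from collections import defaultdict


def build_graph(lemmas, window=3):
    """Link lemmas that fall inside the same sliding window of `window` tokens."""
    graph = defaultdict(set)
    for i, word in enumerate(lemmas):
        for other in lemmas[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def text_rank(graph, damping=0.85, iterations=50):
    """Run PageRank by power iteration on an undirected, unweighted graph."""
    nodes = list(graph)
    if not nodes:
        return {}
    # Start from a uniform distribution; each update keeps the total at 1.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        rank = {
            node: (1.0 - damping) / len(nodes)
            + damping * sum(rank[nbr] / len(graph[nbr]) for nbr in graph[node])
            for node in nodes
        }
    return rank


# Hypothetical, already-lemmatized input.
lemmas = ["node", "graph", "edge", "graph", "rank", "graph", "score"]
ranks = text_rank(build_graph(lemmas))
for word, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{word}\t{score:.3f}")
```

A real pipeline would get the lemmas and part-of-speech tags from spaCy and could hand the graph to NetworkX instead of iterating by hand; the sketch only shows the shape of the computation.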

The results produced by this implementation are intended for use as
feature vectors in machine learning rather than as academic paper
summaries.

Inspired by the Williams 2016 <http://mike.place/2016/summarization/>_
talk on text summarization.

Example Usage

See PyTextRank wiki <https://github.com/ceteri/pytextrank/wiki/Examples>_

Dependencies and Installation

This code has dependencies on several other Python projects:

  • spaCy <https://spacy.io/docs/usage/>_
  • NetworkX <http://networkx.readthedocs.io/>_
  • datasketch <https://github.com/ekzhu/datasketch>_
  • graphviz <https://pypi.python.org/pypi/graphviz>_

To install from PyPI <https://pypi.python.org/pypi/pytextrank>_:

::

    pip install pytextrank

To install from this Git repo:

::

    pip install -r requirements.txt

After installation, download a spaCy language model:

::

    python -m spacy download en

At runtime, the code also reads a local file called stop.txt, which
contains a list of stopwords. You can override this list in the
normalize_key_phrases() call.
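
If you want to supply your own list, a small helper along these lines could read it into a set. This is only a sketch: ``load_stopwords`` is a hypothetical name, not part of this package, and the one-word-per-line file layout is an assumption. How the result is handed to normalize_key_phrases() is not shown here; check that function's signature in the source.

```python
from pathlib import Path


def load_stopwords(path="stop.txt"):
    """Read stopwords from a text file, assuming one word per line.

    Blank lines are skipped and words are lowercased.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}
```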

License

PyTextRank has an Apache 2.0 <https://github.com/ceteri/pytextrank/blob/master/LICENSE>_
license, so you can use it for commercial applications.
Please let us know if you find this useful, and tell us about use cases,
what else you'd like to see integrated, etc.

Here's a BibTeX entry if you ever need to cite PyTextRank in a research paper:

::

    @Misc{PyTextRank,
      author =       {Nathan, Paco},
      title =        {PyTextRank, a Python implementation of TextRank for text document NLP parsing and summarization},
      howpublished = {\url{https://github.com/ceteri/pytextrank/}},
      year =         {2016}
    }

Kudos

@htmartin <https://github.com/htmartin>_
@williamsmj <https://github.com/williamsmj/>_
@eugenep <https://github.com/eugenep/>_
@mattkohl <https://github.com/mattkohl>_
@vanita5 <https://github.com/vanita5>_
@HarshGrandeur <https://github.com/HarshGrandeur>_
@mnowotka <https://github.com/mnowotka>_
@kjam <https://github.com/kjam>_
@dvsrepo <https://github.com/dvsrepo>_
