textacy

NLP, before and after spaCy

textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses primarily on the tasks that come before and follow after.

Features

  • Convenient entry points to working with one or many documents processed by spaCy, with functionality added via custom extensions and automatic language identification for applying the right spaCy pipeline
  • Variety of downloadable datasets with both text content and metadata, from Congressional speeches to historical literature to Reddit comments
  • Easy file I/O for streaming data to and from disk
  • Cleaning, normalization, and exploration of raw text before processing it with spaCy
  • Flexible extraction of words, ngrams, noun chunks, entities, acronyms, key terms, and other elements of interest
  • Tokenization and vectorization of documents, with functionality for training, interpreting, and visualizing topic models
  • String, set, and document similarity comparison by a variety of metrics
  • Calculations for common text statistics, including Flesch-Kincaid Grade Level and multilingual Flesch Reading Ease

... and more!
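
To make the text-statistics feature above a little more concrete, here is a minimal standard-library sketch of the two English readability formulas named in the list. It is not textacy's implementation: the function names, the regex-based word and sentence splitting, and the vowel-run syllable counter are all assumptions made for illustration, and a real pipeline would take its counts from a spaCy-processed document instead.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not textacy's method):
    count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (n_words, n_sents, n_syllables) using naive regex splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), n_syllables


def flesch_reading_ease(n_words: int, n_sents: int, n_syllables: int) -> float:
    """Flesch Reading Ease for English; higher scores mean easier text."""
    return 206.835 - 1.015 * (n_words / n_sents) - 84.6 * (n_syllables / n_words)


def flesch_kincaid_grade_level(n_words: int, n_sents: int, n_syllables: int) -> float:
    """Flesch-Kincaid Grade Level; approximates a U.S. school grade."""
    return 0.39 * (n_words / n_sents) + 11.8 * (n_syllables / n_words) - 15.59


if __name__ == "__main__":
    counts = text_counts("The cat sat. The dog ran.")
    print(counts)
    print(flesch_reading_ease(*counts))
    print(flesch_kincaid_grade_level(*counts))
```

Both formulas depend only on three counts, so the quality of the result rests on how words, sentences, and syllables are counted. The heuristics here are deliberately crude and will miscount many real words.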

Maintainer

Howdy, y'all.

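Key metrics

Overview
Name and owner: chartbeat-labs/textacy
Primary language: Python
Languages: Python (number of languages: 2)
Platform:
License: Other

Owner activity
Created: 2016-02-03 16:52:45
Last pushed: 2023-09-22 23:38:28
Last commit:
Releases: 29
Latest release: 0.13.0
First release: v0.2.0

User engagement
Stars: 2.2k
Watchers: 84
Forks: 249
Commits: 1.8k
Issues enabled?
Issues: 262
Open issues: 32
Pull requests: 105
Open pull requests: 3
Closed pull requests: 16

Project settings
Wiki enabled?
Archived?
Is a fork?
Locked?
Is a mirror?
Is private?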