textacy

NLP, before and after spaCy

textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses primarily on the tasks that come before and follow after.

Features

  • Convenient entry points to working with one or many documents processed by spaCy, with functionality added via custom extensions and automatic language identification for applying the right spaCy pipeline
  • Variety of downloadable datasets with both text content and metadata, from Congressional speeches to historical literature to Reddit comments
  • Easy file I/O for streaming data to and from disk
  • Cleaning, normalization, and exploration of raw text before processing
  • Flexible extraction of words, ngrams, noun chunks, entities, acronyms, key terms, and other elements of interest
  • Tokenization and vectorization of documents, with functionality for training, interpreting, and visualizing topic models
  • String, set, and document similarity comparison by a variety of metrics
  • Calculations for common text statistics, including Flesch-Kincaid Grade Level and multilingual Flesch Reading Ease

... and more!
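
Two of the features listed above, set similarity and readability statistics, reduce to short formulas. The sketch below illustrates them with the standard library only. It is not textacy's implementation: the function names, the regex-based word and sentence splitting, and the vowel-group syllable heuristic are all assumptions made for illustration, and the Flesch coefficients shown are the standard English ones.

```python
import re


def count_syllables(word: str) -> int:
    """Crude syllable estimate: count runs of consecutive vowels.

    This heuristic is an assumption for illustration; real libraries
    typically use hyphenation dictionaries instead.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (sentence count, word count, syllable count) for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease with the standard English coefficients."""
    n_sents, n_words, n_sylls = text_counts(text)
    if n_sents == 0 or n_words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 206.835 - 1.015 * (n_words / n_sents) - 84.6 * (n_sylls / n_words)


def flesch_kincaid_grade_level(text: str) -> float:
    """Flesch-Kincaid Grade Level (approximate U.S. school grade)."""
    n_sents, n_words, n_sylls = text_counts(text)
    if n_sents == 0 or n_words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 0.39 * (n_words / n_sents) + 11.8 * (n_sylls / n_words) - 15.59


def jaccard_similarity(tokens_a, tokens_b) -> float:
    """Jaccard similarity of two token collections: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty inputs
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    sample = "The cat sat on the mat. It was happy."
    print(text_counts(sample))
    print(round(flesch_reading_ease(sample), 2))
    print(round(flesch_kincaid_grade_level(sample), 2))
    print(jaccard_similarity("a b c".split(), "b c d".split()))
```

Short, simple sentences score high on Reading Ease and low on Grade Level, so the two metrics move in opposite directions. In practice, the counts would come from a spaCy-processed document rather than from regular expressions.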

Maintainer

Howdy, y'all. 👋

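Key metrics

Overview
Name and owner: chartbeat-labs/textacy
Primary programming language: Python
Programming languages: Python (number of languages: 2)
Platform:
License: Other
Owner activity
Created: 2016-02-03 16:52:45
Last pushed: 2023-09-22 23:38:28
Last commit:
Releases: 29
Latest release name: 0.13.0 (released on )
First release name: v0.2.0 (released on )
User engagement
Stars: 2.2k
Watchers: 84
Forks: 249
Commits: 1.8k
Issues enabled?
Issues: 262
Open issues: 32
Pull requests: 105
Open pull requests: 3
Closed pull requests: 16
Project settings
Wiki enabled?
Archived?
Is a fork?
Locked?
Is a mirror?
Is private?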