CLTK

The Classical Language Toolkit (CLTK) is a Python library offering natural language processing (NLP) for pre-modern languages.

The Classical Language Toolkit

Join the chat at https://gitter.im/cltk/cltk

About

The Classical Language Toolkit (CLTK) offers natural language processing (NLP) support for the languages of Ancient, Classical, and Medieval Eurasia. Greek, Latin, Akkadian, and the Germanic languages are currently most complete. The goals of the CLTK are to:

  • compile analysis-friendly corpora;
  • collect and generate linguistic data;
  • act as a free and open platform for generating scientific research.

Documentation

The docs are at docs.cltk.org.

Installation

CLTK supports Python versions 3.6 and 3.7. The software runs only on POSIX-compliant operating systems (Linux, Mac OS X, FreeBSD, etc.).

$ pip install cltk

See docs for complete installation instructions.
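
The requirements stated above can be checked before running pip. The following is only a minimal sketch, assuming the supported versions are exactly the two listed here; the helper name `check_cltk_environment` is hypothetical and is not part of the CLTK.

```python
import os
import sys

# Versions taken from the statement above; adjust if the docs list others.
SUPPORTED_VERSIONS = {(3, 6), (3, 7)}


def check_cltk_environment(version_info=None, os_name=None):
    """Return a list of problems; an empty list means the stated requirements are met.

    Hypothetical helper, not part of the CLTK API.
    """
    version_info = sys.version_info if version_info is None else version_info
    os_name = os.name if os_name is None else os_name

    problems = []
    if tuple(version_info[:2]) not in SUPPORTED_VERSIONS:
        problems.append(
            "Python {}.{} is not a listed version".format(version_info[0], version_info[1])
        )
    # os.name is "posix" on Linux, macOS and FreeBSD, and "nt" on Windows.
    if os_name != "posix":
        problems.append("operating system is not POSIX-compliant")
    return problems


if __name__ == "__main__":
    for problem in check_cltk_environment():
        print("Warning:", problem)
```

Passing `version_info` and `os_name` explicitly makes it possible to check a target machine's settings without running the script there.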

The CLTK organization curates corpora which can be downloaded directly or, better, imported by the toolkit.

Tutorials

For interactive tutorials, in the form of Jupyter Notebooks, see https://github.com/cltk/tutorials.

Contributing

See the Quickstart for contributors for an overview of the process. If you're looking to start with a small contribution, see the Issue tracker for "easy" jobs that need doing. Bigger projects may be found on the Project ideas page. Of course, new ideas are always welcome.

Citation

Each major release of the CLTK is given a DOI, a type of unique identifier for digital documents. Include this DOI in your citation, as it will allow researchers to reproduce your results should the CLTK's API or codebase change. To find the CLTK's current DOI, look at the blue DOI button on the repository's home page on GitHub. Append "DOI" plus the current identifier to the end of your bibliographic entry. You may also add the version/release number, shown on the PyPI button on the same page.

Thus, please cite core software as something like:

Kyle P. Johnson et al. (2014-2019). CLTK: The Classical Language Toolkit. DOI 10.5281/zenodo.<current_release_id>

A style-neutral BibTeX entry would look like this:

@Misc{johnson2014,
author = {Kyle P. Johnson et al.},
title = {CLTK: The Classical Language Toolkit},
howpublished = {\url{https://github.com/cltk/cltk}},
note = {{DOI} 10.5281/zenodo.<current_release_id>},
year = {2014--2019},
}
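
For those who generate bibliographies programmatically, the plain-text citation above can be assembled from the release identifier. This is only an illustrative sketch: the function `cltk_citation` is hypothetical, and the "Version ..." wording used when a release number is supplied is an assumption, since the text above does not prescribe a format for it.

```python
def cltk_citation(release_id, version=None, years="2014-2019"):
    """Build the plain-text citation shown above.

    Hypothetical helper, not part of the CLTK. `release_id` is the Zenodo
    identifier read from the DOI button; `version` is the optional release
    number, rendered in an assumed "Version X." form.
    """
    citation = (
        "Kyle P. Johnson et al. ({}). CLTK: The Classical Language Toolkit.".format(years)
    )
    if version:
        citation += " Version {}.".format(version)
    citation += " DOI 10.5281/zenodo.{}".format(release_id)
    return citation


# "<current_release_id>" is left as a placeholder; substitute the real identifier.
print(cltk_citation("<current_release_id>"))
```

The placeholder is kept deliberately, because the identifier changes with each major release and must be read from the repository at the time of citation.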

Many contributors have made substantial contributions to the CLTK. For scholarship about particular code, it might be proper to cite these individuals as authors of the work under discussion.

Gratitude

We are thankful for the following organizations that have offered support:

  • Google Summer of Code (sponsoring two students, 2016, 2017; three students 2018)
  • JetBrains (licenses for PyCharm)
  • Google Cloud Platform (with credits for the Classical Language Archive and API)

License

The CLTK is Copyright (c) 2014-2019 Kyle P. Johnson, under the MIT License. See LICENSE for details.

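Key metrics

Overview

  • Name and owner: cltk/cltk
  • Primary language: Python
  • Languages: Python (3 languages in total)
  • License: MIT License

Owner activity

  • Created: 2014-01-11 23:59:47
  • Last push: 2025-01-18 18:58:45
  • Last commit: 2024-11-30 18:03:32
  • Releases: 86
  • Latest release: v1.3.0 (release date not given)
  • First release: v0.0.1.0 (released 2014-12-25 21:27:34)

User engagement

  • Stars: 848
  • Watchers: 63
  • Forks: 334
  • Commits: 3.7k
  • Issues: 576
  • Open issues: 39
  • Pull requests: 521
  • Open pull requests: 3
  • Closed pull requests: 167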