spaCy

Industrial-strength Natural Language Processing (NLP) with Python and Cython.


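spaCy: Industrial-strength NLP

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products. spaCy comes with pretrained statistical models and word vectors, and currently supports tokenization for 50+ languages. It features state-of-the-art speed, convolutional neural network models for tagging, parsing and named entity recognition, and easy deep learning integration. It's commercial open-source software, released under the MIT license.

Version 2.1 out now! Check out the release notes here.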

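Documentation

spaCy 101: New to spaCy? Here's everything you need to know!
Usage Guides: How to use spaCy and its features.
New in v2.1: New features, backwards incompatibilities and migration guide.
API Reference: The detailed reference for spaCy's API.
Models: Download statistical language models for spaCy.
Universe: Libraries, extensions, demos, books and courses.
Changelog: Changes and version history.
Contribute: How to contribute to the spaCy project and code base.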

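Features

  • Non-destructive tokenization
  • Named entity recognition
  • Support for 50+ languages
  • Pretrained statistical models and word vectors
  • State-of-the-art speed
  • Easy deep learning integration
  • Part-of-speech tagging
  • Labelled dependency parsing
  • Syntax-driven sentence segmentation
  • Built-in visualizers for syntax and NER
  • Convenient string-to-hash mapping
  • Export to numpy data arrays
  • Efficient binary serialization
  • Easy model packaging and deployment
  • Robust, rigorously evaluated accuracy

For more details, see the facts, figures and benchmarks.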

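Install spaCy

For detailed installation instructions, see the documentation.

  • Operating system: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual Studio)
  • Python version: Python 2.7, 3.5+ (only 64 bit)
  • Package managers: pip · conda (via conda-forge)

pip

Using pip, spaCy releases are available as source packages and binary wheels (as of v2.0.13).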

pip install spacy

When using pip it is generally recommended to install packages in a virtual environment to avoid modifying system state:

python -m venv .env
source .env/bin/activate
pip install spacy

conda

Thanks to our great community, we've finally re-added conda support. You can now install spaCy via conda-forge:

conda config --add channels conda-forge
conda install spacy

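For the feedstock including the build recipe and configuration, check out this repository. Improvements and pull requests to the recipe and setup are always appreciated.

Updating spaCy

Some updates to spaCy may require downloading new statistical models. If you're running spaCy v2.0 or higher, you can use the validate command to check whether your installed models are compatible and, if not, print details on how to update them: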

pip install -U spacy
python -m spacy validate

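If you've trained your own models, keep in mind that your training and runtime inputs must match. After updating spaCy, we recommend retraining your models with the new version.

For details on upgrading from spaCy 1.x to spaCy 2.x, see the migration guide.

Download models

As of v1.7.0, models for spaCy can be installed as Python packages. This means that they're a component of your application, just like any other module. Models can be installed using spaCy's download command, or manually by pointing pip to a path or URL.

Documentation

Available Models: Detailed model descriptions, accuracy figures and benchmarks.
Models Documentation: Detailed usage instructions.
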
# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm
# out-of-the-box: download best-matching default model
python -m spacy download en
# pip install .tar.gz archive from path or URL
pip install /Users/you/en_core_web_sm-2.1.0.tar.gz
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz

Loading and using models

To load a model, use spacy.load() with the model name, a shortcut link or a path to the model data directory.

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"This is a sentence.")

You can also import a model directly via its full name and then call its load() method with no arguments.

import spacy
import en_core_web_sm
nlp = en_core_web_sm.load()
doc = nlp(u"This is a sentence.")

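For more info and examples, check out the models documentation. (First edition: vz revised at 2019.08.11)

Support for older versions

If you're using an older version (v1.6.0 or below), you can still download and install the old models from within spaCy using python -m spacy.en.download all or python -m spacy.de.download all. The .tar.gz archives are also attached to the v1.6.0 release. To download and install the models manually, unpack the archive, drop the contained directory into spacy/data and load the model via spacy.load('en') or spacy.load('de').

Compile from source

The other way to install spaCy is to clone its GitHub repository and build it from source. That is the common way if you want to make changes to the code base. You'll need a development environment consisting of a Python distribution including header files, a compiler, pip, virtualenv and git. The compiler part is the trickiest, and how to set it up depends on your system. See the notes on Ubuntu, OS X and Windows for details.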

# make sure you are using the latest pip
python -m pip install -U pip
git clone https://github.com/explosion/spaCy
cd spaCy
python -m venv .env
source .env/bin/activate
export PYTHONPATH=`pwd`
pip install -r requirements.txt
python setup.py build_ext --inplace

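Compared to a regular install via pip, requirements.txt additionally installs developer dependencies such as Cython. For more details and instructions, see the documentation on compiling spaCy from source and the quickstart widget to get the right commands for your platform and Python version.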

Ubuntu

Install system-level dependencies via apt-get:

sudo apt-get install build-essential python-dev git

macOS/OS X

Install a recent version of Xcode, including the so-called "Command Line Tools". macOS and OS X ship with Python and git preinstalled.

Windows

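Install a version of the Visual C++ Build Tools or Visual Studio Express that matches the version that was used to compile your Python interpreter. For official distributions these are VS 2008 (Python 2.7), VS 2010 (Python 3.4) and VS 2015 (Python 3.5).

Run tests

spaCy comes with an extensive test suite. In order to run the tests, you'll usually want to clone the repository and build spaCy from source. This will also install the required development dependencies and test utilities defined in the requirements.txt.

Alternatively, you can find out where spaCy is installed and run pytest on that directory. Don't forget to also install the test utilities via spaCy's requirements.txt: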

python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
pip install -r path/to/requirements.txt
python -m pytest <spacy-directory>

See the documentation for more details and examples.


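Key metrics

Overview
Name and owner: explosion/spaCy
Primary programming language: Python
Programming languages: Shell (number of languages: 13)
Platforms: Linux, Mac, Windows
License: MIT License

Owner activity
Created: 2014-07-03 15:15:40
Pushed: 2025-04-11 18:56:53
Last commit: 2025-04-11 20:56:52
Releases: 178
Latest release name: release-v3.8.5
First release name: 0.93 (released 2015-09-22 09:09:21)

User engagement
Stars: 31.5k
Watchers: 564
Forks: 4.5k
Commits: 16.2k
Issues enabled?
Issues: 5700
Open issues: 172
Pull requests: 3580
Open pull requests: 39
Closed pull requests: 431

Project settings
Wiki enabled?
Archived?
Is fork?
Locked?
Is mirror?
Is private?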

spaCy: Industrial-strength NLP

spaCy is a library for advanced Natural Language Processing in Python and
Cython. It's built on the very latest research, and was designed from day one to
be used in real products. spaCy comes with
pretrained statistical models and word vectors, and
currently supports tokenization for 50+ languages. It features
state-of-the-art speed, convolutional neural network models for tagging,
parsing and named entity recognition and easy deep learning integration.
It's commercial open-source software, released under the MIT license.

Version 2.2 out now!
Check out the release notes here.

Documentation

| Documentation | |
| --- | --- |
| [spaCy 101] | New to spaCy? Here's everything you need to know! |
| Usage Guides | How to use spaCy and its features. |
| New in v2.2 | New features, backwards incompatibilities and migration guide. |
| API Reference | The detailed reference for spaCy's API. |
| Models | Download statistical language models for spaCy. |
| Universe | Libraries, extensions, demos, books and courses. |
| Changelog | Changes and version history. |
| Contribute | How to contribute to the spaCy project and code base. |

[spaCy 101]: https://spacy.io/usage/spacy-101

Where to ask questions

The spaCy project is maintained by @honnibal and
@ines, along with core contributors
@svlandeg and
@adrianeboyd. Please understand that we won't
be able to provide individual support via email. We also believe that help is
much more valuable if it's shared publicly, so that more people can benefit from
it.

| Type | Platforms |
| --- | --- |
| Bug Reports | [GitHub Issue Tracker] |
| Feature Requests | [GitHub Issue Tracker] |
| Usage Questions | [Stack Overflow] · [Gitter Chat] · [Reddit User Group] |
| General Discussion | [Gitter Chat] · [Reddit User Group] |

[github issue tracker]: https://github.com/explosion/spaCy/issues
[stack overflow]: https://stackoverflow.com/questions/tagged/spacy
[gitter chat]: https://gitter.im/explosion/spaCy
[reddit user group]: https://www.reddit.com/r/spacynlp

Features

  • Non-destructive tokenization
  • Named entity recognition
  • Support for 50+ languages
  • pretrained statistical models and word vectors
  • State-of-the-art speed
  • Easy deep learning integration
  • Part-of-speech tagging
  • Labelled dependency parsing
  • Syntax-driven sentence segmentation
  • Built in visualizers for syntax and NER
  • Convenient string-to-hash mapping
  • Export to numpy data arrays
  • Efficient binary serialization
  • Easy model packaging and deployment
  • Robust, rigorously evaluated accuracy

For more details, see the
facts, figures and benchmarks.

Install spaCy

For detailed installation instructions, see the
documentation.

  • Operating system: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual
    Studio)
  • Python version: Python 2.7, 3.5+ (only 64 bit)
  • Package managers: pip · conda (via conda-forge)
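
As a quick pre-install check, the version and 64-bit constraints listed above can be tested from Python itself. This is a minimal sketch, not part of spaCy: the function name interpreter_problems is hypothetical, and it encodes only the two constraints stated in the list (Python 2.7 or 3.5+, 64-bit).

```python
import struct
import sys


def interpreter_problems(version_info, pointer_bits):
    """Return a list of reasons an interpreter falls outside the stated range."""
    problems = []
    major_minor = tuple(version_info[:2])
    # The requirements above name Python 2.7 and Python 3.5 or newer.
    if not (major_minor == (2, 7) or major_minor >= (3, 5)):
        problems.append("unsupported Python version %d.%d" % major_minor)
    # Only 64-bit builds are listed as supported.
    if pointer_bits != 64:
        problems.append("%d-bit interpreter, 64-bit required" % pointer_bits)
    return problems


# Pointer size of the running interpreter, in bits.
current_bits = struct.calcsize("P") * 8
print(interpreter_problems(sys.version_info, current_bits) or "interpreter looks compatible")
```

An empty list means neither constraint is violated; it does not guarantee that a wheel exists for your platform.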

pip

Using pip, spaCy releases are available as source packages and binary wheels (as
of v2.0.13).

pip install spacy

To install additional data tables for lemmatization in spaCy v2.2+ you can
run pip install spacy[lookups] or install
spacy-lookups-data
separately. The lookups package is needed to create blank models with
lemmatization data, and to lemmatize in languages that don't yet come with
pretrained models and aren't powered by third-party libraries.

When using pip it is generally recommended to install packages in a virtual
environment to avoid modifying system state:

python -m venv .env
source .env/bin/activate
pip install spacy

conda

Thanks to our great community, we've finally re-added conda support. You can now
install spaCy via conda-forge:

conda install -c conda-forge spacy

For the feedstock including the build recipe and configuration, check out
this repository. Improvements
and pull requests to the recipe and setup are always appreciated.

Updating spaCy

Some updates to spaCy may require downloading new statistical models. If you're
running spaCy v2.0 or higher, you can use the validate command to check if
your installed models are compatible and if not, print details on how to update
them:

pip install -U spacy
python -m spacy validate

If you've trained your own models, keep in mind that your training and runtime
inputs must match. After updating spaCy, we recommend retraining your models
with the new version.

For details on upgrading from spaCy 1.x to spaCy 2.x, see the
migration guide.

Download models

As of v1.7.0, models for spaCy can be installed as Python packages. This
means that they're a component of your application, just like any other module.
Models can be installed using spaCy's download command, or manually by
pointing pip to a path or URL.

| Documentation | |
| --- | --- |
| [Available Models] | Detailed model descriptions, accuracy figures and benchmarks. |
| [Models Documentation] | Detailed usage instructions. |

[available models]: https://spacy.io/models
[models documentation]: https://spacy.io/docs/usage/models

# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm

# pip install .tar.gz archive from path or URL
pip install /Users/you/en_core_web_sm-2.2.0.tar.gz
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz
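
If you pin models in a requirements file or a deployment script, it can help to build the archive URL from a model name and version. The sketch below only generalizes the single example URL shown above; that other models and versions follow the same pattern is an assumption, and model_archive_url is a hypothetical helper, not a spaCy function.

```python
# Base of the example URL above; assumed to be shared by all model releases.
RELEASES = "https://github.com/explosion/spacy-models/releases/download"


def model_archive_url(name, version):
    """Build a .tar.gz URL shaped like the en_core_web_sm example."""
    tag = "{}-{}".format(name, version)
    return "{}/{}/{}.tar.gz".format(RELEASES, tag, tag)


print(model_archive_url("en_core_web_sm", "2.2.0"))
```

The resulting string can be passed to pip install in the same way as the literal URL.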

Loading and using models

To load a model, use spacy.load() with the model name, a shortcut link or a
path to the model data directory.

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")

You can also import a model directly via its full name and then call its
load() method with no arguments.

import spacy
import en_core_web_sm

nlp = en_core_web_sm.load()
doc = nlp("This is a sentence.")

For more info and examples, check out the
models documentation.
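
Once a model is loaded, the returned doc can be iterated token by token. The following sketch shows one way to collect a few token attributes (text, pos_ and dep_); token_rows and demo are hypothetical names, and the stand-in tokens are hand-written values for illustration, not model output.

```python
from types import SimpleNamespace


def token_rows(doc):
    """Collect (text, part-of-speech tag, dependency label) for each token."""
    return [(token.text, token.pos_, token.dep_) for token in doc]


def demo():
    # Requires spaCy and the en_core_web_sm package to be installed.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    for row in token_rows(nlp("This is a sentence.")):
        print(row)


# Stand-in tokens (made-up labels) so the helper can be tried without a model.
fake_doc = [
    SimpleNamespace(text="This", pos_="DET", dep_="nsubj"),
    SimpleNamespace(text="is", pos_="AUX", dep_="ROOT"),
]
print(token_rows(fake_doc))
```

Call demo() in an environment where the model is installed to see the labels the model actually assigns.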

Compile from source

The other way to install spaCy is to clone its
GitHub repository and build it from
source. That is the common way if you want to make changes to the code base.
You'll need to make sure that you have a development environment consisting of a
Python distribution including header files, a compiler,
pip,
virtualenv and
git installed. The compiler part is the trickiest. How to
do that depends on your system. See notes on Ubuntu, OS X and Windows for
details.

# make sure you are using the latest pip
python -m pip install -U pip
git clone https://github.com/explosion/spaCy
cd spaCy

python -m venv .env
source .env/bin/activate
export PYTHONPATH=`pwd`
pip install -r requirements.txt
python setup.py build_ext --inplace

Compared to regular install via pip, requirements.txt
additionally installs developer dependencies such as Cython. For more details
and instructions, see the documentation on
compiling spaCy from source and the
quickstart widget to get the right
commands for your platform and Python version.
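
Before starting a source build, it may be worth confirming that the command-line prerequisites are on your PATH. This is a rough sketch using the standard library's shutil.which (Python 3.3+); missing_tools is a hypothetical helper, and the executable names passed to it are assumptions, since the compiler in particular may be called cc, gcc, clang or cl depending on the system.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the subset of executable names that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# Executable names are assumptions; adjust them for your platform.
print(missing_tools(["git", "pip", "virtualenv"]))
```

An empty list only shows that the executables were found; it says nothing about Python header files, which the build also needs.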

Ubuntu

Install system-level dependencies via apt-get:

sudo apt-get install build-essential python-dev git

macOS / OS X

Install a recent version of Xcode,
including the so-called "Command Line Tools". macOS and OS X ship with Python
and git preinstalled.

Windows

Install a version of the
Visual C++ Build Tools
or Visual Studio Express that
matches the version that was used to compile your Python interpreter. For
official distributions these are VS 2008 (Python 2.7), VS 2010 (Python 3.4) and
VS 2015 (Python 3.5).

Run tests

spaCy comes with an extensive test suite. In order to run the
tests, you'll usually want to clone the repository and build spaCy from source.
This will also install the required development dependencies and test utilities
defined in the requirements.txt.

Alternatively, you can find out where spaCy is installed and run pytest on
that directory. Don't forget to also install the test utilities via spaCy's
requirements.txt:

python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
pip install -r path/to/requirements.txt
python -m pytest <spacy-directory>

See the documentation for more details and
examples.
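
The one-liner above imports spaCy to find its directory. As an alternative sketch, the standard library's importlib.util.find_spec can locate a top-level package without importing it and returns None when the package is absent. The helper name package_dir is hypothetical.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory a top-level package is installed in, or None."""
    spec = importlib.util.find_spec(name)
    # Namespace packages have no single origin file, so treat them as not found.
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


# Prints None if spaCy is not installed in the current environment.
print(package_dir("spacy"))
```

The printed path is what you would pass to python -m pytest in place of <spacy-directory>.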