OpenNRE-PyTorch

Neural relation extraction implemented in PyTorch.

  • Owner: ShulinCao/OpenNRE-PyTorch
  • Platform: Linux, Mac, Windows
  • License: MIT License

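An open-source framework for neural relation extraction, implemented in PyTorch.

Contributed by Shulin Cao, Tianyu Gao, Xu Han, Lumin Tang, Yankai Lin, and Zhiyuan Liu.

Overview

OpenNRE-PyTorch is a PyTorch-based framework for easily building relation extraction models. We divide the relation extraction pipeline into four parts: embedding, encoder, selector, and classifier. For each part, we have implemented several methods.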

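  • Embedding
    • Word embedding
    • Position embedding
    • Concatenation method
  • Encoder
    • PCNN
    • CNN
  • Selector
    • Attention
    • Maximum
    • Average
  • Classifier
    • Softmax loss function
    • Output

All of these methods can be combined freely.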
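
As a rough illustration of what the selector stage does, the sketch below combines the sentence vectors of one bag (all sentences that mention the same entity pair) into a single bag vector. It is plain Python and not the repository's code: the function names `average_select`, `attention_select`, and `max_select`, and the use of a single query vector for scoring, are assumptions made for this example. In particular, the maximum strategy is sketched here as keeping the single highest-scoring sentence.

```python
import math


def average_select(bag):
    """Average the sentence vectors of a bag element-wise."""
    n = len(bag)
    return [sum(col) / n for col in zip(*bag)]


def attention_select(bag, query):
    """Weight each sentence by softmax(dot(sentence, query))."""
    scores = [sum(s * q for s, q in zip(sent, query)) for sent in bag]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * sent[i] for w, sent in zip(weights, bag))
            for i in range(len(bag[0]))]


def max_select(bag, query):
    """Keep only the sentence that scores highest against the query."""
    return max(bag, key=lambda sent: sum(s * q for s, q in zip(sent, query)))


# Toy bag of three 2-dimensional sentence vectors (made-up values).
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
```

Because all three functions map a bag to one vector of the same size, any of them can sit between an encoder and a classifier, which is the kind of free combination described above.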

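We also provide fast training and testing code. You can change hyperparameters or specify the model architecture through Python command-line arguments. A plotting method is also included in the package.

This project is released under the MIT license.

Requirements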

  • Python (>=2.7)
  • PyTorch (==0.3.1)
  • CUDA (>=8.0)
  • Matplotlib (>=2.0.0)
  • scikit-learn (>=0.18)

Installation

  1. Install PyTorch
  2. Clone the OpenNRE repository:
git clone https://github.com/ShulinCao/OpenNRE-PyTorch
  3. Download the NYT dataset from Google Drive
  4. Extract the dataset to ./raw_data
unzip raw_data.zip

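Dataset

NYT10 Dataset

NYT10 is a distantly supervised dataset originally released with the paper "Sebastian Riedel, Limin Yao, and Andrew McCallum. Modeling relations and their mentions without labeled text." A download link is available for the original data, and you can download the NYT10 dataset from Google Drive. The data details are as follows.

Training Data & Testing Data

The training and testing data files, which contain sentences with their corresponding entity pairs and relations, should be in the following format: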

[
    {
        'sentence': 'Bill Gates is the founder of Microsoft .',
        'head': {'word': 'Bill Gates', 'id': 'm.03_3d', ...(other information)},
        'tail': {'word': 'Microsoft', 'id': 'm.07dfk', ...(other information)},
        'relation': 'founder'
    },
    ...
]

IMPORTANT: In the sentence field, words and punctuation marks must be separated by spaces.
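
To see why the spacing matters, here is a minimal sketch that locates an entity in a sentence by splitting on whitespace. The helper `find_entity_position` is hypothetical and is not taken from gen_data.py, whose matching logic may differ; the record reuses the example above with the elided fields left out.

```python
def find_entity_position(sentence, entity):
    """Return the index of the first token of `entity` in `sentence`, or -1."""
    tokens = sentence.split()
    entity_tokens = entity.split()
    width = len(entity_tokens)
    for start in range(len(tokens) - width + 1):
        if tokens[start:start + width] == entity_tokens:
            return start
    return -1


record = {
    "sentence": "Bill Gates is the founder of Microsoft .",
    "head": {"word": "Bill Gates", "id": "m.03_3d"},
    "tail": {"word": "Microsoft", "id": "m.07dfk"},
    "relation": "founder",
}

head_pos = find_entity_position(record["sentence"], record["head"]["word"])
tail_pos = find_entity_position(record["sentence"], record["tail"]["word"])

# Without the space before the period, "Microsoft." is a single token,
# so a token-level match for the tail entity fails.
unspaced_pos = find_entity_position(
    "Bill Gates is the founder of Microsoft.", "Microsoft")
```

With the properly spaced sentence both entities are found, while the unspaced variant yields -1 for the tail entity.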

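Word Embedding Data

Word embedding data is used to initialize the word embeddings in the network and should be in the following format: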

[
    {'word': 'the', 'vec': [0.418, 0.24968, ...]},
    {'word': ',', 'vec': [0.013441, 0.23682, ...]},
    ...
]
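
A minimal sketch of how such a list could be turned into a vocabulary and an embedding matrix, assuming the file is stored as JSON on disk. The function `build_vocab`, the helper `encode`, and the extra `UNK` and `BLANK` rows are assumptions for illustration, not a description of what gen_data.py does; the vectors are shortened to two dimensions.

```python
import json


def build_vocab(word_vec_list):
    """Map each word to a row index and collect the vectors as a matrix."""
    word2id = {}
    matrix = []
    for entry in word_vec_list:
        word2id[entry["word"]] = len(matrix)
        matrix.append(list(entry["vec"]))
    dim = len(matrix[0])
    # Hypothetical extra rows: UNK for unknown words, BLANK for padding.
    for special in ("UNK", "BLANK"):
        word2id[special] = len(matrix)
        matrix.append([0.0] * dim)
    return word2id, matrix


# A tiny stand-in for the word embedding file, written as strict JSON.
raw = ('[{"word": "the", "vec": [0.418, 0.24968]},'
       ' {"word": ",", "vec": [0.013441, 0.23682]}]')
word2id, matrix = build_vocab(json.loads(raw))


def encode(sentence):
    """Turn a space-separated sentence into a list of word indices."""
    return [word2id.get(token, word2id["UNK"]) for token in sentence.split()]
```

The resulting matrix could then be used to initialize an embedding layer, with each row index matching the ID produced by `encode`.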

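Relation-ID Mapping Data

This file assigns an ID to each relation so that the same ID denotes the same relation across every training and testing run. Its format is as follows: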

{
    'NA': 0,
    'relation_1': 1,
    'relation_2': 2,
    ...
}

IMPORTANT: Make sure the ID of NA is always 0.
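
The constraint on NA can be checked before training with a few lines of Python. This is a sketch under assumptions: `check_rel2id` and `relation_to_id` are hypothetical helpers, the relation names are placeholders, and mapping unknown relations to NA is an assumed policy rather than documented behavior.

```python
def check_rel2id(rel2id):
    """Raise ValueError unless NA maps to 0 and every ID is unique."""
    if rel2id.get("NA") != 0:
        raise ValueError("the NA relation must have ID 0")
    if len(set(rel2id.values())) != len(rel2id):
        raise ValueError("relation IDs must be unique")


def relation_to_id(relation, rel2id):
    """Look up a relation ID, falling back to NA for unknown relations."""
    return rel2id.get(relation, rel2id["NA"])


rel2id = {"NA": 0, "founder": 1, "place_of_birth": 2}
check_rel2id(rel2id)  # passes silently for a valid mapping
```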

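Quick Start

Process Data

python gen_data.py

The processed data will be stored in ./data.

Train Model

python train.py --model_name pcnn_att

The --model_name argument specifies the model architecture, and pcnn_att is the name of one of our models. All available models are in ./models. For the other arguments, refer to ./train.py. Once training starts, all checkpoints are stored in ./checkpoint.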
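
One common way to wire a `--model_name` argument to a model class is a dictionary lookup, sketched below. This is not the contents of ./train.py: the placeholder classes, the `MODELS` registry, the lowercase key `pcnn_one`, and the default value are all assumptions for illustration.

```python
import argparse


class PCNN_ATT:
    """Placeholder for a model class; the real ones live in ./models."""


class PCNN_ONE:
    """Placeholder for a second model class."""


# Hypothetical registry mapping --model_name values to model classes.
MODELS = {"pcnn_att": PCNN_ATT, "pcnn_one": PCNN_ONE}


def parse_model(argv):
    """Parse command-line arguments and return the selected model class."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", type=str, default="pcnn_att",
                        choices=sorted(MODELS), help="name of the model")
    args = parser.parse_args(argv)
    return MODELS[args.model_name]


model_class = parse_model(["--model_name", "pcnn_att"])
```

Restricting `choices` to the registry keys makes a mistyped model name fail at argument parsing instead of later during training.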

Test Model

python test.py --model_name pcnn_att

The usage is the same as for training. When testing finishes, the PR-curve data of the best checkpoint is stored in ./test_result.
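
For readers unfamiliar with PR-curve data, the sketch below shows how precision-recall points are generally computed from scored predictions. It does not reproduce test.py or the file format in ./test_result; the function names and the toy scores are assumptions.

```python
def precision_recall_points(scored, total_positive):
    """Return (recall, precision) pairs over predictions sorted by score.

    `scored` is a list of (score, is_correct) pairs, one per predicted
    fact; `total_positive` is the number of true facts in the test set.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    points = []
    correct = 0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        recall = float(correct) / total_positive
        precision = float(correct) / rank
        points.append((recall, precision))
    return points


def area_under_curve(points):
    """Approximate the area under the curve with the trapezoidal rule."""
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


# Toy predictions: (confidence score, whether the prediction is correct).
scored = [(0.9, True), (0.8, False), (0.7, True), (0.4, False)]
points = precision_recall_points(scored, total_positive=2)
auc = area_under_curve(points)
```

Each point corresponds to cutting the ranked list at one position, so a model that ranks correct facts higher keeps precision high as recall grows.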

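Plot

python draw_plot.py PCNN_ATT

The plot will be saved as ./test_result/pr_curve.png. You can specify several models in the arguments, for example python draw_plot.py PCNN_ATT PCNN_ONE PCNN_AVE, as long as the results of these models are in ./test_result.

Build Your Own Model

You can not only train and test the existing models in our package, but also build your own model or add methods to the four basic modules. To add a new model, create a Python file in ./models with the same name as the model and implement it as follows: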

import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
from networks.embedding import *
from networks.encoder import *
from networks.selector import *
from networks.classifier import *
from .Model import Model


class PCNN_ATT(Model):
    # Only the encoder and the selector are chosen here.
    def __init__(self, config):
        super(PCNN_ATT, self).__init__(config)
        # PCNN encoder (piecewise convolutional neural network).
        self.encoder = PCNN(config)
        # Attention selector over sentence vectors of size hidden_size * 3.
        self.selector = Attention(config, config.hidden_size * 3)

Then you can train, test, and plot!
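
The factor of 3 in `config.hidden_size * 3` in the example above is consistent with how PCNN encoders usually work: each convolution filter is max-pooled separately over three segments of the sentence, split at the two entity positions. The following plain-Python sketch illustrates that idea only; `piecewise_max_pool`, the segment boundaries, and the handling of empty segments are assumptions and not the repository's implementation.

```python
def piecewise_max_pool(features, head_pos, tail_pos):
    """Max-pool each filter's outputs over three sentence segments.

    `features` holds one list per filter, with one value per token.
    The segments are: up to the first entity, from there to the second
    entity, and everything after the second entity.
    """
    first, second = sorted((head_pos, tail_pos))
    pooled = []
    for channel in features:
        segments = (channel[:first + 1],
                    channel[first + 1:second + 1],
                    channel[second + 1:])
        for segment in segments:
            # Assumption: an empty segment contributes 0.0.
            pooled.append(max(segment) if segment else 0.0)
    return pooled


hidden_size = 2  # number of convolution filters in this toy example
features = [
    [0.1, 0.5, 0.2, 0.9, 0.3, 0.4],  # filter 0, one value per token
    [0.7, 0.0, 0.6, 0.1, 0.8, 0.2],  # filter 1
]
sentence_vector = piecewise_max_pool(features, head_pos=1, tail_pos=3)
```

The pooled vector has `hidden_size * 3` entries, so a selector placed after a PCNN encoder needs that input size, whereas one placed after an encoder with a single pooled value per filter would need only `hidden_size`.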


(The first version translated by vz on 2020.07.19)

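Key Metrics

Overview
Name and owner: ShulinCao/OpenNRE-PyTorch
Primary language: Python
Languages: Python (1 language)
Platform: Linux, Mac, Windows
License: MIT License
Owner activity
Created: 2018-08-06 05:26:48
Pushed: 2018-11-15 02:27:38
Last commit: 2018-11-15 10:27:37
Releases: 0
User engagement
Stars: 219
Watchers: 6
Forks: 45
Commits: 10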
Issues: 27
Open issues: 14
Pull requests: 0
Open pull requests: 0
Closed pull requests: 0
