Distant-Supervised-Chinese-Relation-Extraction

Distant-supervised Chinese relation extraction

  • Owner: xiaofei05/Distant-Supervised-Chinese-Relation-Extraction
  • License: MIT License

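Dataset construction

  • CN-DBpedia, a general-purpose Chinese knowledge base
  • The distant supervision assumption

The processing pipeline is described in kg_data/README.md. A processed subset of the data is available for download via Google Drive.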
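The distant supervision assumption can be illustrated with a minimal sketch. It is not the pipeline from kg_data/README.md: the function name `label_sentences`, the record layout, and the toy triple and sentences are all hypothetical, and entity matching is done by plain substring search. Any sentence that mentions both entities of a knowledge-base triple is labeled with that triple's relation; entity pairs with no triple are labeled NA.

```python
from typing import Dict, List, Tuple

# (head entity, relation, tail entity); hypothetical in-memory form of KB triples
Triple = Tuple[str, str, str]


def label_sentences(sentences: List[str], triples: List[Triple]) -> List[Dict[str, str]]:
    """Label entity pairs found in sentences using the distant supervision assumption."""
    # Map each ordered (head, tail) pair to its relation in the knowledge base.
    kb = {(head, tail): relation for head, relation, tail in triples}
    entities = sorted({e for head, _, tail in triples for e in (head, tail)})

    labeled = []
    for sentence in sentences:
        # Naive entity matching: an entity counts as mentioned if it is a substring.
        present = [e for e in entities if e in sentence]
        for head in present:
            for tail in present:
                if head == tail:
                    continue
                # Pairs without a KB triple fall back to the NA class.
                relation = kb.get((head, tail), "NA")
                labeled.append(
                    {"sentence": sentence, "head": head, "tail": tail, "relation": relation}
                )
    return labeled


if __name__ == "__main__":
    # Toy data for illustration only.
    triples = [("姚明", "/人物/地点/出生地", "上海")]
    sentences = ["姚明出生于上海。", "姚明曾多次回到上海参加活动。", "今天天气很好。"]
    for record in label_sentences(sentences, triples):
        print(record)
```

Note that the second toy sentence receives the birthplace label even though it does not express that relation. This kind of label noise is inherent to the assumption.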

Model selection

The models come from thunlp/OpenNRE; see its documentation for details.

Source: https://github.com/thunlp/OpenNRE

Running the code

By default the code looks for the dataset under data/chinese. Run the following from the command line:

python train_demo.py chinese pcnn att

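Model results

Results for a subset of relations:

| Category | Precision | Recall | F1 score |
| :-: | :-: | :-: | :-: |
| All | 0.95428 | 0.95036 | 0.95232 |
| /人物/其它/民族 (person / other / ethnicity) | 0.98374 | 0.979 | 0.98137 |
| NA | 0.96853 | 0.97824 | 0.97336 |
| /人物/地点/国籍 (person / place / nationality) | 0.84075 | 0.92673 | 0.88164 |
| /组织/地点/位于 (organization / place / located in) | 0.85157 | 0.83652 | 0.84398 |
| /人物/其它/职业 (person / other / occupation) | 0.86121 | 0.8037 | 0.83147 |
| /人物/组织/毕业于 (person / organization / graduated from) | 0.84137 | 0.78092 | 0.81002 |
| /组织/人物/校长 (organization / person / school head) | 0.94118 | 0.59259 | 0.72727 |
| /人物/地点/出生地 (person / place / birthplace) | 0.81049 | 0.49028 | 0.61097 |
| /人物/人物/家庭成员 (person / person / family member) | 0.65385 | 0.37778 | 0.47887 |
| /人物/组织/属于 (person / organization / belongs to) | 0.99999 | 0.11364 | 0.20408 |
| /地点/地点/包含 (place / place / contains) | 0.99999 | 0.0625 | 0.11765 |
| /组织/人物/创始人 (organization / person / founder) | 0.99999 | 0.05882 | 0.11111 |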
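The F1 column is consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it for three rows copied from the table; the helper name `f1_score` is illustrative and not taken from the repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table
rows = {
    "All": (0.95428, 0.95036, 0.95232),
    "/组织/人物/校长": (0.94118, 0.59259, 0.72727),
    "/组织/人物/创始人": (0.99999, 0.05882, 0.11111),
}

if __name__ == "__main__":
    for name, (p, r, reported) in rows.items():
        print(name, round(f1_score(p, r), 5), reported)
```

Because the harmonic mean is dominated by the smaller value, a relation with near-perfect precision but recall around 0.06 still ends up with an F1 score near 0.11.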

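Recall is very low for some relations. Analysis suggests the likely cause is that the dataset contains very few samples of those relations.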
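One way to check this is to count samples per relation label. The sketch below assumes each sample is a dict with a "relation" key; that layout, the function names, and the toy data are assumptions for illustration, not the repository's actual data format.

```python
from collections import Counter
from typing import Dict, List


def relation_counts(samples: List[Dict[str, str]]) -> Counter:
    """Count how many samples carry each relation label."""
    return Counter(sample["relation"] for sample in samples)


def rare_relations(samples: List[Dict[str, str]], min_count: int) -> List[str]:
    """Return relations with fewer than min_count samples, rarest first."""
    counts = relation_counts(samples)
    rare = [relation for relation, n in counts.items() if n < min_count]
    return sorted(rare, key=lambda relation: (counts[relation], relation))


if __name__ == "__main__":
    # Toy samples for illustration only.
    samples = (
        [{"relation": "NA"}] * 50
        + [{"relation": "/人物/其它/民族"}] * 20
        + [{"relation": "/组织/人物/创始人"}] * 2
    )
    print(relation_counts(samples).most_common())
    print(rare_relations(samples, min_count=10))
```

Relations that show up in the rare list are natural candidates for collecting more data or for merging with related classes.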

Model improvements

To be continued.

Main metrics

Overview

  • Name with owner: xiaofei05/Distant-Supervised-Chinese-Relation-Extraction
  • Primary language: Python
  • Programming languages: Python (language count: 2)
  • License: MIT License

Owner activity

  • Created at: 2019-03-14 09:18:05
  • Pushed at: 2021-05-13 12:40:55
  • Last commit at: 2021-05-13 20:38:56
  • Release count: 0

User engagement

  • Stargazers count: 383
  • Watchers count: 5
  • Fork count: 61
  • Commits count: 25
  • Issues count: 13
  • Open issues count: 0
  • Pull requests count: 0
  • Open pull requests count: 0
  • Closed pull requests count: 0