NeuralClassifier

一个开源的神经分层多标签文本分类工具包。「An Open-source Neural Hierarchical Multi-label Text Classification Toolkit」

Github星跟踪图

NeuralClassifier Logo

NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit

Introduction

NeuralClassifier is designed for quick implementation of neural models for hierarchical multi-label classification task, which is more challenging and common in real-world scenarios. A salient feature is that NeuralClassifier currently provides a variety of text encoders, such as FastText, TextCNN, TextRNN, RCNN, VDCNN, DPCNN, DRNN, AttentiveConvNet and Transformer encoder, etc. It also supports other text classification scenarios, including binary-class and multi-class classification. It is built on PyTorch. Experiments show that models built in our toolkit achieve comparable performance with reported results in the literature.

Support tasks

  • Binary-class text classifcation
  • Multi-class text classification
  • Multi-label text classification
  • Hiearchical (multi-label) text classification (HMC)

Support text encoders

Requirement

  • Python 3
  • PyTorch 0.4+
  • Numpy 1.14.3+

System Architecture

NeuralClassifier Architecture

Usage

Training

python train.py conf/train.json

Detail configurations and explanations see Configuration.

The training info will be outputted in standard output and log.logger_file.

Evaluation

python eval.py conf/train.json
  • if eval.is_flat = false, hierarchical evaluation will be outputted.
  • eval.model_dir is the model to evaluate.
  • data.test_json_files is the input text file to evaluate.

The evaluation info will be outputed in eval.dir.

Prediction

python predict.py conf/train.json data/predict.json 
  • predict.json should be of json format, while each instance has a dummy label like "其他" or any other label in label map.
  • eval.model_dir is the model to predict.
  • eval.top_k is the number of labels to output.
  • eval.threshold is the probability threshold.

The predict info will be outputed in predict.txt.

Input Data Format

JSON example:

{
    "doc_label": ["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    "doc_token": ["I", "love", "deep", "learning"],
    "doc_keyword": ["deep learning"],
    "doc_topic": ["AI", "Machine learning"]
}

"doc_keyword" and "doc_topic" are optional.

Performance

0. Dataset

1. Compare with state-of-the-art

2. Different text encoders

3. Hierarchical vs Flat

Acknowledgement

Some public codes are referenced by our toolkit:

Update

  • 2019-04-29, init version

概览

名称与所有者Tencent/NeuralNLP-NeuralClassifier
主编程语言Python
编程语言Python (语言数: 1)
平台Linux, Mac, Windows
许可证Other
发布数0
创建于2019-07-04 08:16:11
推送于2023-09-06 06:35:08
最后一次提交2023-09-06 14:35:08
星数1.8k
关注者数66
派生数400
提交数36
已启用问题?
问题数105
打开的问题数4
拉请求数12
打开的拉请求数0
关闭的拉请求数6
已启用Wiki?
已存档?
是复刻?
已锁定?
是镜像?
是私有?
去到顶部