NeuralClassifier

一个开源的神经分层多标签文本分类工具包。「An Open-source Neural Hierarchical Multi-label Text Classification Toolkit」

Github星跟蹤圖

NeuralClassifier Logo

NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit

Introduction

NeuralClassifier is designed for quick implementation of neural models for hierarchical multi-label classification task, which is more challenging and common in real-world scenarios. A salient feature is that NeuralClassifier currently provides a variety of text encoders, such as FastText, TextCNN, TextRNN, RCNN, VDCNN, DPCNN, DRNN, AttentiveConvNet and Transformer encoder, etc. It also supports other text classification scenarios, including binary-class and multi-class classification. It is built on PyTorch. Experiments show that models built in our toolkit achieve comparable performance with reported results in the literature.

Support tasks

  • Binary-class text classifcation
  • Multi-class text classification
  • Multi-label text classification
  • Hiearchical (multi-label) text classification (HMC)

Support text encoders

Requirement

  • Python 3
  • PyTorch 0.4+
  • Numpy 1.14.3+

System Architecture

NeuralClassifier Architecture

Usage

Training

python train.py conf/train.json

Detail configurations and explanations see Configuration.

The training info will be outputted in standard output and log.logger_file.

Evaluation

python eval.py conf/train.json
  • if eval.is_flat = false, hierarchical evaluation will be outputted.
  • eval.model_dir is the model to evaluate.
  • data.test_json_files is the input text file to evaluate.

The evaluation info will be outputed in eval.dir.

Prediction

python predict.py conf/train.json data/predict.json 
  • predict.json should be of json format, while each instance has a dummy label like "其他" or any other label in label map.
  • eval.model_dir is the model to predict.
  • eval.top_k is the number of labels to output.
  • eval.threshold is the probability threshold.

The predict info will be outputed in predict.txt.

Input Data Format

JSON example:

{
    "doc_label": ["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    "doc_token": ["I", "love", "deep", "learning"],
    "doc_keyword": ["deep learning"],
    "doc_topic": ["AI", "Machine learning"]
}

"doc_keyword" and "doc_topic" are optional.

Performance

0. Dataset

1. Compare with state-of-the-art

2. Different text encoders

3. Hierarchical vs Flat

Acknowledgement

Some public codes are referenced by our toolkit:

Update

  • 2019-04-29, init version

概覽

名稱與所有者Tencent/NeuralNLP-NeuralClassifier
主編程語言Python
編程語言Python (語言數: 1)
平台Linux, Mac, Windows
許可證Other
發布數0
創建於2019-07-04 08:16:11
推送於2023-09-06 06:35:08
最后一次提交2023-09-06 14:35:08
星數1.8k
關注者數66
派生數400
提交數36
已啟用問題?
問題數105
打開的問題數4
拉請求數12
打開的拉請求數0
關閉的拉請求數6
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?
去到頂部