ZeroE04/IDCNN-pytorch

PyTorch implementation of IDCNN

Python

IDCNN-pytorch (Chinese, translated)

A PyTorch implementation of IDCNN. In the author's tests it matches BiLSTM accuracy while running substantially faster.

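Preface

The goal of this project is to provide an extremely compact codebase that runs out of the box and labels text sequences.

Preprocessing: put train.txt and valid.txt in data/raw, update their paths in config.py, then run python preprocess.py. The prepared training data will be generated in data/processed/.
Training: run python train.py. The model is saved to the path set in config.py.
Inference: python predict.py 今天北京的天气怎么样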

FAQ

torch==1.2.0
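In the original paper, the CNN blocks inside IDCNN run in parallel; here they are changed to four serial blocks. See ner.py for details.
To use pre-trained embeddings, set the EMBEDDING_FILE path in config.py and change the nil parameter of the get_embedding function in ner.py to False (download: https://pan.baidu.com/s/1Rn7LtTH0n7SHyHPfjRHbkg)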
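
To make the effect of chaining blocks more concrete, the sketch below computes the receptive field of stacked dilated 1-D convolutions. The kernel width of 3 and the per-block dilations (1, 1, 2) are assumptions chosen for illustration, not values read from ner.py, and the function name is hypothetical.

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field, in characters, of stacked stride-1 dilated 1-D convolutions."""
    field = 1
    for dilation in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation positions.
        field += (kernel_size - 1) * dilation
    return field


BLOCK_DILATIONS = [1, 1, 2]  # assumed dilation schedule inside one block
NUM_BLOCKS = 4               # the README states four serial blocks

one_block = receptive_field(BLOCK_DILATIONS)
four_serial = receptive_field(BLOCK_DILATIONS * NUM_BLOCKS)
print(one_block, four_serial)
```

Under these assumptions a single block covers 9 characters, while four blocks chained in series cover 33, because every later block convolves over the output of the previous one.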

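Training your own multi-class network

Just replace train.txt and valid.txt with your own files. The model only classifies at the character level (i.e. sequence labeling), so any tagging scheme works, whether BIEOS, BIO, or another.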
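
Because each tag is just a class assigned to one character, a label vocabulary can be built from whatever tags appear in the data. The sketch below illustrates this. It assumes a CoNLL-style layout (one "character tag" pair per line, blank lines between sentences), which may differ from what preprocess.py actually expects; all function names are hypothetical.

```python
def read_sentences(lines):
    """Group 'char tag' lines into (chars, tags) pairs; blank lines end a sentence."""
    sentences, chars, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            if chars:
                sentences.append((chars, tags))
                chars, tags = [], []
            continue
        char, tag = line.split()
        chars.append(char)
        tags.append(tag)
    if chars:
        sentences.append((chars, tags))
    return sentences


def build_tag_vocab(sentences):
    """Assign an integer id to every distinct tag, in order of first appearance."""
    tag2id = {}
    for _, tags in sentences:
        for tag in tags:
            tag2id.setdefault(tag, len(tag2id))
    return tag2id


# The same code handles BIO and BIEOS data without any scheme-specific logic.
bio_sample = ["北 B-LOC", "京 I-LOC", "的 O", "", "天 O", "气 O"]
bieos_sample = ["北 B-LOC", "京 E-LOC", "的 O"]

bio_vocab = build_tag_vocab(read_sentences(bio_sample))
bieos_vocab = build_tag_vocab(read_sentences(bieos_sample))
print(bio_vocab)
print(bieos_vocab)
```

The classifier only needs the number of distinct tags, so switching schemes changes the size of the output layer and nothing else.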

IDCNN-pytorch(English)

PyTorch implementation of IDCNN

Preface

The original intention of this work is to provide an extremely compact PyTorch implementation of IDCNN.

preprocess: put your train.txt and valid.txt in data/raw and run "python preprocess.py"; the prepared training data will then be generated in data/processed/
train: run python train.py; the model is saved in /model, and you can change save_dir and model_name in config.py
inference: python predict.py 今天北京的天气怎么样
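
The output format of predict.py is not described here. Assuming it yields one tag per input character, the following sketch shows one way to turn such tags into entity spans. The tag names (TIME, LOC) and the function name are hypothetical, and the example tags are hand-written for illustration, not real model output.

```python
def extract_entities(chars, tags):
    """Collect (text, label) spans from BIO or BIEOS tags, one tag per character."""
    entities, current, label = [], [], None

    def flush():
        # Close the span being built, if any, and reset the buffer.
        nonlocal current, label
        if current:
            entities.append(("".join(current), label))
        current, label = [], None

    for char, tag in zip(chars, tags):
        prefix, _, name = tag.partition("-")
        if prefix in ("B", "S"):
            flush()
            current, label = [char], name
            if prefix == "S":  # single-character entity ends immediately
                flush()
        elif prefix in ("I", "E") and current and name == label:
            current.append(char)
            if prefix == "E":  # explicit end marker in BIEOS
                flush()
        else:
            flush()  # "O" or an inconsistent tag closes any open span
    flush()
    return entities


sentence = "今天北京的天气怎么样"
example_tags = ["B-TIME", "I-TIME", "B-LOC", "I-LOC"] + ["O"] * 6  # hand-written
entities = extract_entities(sentence, example_tags)
print(entities)
```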

Notes

torch==1.2.0
In the original IDCNN, the CNN blocks are parallel; here they are changed to four serial blocks. See ner.py for details.
To use pre-trained embeddings, specify the EMBEDDING_FILE path in config.py and set the nil parameter of the get_embedding function in ner.py to False (download: https://pan.baidu.com/s/1Rn7LtTH0n7SHyHPfjRHbkg)

Train on your own data

Put your own train.txt and valid.txt in data/raw. There are no restrictions on the tagging scheme (BIEOS, BIO, etc.).