Bert 中文文本分类

文本分类资料综述总结：https://github.com/xiaoqian19940510/text-classification-surveys

参考code:

运行环境

python3.6
tensorflow1.15

任务判断文本相似性

数据说明

数据来自己：https://github.com/wslc1314/atec_nlp_sim_update/tree/master/data/atec

数据的标签为1表示两句话相匹配，标签为0表示两句话不匹配

atec_nlp_sim_train_0.6.csv和atec_nlp_sim_test_0.4.csv格式相同
各列分别为"index query1 query2 label"，"\t"分隔。来自网络上蚂蚁金服的公开数据集，是判断文本相似性的数据
程序运行中只选择了部分数据(train3000,val500,test100)

训练与预测

python run_classifier_mian.py.py --task_name=MAYI --do_train=true --do_eval=true --data_dir=./ --vocab_file=./chinese_L-12_H-768_A-12/vocab.txt --bert_config_file=./chinese_L-12_H-768_A-12/bert_
config.json --init_checkpoint=./chinese_L-12_H-768_A-12/bert_model.ckpt --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-5 --num_train_epochs=3.0 --output_dir=./output/
 
python run_classifier_mian.py.py --task_name=MAYI --do_train=false --do_eval=false --do_predict=true --data_dir=./ --vocab_file=./chinese_L-12_H-768_A-12/vocab.txt --bert_config_file=./chinese_L-12_H-768_A-12/bert_config.json --init_checkpoint=./chinese_L-12_H-768_A-12/bert_model.ckpt --max_seq_length=128 --output_dir=./output/

参考资料

Bert文本分类(基于keras-bert实现)：https://blog.csdn.net/asialee_bird/article/details/102747435

Bert进行二分类：

基于keras-bert实现训练，保存，加载，预测单个文本:https://blog.csdn.net/hejp_123/article/details/105432539

Bert_Classification：https://github.com/morenjiujiu/Chinese_Bert_Classification

数据处理参考：

zyxin810/bert-classification-zh

Bert 中文文本分类

运行环境

任务 判断文本相似性

数据说明

训练 与 预测

参考资料

任务判断文本相似性

训练与预测