- The MDRDC (Malevolent Dialogue Response Detection and Classification) dataset: the ./dataset/data_example.tsv file currently contains an example of the dataset. The whole dataset includes 6,000 dialogues.
- The MDMD (Multi-Label Dialogue Malevolence Detection) dataset: the training set is the same as in MDRDC; the dev and test sets are multi-label.
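To take a first look at the example file, a minimal sketch like the following may help. The column layout of data_example.tsv is not described here, so the sketch makes no assumption about column names and simply returns each row as a list of strings. The helper name `read_tsv_rows` is hypothetical and not part of this repository.

```python
import csv


def read_tsv_rows(path):
    """Read a tab-separated file and return its rows as lists of strings.

    Quoting is disabled (an assumption) so that quotation marks inside
    dialogue text are kept as ordinary characters.
    """
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader]


# Usage (path taken from the description above):
# rows = read_tsv_rows("./dataset/data_example.tsv")
# for row in rows[:5]:
#     print(row)
```

Inspecting the first few rows this way should show which fields hold the utterances and which hold the labels before the data is wired into any of the baselines.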
We also provide guidance on baselines for single-label malevolent dialogue detection and classification:
Python 3.6, PyTorch 1.0, TensorFlow >= 1.4.0
We modify the charCNN baseline from Yuanping Chen's open-source code. Link: charCNN
We modify the RNN, CNN, and RCNN baselines from Wenxing Hu's open-source code. Link: RNN, CNN, RCNN
We use the GCN baseline from Liang Yao's open-source code. Link: textGCN, textGCN.pytorch
We modify the BERT-base text classification model from Yingxin Song's open-source code. Link: BERT-base
We use the softmax score and the TCP score as confidence measures. The softmax score comes directly from the BERT-base classification model. The TCP score is learned by the BERT-Confidnet module; please refer to the code of our other paper, link: BERT-conf
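As an illustration of the softmax score, the sketch below turns a vector of classifier logits into probabilities and reports the probability of the predicted class. Reading "softmax score" as the maximum softmax probability is an assumption, as are the function names; in practice the logits would come from the BERT-base classification model rather than a hand-written list.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_confidence(logits):
    """Return (predicted class index, softmax probability of that class)."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, probs[pred]


# Hypothetical logits for a three-class example:
pred, conf = softmax_confidence([2.0, 1.0, 0.1])
print(pred, round(conf, 3))
```

A prediction with a low score under this reading is one where the model spreads probability across several classes.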
We provide some of the checkpoint files for the BERT-base classification model. Because of the file size, we provide only the files for Table 7 and Table 8. You can load these files to obtain the results on the test set.
MDRDC without context (Table 7): 1st-level, 2nd-level and 3rd-level
MDRDC with context (Table 8): 1st-level, 2nd-level and 3rd-level
bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese.
@article{zhang2021taxonomy,
  title={A taxonomy, data set, and benchmark for detecting and classifying malevolent dialogue responses},
  author={Zhang, Yangjun and Ren, Pengjie and de Rijke, Maarten},
  journal={Journal of the Association for Information Science and Technology},
  year={2021},
  publisher={Wiley Online Library}
}

@article{zhang2022improving,
  title={Improving Multi-label Malevolence Detection in Dialogues through Multi-faceted Label Correlation Enhancement},
  author={Zhang, Yangjun and Ren, Pengjie and Deng, Wentao and Chen, Zhumin and de Rijke, Maarten},
  journal={ACL},
  year={2022}
}