AugCOLD dataset

Enhancing Offensive Language Detection with Data Augmentation and Knowledge Distillation

AugCOLD (Augmented Chinese Offensive Language Dataset) is a large-scale unsupervised dataset of 1 million samples, collected through web crawling and model generation.

Citing

If you find our paper or the dataset helpful, please cite:

@article{deng2023Augcold,
author = {Jiawen Deng and Zhuang Chen and Hao Sun and Zhexin Zhang and Jincenzi Wu and Satoshi Nakagawa and Fuji Ren and Minlie Huang},
title = {Enhancing Offensive Language Detection with Data Augmentation and Knowledge Distillation},
journal = {Research},
volume = {6},
number = {},
pages = {0189},
year = {2023},
doi = {10.34133/research.0189},
URL = {https://spj.science.org/doi/abs/10.34133/research.0189},
eprint = {https://spj.science.org/doi/pdf/10.34133/research.0189}
}
