The Small-LexNormViHSD dataset is used for lexical normalization on Vietnamese social media text.
This dataset contains 2,181 annotated comments drawn from the ViHSD dataset, which is used for hate speech detection on social network sites. Each example has two labels: input (the non-standard sentence) and output (the standard sentence).
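
As a rough illustration of the input/output structure, the sketch below models one example as a pair and lists the tokens that normalization changed. The class name `NormalizationPair`, the helper `changed_tokens`, and the sample sentence are hypothetical and not taken from the dataset. The helper also assumes that both sentences split into the same number of whitespace tokens, which may not hold for every pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizationPair:
    """One example: a non-standard sentence and its standard form."""
    input: str   # non-standard sentence
    output: str  # standard sentence


def changed_tokens(pair: NormalizationPair) -> list[tuple[str, str]]:
    """Return (non-standard, standard) token pairs that differ.

    Assumes a one-to-one whitespace token alignment; raises ValueError
    when the two sentences have different token counts.
    """
    src = pair.input.split()
    tgt = pair.output.split()
    if len(src) != len(tgt):
        raise ValueError("token counts differ; one-to-one alignment does not apply")
    return [(s, t) for s, t in zip(src, tgt) if s != t]


# Made-up pair for illustration only (not a row from the dataset).
example = NormalizationPair(input="t ko bt", output="tôi không biết")
for non_standard, standard in changed_tokens(example):
    print(f"{non_standard} -> {standard}")
```

Pairs where one non-standard token expands into several standard tokens would need a real alignment step (for example, an edit-distance alignment) instead of the simple `zip` used here.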
To learn more about the dataset, please read the paper "Automatic Textual Normalization for Hate Speech Detection".
Please cite the following paper if you use this dataset: