Harassment-Corpus

Publishing a Quality Context-aware Annotated Corpus and Lexicon for Harassment Research.

Identifying profane or offensive words is a standard way to begin investigating a cyberbullying incident. For this reason, we first created a lexicon from profane words and divided it into six contexts: 1) Sexual, 2) Appearance-related, 3) Intellectual, 4) Political, 5) Racial, and 6) Combined. We used the first five categories of our lexicon as seed terms for collecting tweets from Twitter. Requiring at least one offensive word per tweet, we collected 10,000 tweets for each contextual type, for a total of 50,000.

The presence of offensive words in a tweet does not guarantee that the tweet is harassing, because individuals might use offensive words in a friendly manner or in quotes. Therefore, we rely on human-judged annotations to distinguish harassing tweets from non-harassing tweets.

We acknowledge support from the National Science Foundation (NSF) award CNS 1513721: Context-Aware Harassment Detection on Social Media.

Wiki page of this project: http://wiki.knoesis.org/index.php/Context-Aware_Harassment_Detection_on_Social_Media

To obtain our annotated tweets in the five contexts, please contact the authors via these emails:

Mohammadreza Rezvan: mohammadrezarezvan94@gmail.com

Saeedeh Shekarpour: sshekarpour1@udayton.edu
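
The lexicon-based selection step described above (keep a tweet for a context if it contains at least one seed term from that context) could be sketched roughly as follows. This is a minimal illustration, not the code used to build the corpus: the `LEXICON` contents are neutral placeholders, and the names `compile_patterns` and `matching_contexts` are hypothetical. Whole-word, case-insensitive matching is also an assumption.

```python
import re

# Placeholder lexicon (assumption): the real seed terms are not reproduced here.
# Only the first five contexts are used as seeds, as described in the text.
LEXICON = {
    "sexual": ["seedword1"],
    "appearance": ["seedword2"],
    "intellectual": ["seedword3", "seed phrase"],
    "political": ["seedword4"],
    "racial": ["seedword5"],
}


def compile_patterns(lexicon):
    """Build one case-insensitive, whole-word regex per context."""
    patterns = {}
    for context, terms in lexicon.items():
        alternation = "|".join(re.escape(term) for term in terms)
        patterns[context] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


def matching_contexts(text, patterns):
    """Return the contexts whose lexicon has at least one term in the text."""
    return [context for context, pattern in patterns.items() if pattern.search(text)]


PATTERNS = compile_patterns(LEXICON)

if __name__ == "__main__":
    sample_tweets = [
        "You are such a SEEDWORD3!",
        "nothing to see here",
    ]
    for tweet in sample_tweets:
        print(tweet, "->", matching_contexts(tweet, PATTERNS))
```

A tweet selected this way is only a candidate: as noted above, deciding whether it is actually harassing still requires human annotation.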