
Hate speech classification in Italian using fine-tuned XLM models. Published at the WOAH workshop (NAACL 2022).


HATE-ITA: Hate Speech Detection in Italian Social Media Text

Debora Nozza, Federico Bianchi, Giuseppe Attanasio

Model description

HATE-ITA is a binary hate speech classification model for Italian social media text.


See the paper for additional details:

Debora Nozza, Federico Bianchi, and Giuseppe Attanasio. 2022. HATE-ITA: Hate Speech Detection in Italian Social Media Text. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 252–260, Seattle, Washington (Hybrid). Association for Computational Linguistics. https://aclanthology.org/2022.woah-1.24

License

The code derives from HuggingFace, and our license is therefore an MIT license.

For the models, restrictions may apply to the data (which are derived from existing datasets) or come from Twitter (the main data source). We refer users to the original licenses accompanying each dataset and to Twitter's regulations.

Installing

Important: if you want to use CUDA, you need to install the CUDA version that matches your distribution; see the PyTorch documentation.
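
Before loading a model, it can help to confirm that PyTorch actually sees a GPU. The following is a minimal sketch and not part of hate_ita: the helper name `cuda_available` is hypothetical, and it relies only on PyTorch's `torch.cuda.is_available()`. It reports False when PyTorch is not installed at all.

```python
def cuda_available() -> bool:
    """Return True if PyTorch is installed and can use a CUDA device.

    Hypothetical helper for illustration; it is not part of hate_ita.
    """
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    if cuda_available():
        print("CUDA is available to PyTorch.")
    else:
        print("CUDA is not available; PyTorch would run on CPU.")
```

If this prints the CPU message on a machine with a GPU, the installed PyTorch build and the CUDA version probably do not match.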

Features

from hate_ita.classifier import HateSpeechClassifier
hc = HateSpeechClassifier()

hc.predict(["ti odio", "come si fa a rompere la lavatrice porca puttana"])

>> ["hate", "not-hate"]
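
When classifying many texts, it is often convenient to keep each text next to its label. The sketch below assumes, based on the example above, that `predict` returns one label per input text, in order, using the labels "hate" and "not-hate". The helpers `label_texts` and `flagged` and the `StubClassifier` class are hypothetical; the stub only stands in for the real model so the example runs without downloading anything.

```python
from typing import List, Protocol, Tuple


class Predictor(Protocol):
    """Any object exposing predict(texts) -> labels."""

    def predict(self, texts: List[str]) -> List[str]: ...


def label_texts(classifier: Predictor, texts: List[str]) -> List[Tuple[str, str]]:
    """Pair each input text with its predicted label."""
    labels = classifier.predict(texts)
    # Assumption: one label per input text, in the same order.
    if len(labels) != len(texts):
        raise ValueError("expected one label per input text")
    return list(zip(texts, labels))


def flagged(pairs: List[Tuple[str, str]], hate_label: str = "hate") -> List[str]:
    """Return only the texts whose label equals hate_label."""
    return [text for text, label in pairs if label == hate_label]


class StubClassifier:
    """Stand-in that returns canned labels; NOT the real model."""

    def __init__(self, canned: List[str]) -> None:
        self.canned = list(canned)

    def predict(self, texts: List[str]) -> List[str]:
        return self.canned[: len(texts)]


stub = StubClassifier(["hate", "not-hate"])
pairs = label_texts(stub, ["primo testo", "secondo testo"])
print(pairs)
print(flagged(pairs))
```

With the package installed, passing `HateSpeechClassifier()` instead of the stub should work the same way, provided the assumption about `predict` holds.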

Models

We release three models (see the paper for reference).

from hate_ita.classifier import HateSpeechClassifier
hc = HateSpeechClassifier("twitter")

hc = HateSpeechClassifier("base")

hc = HateSpeechClassifier("large")
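
If the model name comes from a config file or a command-line flag, a small check can catch typos before anything is loaded. This is a sketch under assumptions: the three names are taken from the snippet above, while `MODEL_NAMES`, `check_model_name`, and `load_classifier` are hypothetical helpers, and the lowercase normalization is a convenience added here. The library may already perform its own validation.

```python
MODEL_NAMES = ("twitter", "base", "large")  # names shown in this README


def check_model_name(name: str) -> str:
    """Normalize a model name and verify it is one of the released models."""
    normalized = name.strip().lower()
    if normalized not in MODEL_NAMES:
        expected = ", ".join(MODEL_NAMES)
        raise ValueError(f"unknown model {name!r}; expected one of: {expected}")
    return normalized


def load_classifier(name: str):
    """Validate the name, then build the classifier.

    hate_ita is imported lazily so the validation above can be used
    (and tested) without the package installed.
    """
    checked = check_model_name(name)
    from hate_ita.classifier import HateSpeechClassifier

    return HateSpeechClassifier(checked)


print(check_model_name(" Large "))
```

Calling `load_classifier("base")` would then be equivalent to `HateSpeechClassifier("base")`, with an earlier and clearer error for a misspelled name.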

Reference

If you use this tool, please cite the following paper:

@inproceedings{nozza-etal-2022-hate,
    title = "{HATE}-{ITA}: Hate Speech Detection in {I}talian Social Media Text",
    author = "Nozza, Debora  and
      Bianchi, Federico  and
      Attanasio, Giuseppe",
    booktitle = "Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)",
    month = jul,
    year = "2022",
    address = "Seattle, Washington (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.woah-1.24",
    doi = "10.18653/v1/2022.woah-1.24",
    pages = "252--260"
}

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

License

GNU GPLv3