/model-online-misogyny

our modeling of online misogyny

Primary language: Jupyter Notebook. License: Other (NOASSERTION).

Opt Out Tools Machine Learning R&D Repository

Welcome to the Opt Out Tools (OOT) Machine Learning R&D repository. It contains the research and production code we use to build a machine learning model that automatically detects online misogyny on Twitter.

A first version of this model is currently in use in the Opt Out browser extension. The extension is itself in alpha and is available for download from the Firefox add-ons library. A data statement for the dataset used to train the first version of the model can be found on OOT's website.

Please read the CONTRIBUTING.md file in this repository to learn how you can contribute.

Repository purpose

This repository has two purposes:

  • Researching the automatic detection of online misogyny, i.e. exploring hate speech datasets and experimenting with machine learning algorithms.
  • Building a machine learning model for the browser extension based on our research.

Repository structure

├── .circleci               <- Folder containing the CircleCI configuration file for this repository.
├── .github/ISSUE_TEMPLATE  <- Folder containing templates to create different types of issues for this
│                              repository.
├── data                    <- Folder for copying the OOT dataset and for documenting other datasets that  
│                              tackle the problem of misogyny/hate speech and their labeling process.
├── docs                    <- Folder containing the files necessary to produce documentation with
│                              Sphinx.
├── models                  <- Folder for saving trained and serialized models fit for production.
├── notebooks               <- Folder for saving Jupyter notebooks.
├── reports                 <- Folder for saving reports generated with Sphinx (HTML, PDF,
│                              LaTeX, etc.).
├── src                     <- Folder containing the source code to train models. The source code currently
│   │                          runs preprocessing pipelines, error analysis scripts and acceptance criteria
│   │                          scripts.
│   └── text                <- Folder containing the utility modules for text processing in the pipeline.
├── stages                  <- Folder containing the files necessary to run the machine learning pipeline.
├── tests                   <- Folder for saving tests for the machine learning pipeline to make sure that
│                              the source code works as expected.
├── .flake8                 <- Flake8 configuration for linting code to the OOT standards.
├── .pre-commit-config.yaml <- List of the scripts run at the pre-commit stage.
├── .pylintrc               <- Pylint configuration for linting code to the OOT standards.
├── CONTRIBUTING.md         <- Instructions on how to contribute to this repository.
├── Dvcfile                 <- Default stage (i.e. evaluation stage) for the machine learning pipeline.
├── LICENSE                 <- File containing the license for use of this repository.
├── README.md               <- General information about this repository.
├── mypy.ini                <- mypy configuration for static type checking in Python.
├── opt_out_logo.png        <- Logo used in the README of this repository.
├── requirements.txt        <- Requirements file for reproducing the analysis environment.
└── setup.py                <- Configuration file for the source code.
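
As a quick sanity check after cloning, the layout above can be verified with a few lines of Python. This is only a minimal sketch and not part of the repository: the helper name `missing_dirs` and the `EXPECTED_DIRS` tuple are hypothetical, and the folder list is simply copied from the tree above. Some folders may legitimately be absent in a fresh checkout (for example, if they are empty or ignored by version control).

```python
from pathlib import Path

# Top-level folders named in the repository tree (assumption: copied by hand
# from the README; adjust this tuple if the layout changes).
EXPECTED_DIRS = (
    "data",
    "docs",
    "models",
    "notebooks",
    "reports",
    "src",
    "stages",
    "tests",
)


def missing_dirs(repo_root):
    """Return the expected folders that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the root of a local checkout.
    missing = missing_dirs(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders are present.")
```

Running the script from the repository root prints either the list of missing folders or a confirmation that all of them exist.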

Repository management

This repository is managed by the Opt Out Tools data team. If you have any questions, please reach out to one of the following team members on GitHub:

  • Andrada: andra-pumnea
  • Verena: Ver2307

Repository status

We use CircleCI for CI/CD. You can check in this section whether anything in the repository is currently broken.

Current status: CircleCI

NOTE: We do not currently have an automated model deployment mechanism.

Code of conduct

Please note that this repository is part of the Opt Out Tools project, which is released with a Contributor Code of Conduct. By participating in this project, you agree to abide by its terms.


Project structure based on the cookiecutter data science project template. #cookiecutterdatascience