/ai-tamil-hate-speech-project-for-videos

An AI project to detect hate speech in the Tamil language, built in collaboration with NYU CIC and Omdena


AI Tamil Hate Speech Detector

OMDENA NYU

Contents

  1. Project Introduction

  2. Project Setup and Documentation

  3. Project Details

  4. Test API with Python script

  5. License

Introduction

This project is the result of a collaboration between DreamSpace Academy (DSA), NYU CIC, and Omdena, and was funded by NYU CIC. Its goal is to detect hate speech on social media platforms written in Tamil, English, or Tanglish (Tamil written in the Latin script, often mixed with English). A global team of 50 AI changemakers took on the task, with the social enterprise DSA as the challenge partner. The challenge is supported by the NYU Center on International Cooperation and the Netherlands Ministry of Foreign Affairs.

The focus is on the following hate-speech related categories:

  • Community-based hate speech

  • Religion-based hate speech

  • Gender-based hate speech

  • Political hate speech
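The four categories above could be represented in code as a small label set. The following is only a sketch: the class name `HateCategory`, the string values, and the helper `parse_category` are hypothetical and are not taken from the project's code.

```python
from enum import Enum


class HateCategory(Enum):
    """Hypothetical label set mirroring the four focus categories."""

    COMMUNITY = "community"
    RELIGION = "religion"
    GENDER = "gender"
    POLITICAL = "political"


def parse_category(label: str) -> HateCategory:
    """Map a label string to a category, ignoring case and surrounding whitespace.

    Raises ValueError for labels outside the four categories.
    """
    return HateCategory(label.strip().lower())
```

Using an enum rather than bare strings means an unknown label fails loudly instead of passing through silently.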

Solution

  • An AI model written in Python, built with FastAPI and Streamlit so that the complete code base is in Python.

Project Setup and Documentation

  1. Clone the Repo.

  2. Run the backend service. (Make sure Docker is running.)

    • Go to the backend folder
    • Run the Docker Compose command
    $ cd backend
    backend:~$ sudo docker-compose up -d
  3. Run the frontend service.

    • Go to the frontend folder
    • Run the app with the streamlit run command
    $ cd frontend
    frontend:~$ streamlit run NLPfile.py
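  4. Access the FastAPI documentation. FastAPI serves interactive docs at the /docs path by default; the host and port depend on the backend's Docker Compose configuration.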
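Because the backend starts in detached mode, it may take a moment before it accepts connections. A minimal sketch of a readiness check that could be run before starting the frontend is shown below. The helper name `wait_for_port` is hypothetical, and the host and port passed to it are assumptions that must match whatever the backend's Docker Compose file publishes.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet (for example, connection refused); retry shortly.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("127.0.0.1", 8000)` would poll a backend published on port 8000; that port number is a placeholder, not a value confirmed by this project.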

Project Details


Directory Details

  • Front End: The Streamlit code is in the frontend folder, along with its Dockerfile and requirements.txt.

  • Back End: The FastAPI code is in the backend folder.

    • The project is implemented as a microservice with its own FastAPI server, requirements, and Dockerfile.

    • Directory tree as below:

      - classification
          > app
              > api
                  > bert_model_artifacts
                      - model.bin
                      - network.py
      
    • Each model folder needs the following files:

      • model.bin: the saved model after training.
      • network.py: defines the class for a customised model.
    • config.json: This file contains the details of the models in the backend and the dataset they are trained on.
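The file requirements above can be checked before starting the backend. The sketch below assumes only the two file names shown in the directory tree; the helper `missing_artifacts` is hypothetical and not part of the project.

```python
from pathlib import Path

# File names taken from the directory tree above.
REQUIRED_FILES = ("model.bin", "network.py")


def missing_artifacts(model_dir):
    """Return the required file names that are absent from model_dir."""
    folder = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

An empty list means the folder is complete. For instance, `missing_artifacts("classification/app/api/bert_model_artifacts")` would check the folder from the tree, relative to wherever the script is run.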

Test API with Python Script

  • Run the following script, setting the data variable to your desired text input:
$ cd backend
backend:~$ python test_api.py
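For readers who want to call the API from their own code, a request could look roughly like the sketch below. This is not the content of test_api.py: the URL `http://localhost:8000/predict`, the `{"text": ...}` payload shape, and the function names are all assumptions, so check the FastAPI documentation page of the running backend for the real route and schema.

```python
import json
import urllib.request

# Placeholder endpoint; the real host, port, and route are not given in this README.
DEFAULT_URL = "http://localhost:8000/predict"


def build_request(text, url=DEFAULT_URL):
    """Build a JSON POST request carrying the text to classify.

    The {"text": ...} payload shape is an assumption, not the project's schema.
    """
    data = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text, url=DEFAULT_URL, timeout=10.0):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating `build_request` from `classify` lets the payload be inspected without a running backend.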

License

This project is licensed under the Apache License 2.0. You may not use any trademarks associated with the software without permission. The full text of the license can be found in the LICENSE file.