This project demonstrates the use of various Google Cloud services to detect hate speech in images and delete them from Google Cloud Storage (GCS) in real-time. The project uses Google Cloud Vision to extract text from images stored in GCS, and uses a custom algorithm to detect hate speech in the extracted text. If the image contains hate speech, it is deleted from GCS. Otherwise, a message is published to a new topic for further processing.
Before running this project, you need to have the following:
- A Google Cloud Platform (GCP) account with billing enabled
- A GCS bucket with images that you want to process
- A Pub/Sub subscription to receive messages when new images are uploaded to the GCS bucket
To run this project, follow these steps:
- Clone this repository to your local machine
- Set the
GOOGLE_APPLICATION_CREDENTIALS
environment variable to the path of your GCP service account key file - Update the following variables in the
main.py
file with your own values:subscription_path
: The Pub/Sub subscription pathbucket_name
: The GCS bucket namehate_speech_keywords
: A list of keywords to use for detecting hate speechnew_topic_name
: The name of the new Pub/Sub topic to publish messages to
- Run
python main.py
to start the subscriber
When a new image is uploaded to the GCS bucket, the subscriber will receive a message and the following will happen:
- The image will be downloaded from GCS to a temporary file
- Google Cloud Vision will be used to extract text from the image
- The custom algorithm will check the extracted text for the presence of hate speech keywords
- If the image contains hate speech, it will be deleted from GCS
- Otherwise, a message will be published to the new topic for further processing