This is the backend repository for ThreadMind, a service that analyzes user-generated content on YouTube and Reddit. The backend applies machine learning models for sentiment analysis, emotion recognition, and cyberbullying classification, and also provides natural language summarization and keyword extraction.
- Objective: To collect and combine contextual metadata, including channel/subreddit attributes and post/video descriptions.
- Implementation: Uses OAuth 2.0 to authenticate API calls to YouTube and Reddit.
- Objective: To offer actionable insights by performing sentiment analysis, emotion recognition, and cyberbullying classification on user comments.
- Implementation: Uses machine learning models fine-tuned on social media datasets, including Twitter and Reddit. For details on the models and methodology, refer to the associated Jupyter notebooks.
- Objective: To distill extensive comment threads into concise summaries and relevant keywords.
- Implementation: Uses OpenAI's GPT-3.5 Turbo model, alongside NLP techniques such as TF-IDF for keyword extraction.
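
To illustrate the TF-IDF keyword extraction mentioned above, here is a minimal sketch using only the Python standard library. It is not taken from this repository: the function names (`tokenize`, `extract_keywords`), the tiny stopword list, and the choice to treat each comment as one document and sum scores across comments are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept deliberately small for illustration.
STOPWORDS = {"the", "a", "an", "is", "it", "this", "and", "to", "of", "i", "was", "but"}


def tokenize(text):
    """Lowercase the text and keep word tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def extract_keywords(comments, top_n=5):
    """Rank terms by their TF-IDF score summed over all comments."""
    docs = [tokenize(c) for c in comments]
    n_docs = len(docs)
    # Document frequency: number of comments that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:
            continue  # Skip comments with no usable tokens.
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            # Smoothed IDF, so a term found in every comment keeps a positive weight.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(top_n)]


if __name__ == "__main__":
    sample = [
        "The camera quality is amazing",
        "Camera battery dies fast",
        "Amazing video, the camera work is great",
    ]
    print(extract_keywords(sample, top_n=3))
```

Summing scores across comments favors terms that recur throughout a thread; a production pipeline would more likely rely on an established library implementation and a fuller stopword list.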
- Communication: REST API
- Deployment Platform: Heroku
- Data Sources: YouTube API, Reddit API
- Machine Learning Models: OpenAI GPT-3.5 Turbo, fine-tuned RoBERTa, and XLNet hosted on Google Cloud Run
Experience the live application here.
- Rate Limiting: Limits API request frequency to keep the system stable.
- Session Management: Unique session IDs are generated to optimize resource allocation and avoid redundant work.
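
The two features above could be combined roughly as follows. This is a hedged sketch, not the repository's actual code: the README does not say which rate-limiting algorithm is used, so a sliding-window limiter keyed by session ID is assumed here, and `SlidingWindowLimiter` and `new_session_id` are hypothetical names.

```python
import time
import uuid
from collections import defaultdict, deque


def new_session_id():
    """Return a random, unique session ID (32 hex characters)."""
    return uuid.uuid4().hex


class SlidingWindowLimiter:
    """Allow at most `max_requests` per session within `window_seconds`."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # Injectable so the limiter can be tested without sleeping.
        self._hits = defaultdict(deque)  # session ID -> timestamps of accepted requests

    def allow(self, session_id):
        """Return True and record the request if the session is under its limit."""
        now = self.clock()
        hits = self._hits[session_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
    session = new_session_id()
    print([limiter.allow(session) for _ in range(3)])
```

A request handler would call `allow(session_id)` before doing any work and reject the request when it returns `False`, typically with an HTTP 429 response. Note that this in-memory version only works within a single process; multiple workers would need a shared store.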
Interested contributors are invited to connect via LinkedIn. Explore the front-end code here.
The project is licensed under the MIT License.