This project combines natural language processing (NLP) and deep learning techniques to perform sentiment analysis and genre classification on song data. It processes both textual lyrics and audio signals to classify songs by genre and detect sentiment, making it a versatile solution for analyzing musical content.
- Project Overview
- Features
- Dataset
- Setup and Requirements
- Project Structure
- Model Training
- Evaluation and Metrics
- Results
- Usage
- Future Enhancements
## Project Overview

The project builds and trains a deep learning model that:
- Analyzes song lyrics to classify sentiment.
- Processes audio features to categorize music by genre.
- Integrates the two modalities (text and audio) to improve the classification performance.
## Features

- Text Processing for Lyrics: Tokenizes and sequences lyrics for sentiment classification.
- Audio Feature Extraction: Uses Mel-frequency cepstral coefficients (MFCCs) from audio files for genre classification.
- Multi-Modal Neural Network: Combines audio and text data for improved classification.
- Visualization and Evaluation: Includes plotting functions for model accuracy, loss, and confusion matrix.
## Dataset

The project requires:
- Lyrics CSV File: Contains processed lyrics for sentiment analysis.
- Audio Data JSON File: Includes MFCCs and labels for genre classification.
- Audio Files: Used for feature extraction in genre classification (optional but recommended).
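A minimal sketch of loading both inputs is below; the file paths and JSON keys (`mfcc`, `labels`, `mapping`) are assumptions about the dataset layout, not guaranteed by the project:

```python
import json
import numpy as np
import pandas as pd

# Lyrics CSV with processed lyrics for sentiment analysis (path assumed)
lyrics_df = pd.read_csv("/content/drive/MyDrive/lyrics.csv")

# Audio JSON with MFCCs and genre labels (path and keys assumed)
with open("/content/drive/MyDrive/audio_data.json") as f:
    data = json.load(f)

X = np.array(data["mfcc"])    # MFCC feature arrays
y = np.array(data["labels"])  # integer genre labels
print(X.shape, y.shape, data.get("mapping"))
```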
## Setup and Requirements

Ensure you have the following libraries installed:

```
!pip install transformers
```
Other required libraries:

- numpy
- pandas
- tensorflow
- matplotlib
- librosa
- scikit-learn
- seaborn
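These are typically preinstalled on Colab; otherwise they can be installed in one step:

```
!pip install numpy pandas tensorflow matplotlib librosa scikit-learn seaborn
```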
Mount Google Drive to access the dataset:
```python
from google.colab import drive
drive.mount('/content/drive')
```
## Project Structure

Here's a breakdown of the primary files:
- `Create_Dataset.ipynb`: Creates the dataset by reading audio files, extracting MFCCs, and saving them as JSON.
- `Training_the_model.ipynb`: Defines, compiles, and trains the multi-modal deep learning model.
- `Sentiment_analysis.ipynb`: Analyzes song lyrics using tokenization and padding, followed by training for sentiment classification.
- `final_script.ipynb`: Combines all steps, from loading data to training the multi-modal model and evaluating it.
## Model Training

- Tokenizer Setup: Tokenizes lyrics using Keras' `Tokenizer` with a vocabulary size of 5000.
- Sequence Padding: Pads sequences to a fixed length of 1000 for input uniformity (see the sketch after this list).
- Sentiment Classification Model: Uses bidirectional LSTM layers and dense layers with dropout for sentiment classification.
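A minimal sketch of this text pipeline: the vocabulary size (5000) and sequence length (1000) come from the project, while the layer sizes, OOV token, and three-class output are illustrative assumptions, not the notebooks' exact architecture:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import layers, models

VOCAB_SIZE = 5000  # vocabulary size used by the project
MAX_LEN = 1000     # fixed sequence length used by the project

lyrics = ["sample lyric text one", "sample lyric text two"]  # placeholder corpus

# Tokenize the lyrics and pad each sequence to a uniform length
tokenizer = Tokenizer(num_words=VOCAB_SIZE, oov_token="<OOV>")
tokenizer.fit_on_texts(lyrics)
padded = pad_sequences(tokenizer.texts_to_sequences(lyrics),
                       maxlen=MAX_LEN, padding="post")

# Bidirectional LSTM sentiment classifier with dropout (sizes illustrative)
model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # e.g. negative / neutral / positive
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```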
- MFCC Extraction: Extracts MFCCs from 30-second segments of audio for genre classification.
- Normalization: Scales the extracted MFCC features so inputs are uniform (a sketch follows this list).
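A minimal sketch of MFCC extraction with librosa: the 30-second duration comes from the project, while the sample rate, number of coefficients, normalization scheme, and file path are assumptions for illustration:

```python
import numpy as np
import librosa

def extract_mfcc(audio_path, sr=22050, duration=30, n_mfcc=13):
    """Load a 30-second segment and return normalized MFCCs."""
    signal, sr = librosa.load(audio_path, sr=sr, duration=duration)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Scale each coefficient to zero mean and unit variance
    mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
    return mfcc.T  # shape: (time_frames, n_mfcc)

features = extract_mfcc("song.wav")  # hypothetical file
```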
The multi-modal model uses:
- A pre-trained convolutional model (`model_crnn`) for audio features.
- An embedding layer for lyrics.
- Bidirectional LSTM layers for lyrics, dense layers, and a dropout layer to prevent overfitting.
The model is trained with sparse categorical cross-entropy loss and the Adam optimizer for multi-class classification; a structural sketch of the fused architecture follows.
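A sketch of how the two branches could be fused with the Keras functional API. The input shapes, layer sizes, and number of classes are assumptions, and the simple Conv1D branch merely stands in for the project's pre-trained `model_crnn`:

```python
from tensorflow.keras import Input, layers, models

NUM_CLASSES = 10  # assumed number of genres

# Audio branch: stand-in for the pre-trained model_crnn
audio_in = Input(shape=(130, 13), name="mfcc_input")  # (time_frames, n_mfcc), assumed
x = layers.Conv1D(64, 3, activation="relu")(audio_in)
x = layers.GlobalMaxPooling1D()(x)

# Text branch: embedding + bidirectional LSTM over padded lyric sequences
text_in = Input(shape=(1000,), name="lyrics_input")  # fixed sequence length from the project
y = layers.Embedding(5000, 128)(text_in)
y = layers.Bidirectional(layers.LSTM(64))(y)

# Fuse the two modalities and classify
z = layers.concatenate([x, y])
z = layers.Dense(128, activation="relu")(z)
z = layers.Dropout(0.5)(z)
out = layers.Dense(NUM_CLASSES, activation="softmax")(z)

model = models.Model(inputs=[audio_in, text_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```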
## Evaluation and Metrics

- Accuracy and Loss Visualization: Plots training and validation accuracy/loss over epochs.
- Confusion Matrix: Provides a confusion matrix for detailed evaluation of model performance.
- Function `plot_confusion_matrix`: Plots a detailed confusion matrix using seaborn's heatmap for visual interpretation (see the sketch below).
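A minimal sketch of such a helper, assuming integer labels and a list of class names; the exact signature in the notebooks may differ:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(y_true, y_pred, class_names):
    """Render a labeled confusion matrix as a seaborn heatmap."""
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                xticklabels=class_names, yticklabels=class_names)
    plt.xlabel("Predicted label")
    plt.ylabel("True label")
    plt.show()
```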
## Results

The results section provides insight into:
- Model performance on both sentiment and genre classification.
- Potential areas for improving classification accuracy.
## Usage

To test the model on an audio file, use the following function:
```python
def test_music(audio_file):
    # Run the audio file through the model and output predictions
    ...
```
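The notebooks define the full body; below is a minimal sketch of what such a function could look like, assuming the MFCC settings above and a trained single-input genre model. The `extract_mfcc` helper, label names, and file path are hypothetical:

```python
import numpy as np

GENRES = ["classical", "jazz", "rock"]  # placeholder label names

def test_music(audio_file):
    """Extract MFCCs from the file and print the predicted genre."""
    mfcc = extract_mfcc(audio_file)      # hypothetical helper from the sketch above
    mfcc = np.expand_dims(mfcc, axis=0)  # add batch dimension
    probs = model.predict(mfcc)[0]       # trained audio-only genre model assumed
    print("Predicted genre:", GENRES[int(np.argmax(probs))])

test_music("/content/drive/MyDrive/song.wav")  # hypothetical path
```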
## Future Enhancements

- Add more sentiment labels: Enhance sentiment analysis to include nuanced categories (e.g., joy, anger).
- Expand genre classification: Include more genres by training on a larger, diverse dataset.
- Real-time prediction: Build a real-time prediction pipeline for streaming audio data.