- Overview
- Features
- Architecture
- Dependencies
- Installation
- Usage
- Data Format
- Project Structure
- Memory Considerations
- Contributing
- License
The Emotion Classifier is a C-based application that leverages a neural network to classify text inputs into one of six predefined emotions: Sadness, Joy, Love, Anger, Fear, and Surprise. It processes a dataset from a CSV file, builds a vocabulary, trains a neural network model, and provides an interactive interface for emotion prediction.
- Neural Network Implementation: Custom neural network built from scratch in C.
- Vocabulary Building: Efficiently constructs a vocabulary from the dataset using hash tables.
- Model Persistence: Saves and loads trained models in binary format.
- Interactive Interface: Allows users to input text and receive emotion predictions in real-time.
- Memory Optimization: Limits vocabulary size to manage memory usage effectively.
- Error Handling: Robust error checking and memory management to prevent crashes.
The project is structured into several modules, each responsible for specific functionalities:
- main.c: Entry point of the application. Handles user interactions, model training, and prediction.
- network (subfolder): Contains `network.c` and `network.h`, which implement the neural network structure, including forward and backward propagation.
- dataParsing (subfolder): Contains `dataParser.c`, `dataParser.h`, and `vocabHash.h`, which handle dataset parsing and vocabulary management.
- Makefile: Automates the build process, compiling source files and managing dependencies.
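To make the forward-propagation step concrete, here is a minimal sketch of one fully connected layer. It is not taken from `network.c`: the function name `dense_forward`, the row-major weight layout, and the ReLU activation are all assumptions chosen for illustration, since the layer layout and activation used by the project are not documented here.

```c
#include <stddef.h>

/* Hypothetical helper, not the project's API.
   Computes out[j] = relu(bias[j] + sum_i weights[j * n_in + i] * in[i]).
   The ReLU activation is an assumption made for this sketch. */
void dense_forward(const double *weights, const double *bias,
                   const double *in, size_t n_in,
                   double *out, size_t n_out) {
    for (size_t j = 0; j < n_out; j++) {
        double sum = bias[j];
        for (size_t i = 0; i < n_in; i++) {
            sum += weights[j * n_in + i] * in[i];   /* row j, column i */
        }
        out[j] = sum > 0.0 ? sum : 0.0;             /* ReLU */
    }
}
```

A full network would chain several such calls, feeding each layer's `out` into the next layer's `in`.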
- C Compiler: GCC or any compatible C compiler.
- uthash: A header-only C library for hash tables. Included in the `include/` directory.
- Make: For using the provided Makefile to build the project.
- Clone the Repository:

      git clone https://github.com/dylanneve1/EmotiNet.git
      cd EmotiNet

- Ensure Dependencies Are Met:
  - GCC: Verify installation.

        gcc --version

  - Make: Verify installation.

        make --version

  - uthash: Already included in the `include/` directory.
- Prepare the Dataset:
  - Place your `emotions.csv` file in the project root directory.
  - Ensure the CSV is properly formatted, with each line containing a text and a label separated by a comma. Example:

        "I am so happy and joyful today!",1
        "Feeling sad and low.",0
Run the Application:
./main
-
Select Training Option:
=== Emotion Classifier === 1. Load existing model 2. Train a new model Choose an option (1 or 2): 2
-
Training Process:
The application will parse the CSV, build the vocabulary (limited to the top 10,000 words to manage memory), convert text data to numerical input, train the neural network, and save the model to
model.bin
.Total valid data points: 416808 Vocabulary size: 10000 Training the neural network... Training completed. Model saved successfully to 'model.bin'.
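The "convert text data to numerical input" step can be sketched as a bag-of-words count vector. This is an assumption about the encoding, not a description of `dataParser.c`: the names `text_to_vector` and `vocab_index` are hypothetical, and a linear search stands in for the uthash lookup so the example compiles without external headers.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD_LEN 64

/* Linear search stands in for the hash-table lookup the project uses. */
static int vocab_index(const char *const *vocab, size_t vocab_size,
                       const char *word) {
    for (size_t i = 0; i < vocab_size; i++) {
        if (strcmp(vocab[i], word) == 0) return (int)i;
    }
    return -1;
}

/* Fills vec (length vocab_size) with per-word counts for text.
   Words are lowercased runs of letters; words outside the vocabulary
   are skipped, and words longer than MAX_WORD_LEN - 1 are truncated. */
void text_to_vector(const char *text, const char *const *vocab,
                    size_t vocab_size, float *vec) {
    char word[MAX_WORD_LEN];
    size_t len = 0;
    memset(vec, 0, vocab_size * sizeof *vec);
    for (const char *p = text;; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c)) {
            if (len < MAX_WORD_LEN - 1) word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {                  /* a word just ended */
                word[len] = '\0';
                int idx = vocab_index(vocab, vocab_size, word);
                if (idx >= 0) vec[idx] += 1.0f;
                len = 0;
            }
            if (c == '\0') break;
        }
    }
}
```

With a vocabulary of `happy`, `sad`, `today`, the text `I am so HAPPY, happy today!` yields the counts 2, 0, 1.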
To load a previously trained model:
- Run the Application:

      ./main

- Select Loading Option:

      === Emotion Classifier ===
      1. Load existing model
      2. Train a new model
      Choose an option (1 or 2): 1

- Model Loading: The application will load the neural network model from `model.bin`.

      Model loaded successfully from 'model.bin'.
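As a rough illustration of binary model persistence, the sketch below writes and reads a flat array of weights. The actual layout of `model.bin` is not documented here, so the format (an element count followed by raw `double` values) and the function names `save_weights` and `load_weights` are assumptions. A file written this way is only readable on a machine with the same `size_t` width and byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical format: a size_t count, then that many raw doubles.
   Returns 0 on success, -1 on any I/O error. */
int save_weights(const char *path, const double *w, size_t n) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(&n, sizeof n, 1, f) == 1 &&
             fwrite(w, sizeof *w, n, f) == n;
    if (fclose(f) != 0) ok = 0;             /* flush errors surface here */
    return ok ? 0 : -1;
}

/* Returns a malloc'd array the caller must free, or NULL on failure.
   *n_out is written only on success. */
double *load_weights(const char *path, size_t *n_out) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    size_t n = 0;
    double *w = NULL;
    if (fread(&n, sizeof n, 1, f) == 1 &&
        n > 0 && n <= SIZE_MAX / sizeof *w) {   /* reject overflowing sizes */
        w = malloc(n * sizeof *w);
        if (w && fread(w, sizeof *w, n, f) != n) {
            free(w);                        /* truncated file */
            w = NULL;
        }
    }
    fclose(f);
    if (w) *n_out = n;
    return w;
}
```

Checking every `fread` and `fwrite` return value is what lets a truncated or missing file produce an error message instead of a crash.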
After training or loading a model, the application enters an interactive loop where you can input text and receive emotion predictions.
Enter text to classify (or type 'exit' to quit):
> I am feeling very happy and excited today!
Prediction:
Sadness: 0.123456
Joy: 0.812345
Love: 0.054321
Anger: 0.067890
Fear: 0.045678
Surprise: 0.032109
Predicted Emotion: Joy
Enter text to classify (or type 'exit' to quit):
> exit
Exiting the program.
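The "Predicted Emotion" line in the session above corresponds to picking the highest of the six scores. A minimal sketch of that step, assuming the scores arrive as an array in label order (the function name `predicted_emotion` is hypothetical):

```c
#include <stddef.h>

#define NUM_EMOTIONS 6

/* Names in label order 0..5. */
static const char *const EMOTION_NAMES[NUM_EMOTIONS] = {
    "Sadness", "Joy", "Love", "Anger", "Fear", "Surprise"
};

/* Returns the name of the highest-scoring emotion.
   On a tie, the lowest label wins. */
const char *predicted_emotion(const double scores[NUM_EMOTIONS]) {
    size_t best = 0;
    for (size_t i = 1; i < NUM_EMOTIONS; i++) {
        if (scores[i] > scores[best]) best = i;
    }
    return EMOTION_NAMES[best];
}
```

Applied to the scores in the example session, the largest value is 0.812345 at label 1, so the function returns `Joy`.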
The `emotions.csv` file should adhere to the following structure:
- Fields: Each line contains two fields: the text and the corresponding emotion label.
- Delimiter: Comma-separated.
- Quotation: Text containing commas should be enclosed in double quotes to prevent misparsing.
Example:
"I am so happy and joyful today!",1
"Feeling sad and low.",0
"Getting angry about the delays.",3
"Excited for the upcoming event!",5
"I am fearful of the unknown.",4
"I love spending time with my family.",2
Emotion Labels:
Label | Emotion |
---|---|
0 | Sadness |
1 | Joy |
2 | Love |
3 | Anger |
4 | Fear |
5 | Surprise |
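One way to parse a line in this format is to treat everything after the last comma as the label, which keeps commas inside the quoted text intact. The sketch below is an illustration under that assumption, not the logic in `dataParser.c`; `parse_csv_line` is a hypothetical name, and escaped quotes (`""`) inside the text are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Parses one line of the form "text",label in place.
   On success returns 0, points *text at the text field inside line
   (surrounding quotes removed), and stores a label in the range 0..5.
   Returns -1 if the line has no comma or the label is invalid. */
int parse_csv_line(char *line, char **text, int *label) {
    line[strcspn(line, "\r\n")] = '\0';     /* drop the trailing newline */
    char *comma = strrchr(line, ',');       /* label follows the last comma */
    if (!comma) return -1;

    char *end;
    long value = strtol(comma + 1, &end, 10);
    if (end == comma + 1 || *end != '\0' || value < 0 || value > 5) {
        return -1;
    }

    *comma = '\0';                          /* terminate the text field */
    size_t len = strlen(line);
    if (len >= 2 && line[0] == '"' && line[len - 1] == '"') {
        line[len - 1] = '\0';               /* strip the closing quote */
        line++;                             /* skip the opening quote */
    }
    *text = line;
    *label = (int)value;
    return 0;
}
```

Because the function modifies its input, it should be given a writable buffer, such as the one filled by `fgets`, rather than a string literal.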
EmotiNet/
├── include/
│ └── uthash.h # uthash header file
├── main.c
├── network/
│ ├── network.c
│ └── network.h
├── dataParsing/
│ ├── dataParser.c
│ ├── dataParser.h
│ └── vocabHash.h
├── Makefile
├── model.bin # Generated after training
├── emotions.csv # Your dataset
└── README.md
- include/uthash.h: Hash table library used for efficient vocabulary management.
- main.c: Handles user interactions, model training, loading, and prediction.
- network (subfolder): Contains the neural network implementation files.
- dataParsing (subfolder): Handles CSV parsing and vocabulary creation.
- Makefile: Automates the build process.
- model.bin: Binary file storing the trained neural network model.
- emotions.csv: CSV dataset containing text samples and their corresponding emotion labels.
- README.md: Project documentation.
Handling large datasets and extensive vocabularies can lead to high memory consumption. To mitigate this:
- Limit Vocabulary Size: The application restricts the vocabulary to the 10,000 most frequent words. Adjust `MAX_VOCAB_SIZE` in `main.c` as needed based on your system's capabilities.
- Efficient Data Structures: Uses hash tables for average-case O(1) word lookups, reducing processing time.
- Memory Monitoring: Use tools like `htop` or `valgrind` to monitor and profile memory usage during execution.
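Capping the vocabulary at the most frequent words can be sketched as a sort by count followed by a cutoff. This is an illustration only: the `WordCount` struct and `keep_top_words` function are hypothetical, and the project may select its top words differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one word and how often it appeared. */
typedef struct {
    const char *word;
    unsigned long count;
} WordCount;

/* Orders by descending count; ties are broken alphabetically so the
   result does not depend on the input order. */
static int by_count_desc(const void *a, const void *b) {
    const WordCount *x = a;
    const WordCount *y = b;
    if (x->count != y->count) return x->count < y->count ? 1 : -1;
    return strcmp(x->word, y->word);
}

/* Sorts entries in place and returns how many to keep: the first
   min(n, max_vocab) entries are the most frequent words. */
size_t keep_top_words(WordCount *entries, size_t n, size_t max_vocab) {
    qsort(entries, n, sizeof *entries, by_count_desc);
    return n < max_vocab ? n : max_vocab;
}
```

With a cap of 10,000, each input vector has at most 10,000 entries regardless of how many distinct words the dataset contains, which is what bounds the per-sample memory cost.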
Contributions are welcome! Please follow these steps:
- Fork the Repository
- Create a Feature Branch

      git checkout -b feature/YourFeature

- Commit Your Changes

      git commit -m "Add Your Feature"

- Push to the Branch

      git push origin feature/YourFeature

- Open a Pull Request
This project is licensed under the MIT License.