Lingualytics, the Gender and Age Detection project, is a Natural Language Processing (NLP) project that uses machine learning to predict the gender and approximate age of individuals from the linguistic features and patterns in their text, such as messages, comments, or chat conversations. The project is implemented with deep learning techniques and trained on a dataset of labeled text. Potential applications include targeted marketing and advertising, social media analytics, demographic and sentiment analysis, personalized healthcare recommendations, enhanced customer service, chatbot optimization, content moderation, security and surveillance, and personalized recommendations.
The main components of this project include:
- Data Collection: Gathering a dataset of textual data and messages with corresponding age and gender labels.
- Data Preprocessing: Preparing the data so that it is suitable for training deep learning models.
- Model Building: Creating deep learning models for gender and age prediction.
- Training: Training the models on the prepared dataset.
- Evaluation: Assessing the model's accuracy and performance.
- Inference: Using the trained model to predict the gender and age from textual data.
- Visualization: Visualizing the results and demographic distributions.
The project can serve as a foundation for developing applications related to gender and age prediction from textual data.
- Project Structure
- Data Collection
- Data Preprocessing
- Model Building
- Training
- Evaluation
- Inference
- Visualization
- Dependencies
- Setup
The project is organized into several key components:
- data/: Contains the dataset of textual data with gender and age labels.
- notebooks/: Jupyter notebooks for data preprocessing, model training, and visualization.
- models/: Saved model checkpoints.
- src/: Source code for data preprocessing, model building, and inference.
- results/: Output visualizations and evaluation metrics.
Data for this project can be collected from various sources or datasets containing textual data with gender and age labels. Ensure that the data is organized in a structured manner.
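As one way to keep the data structured, the sketch below loads labeled examples from a CSV file and drops incomplete rows. The column names `text`, `gender`, and `age` and the helper `load_labeled_rows` are assumptions made for illustration; the project's actual dataset format may differ.

```python
import csv
import io

# Hypothetical column names; the project's real schema may differ.
REQUIRED_COLUMNS = ("text", "gender", "age")


def load_labeled_rows(handle):
    """Read labeled rows from an open CSV file, skipping incomplete ones."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = []
    for row in reader:
        text = (row["text"] or "").strip()
        gender = (row["gender"] or "").strip().lower()
        age = (row["age"] or "").strip()
        # Keep only rows that have text, a gender label, and a numeric age.
        if text and gender and age.isdigit():
            rows.append({"text": text, "gender": gender, "age": int(age)})
    return rows


# In-memory sample standing in for a real file under data/.
sample = io.StringIO(
    "text,gender,age\n"
    "hello there,Female,27\n"
    ",male,40\n"
    "no age given,male,\n"
)
rows = load_labeled_rows(sample)
```

Only the first sample row survives, because the other two lack either text or an age label.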
Before training, the text data needs to be preprocessed. This includes cleaning and normalizing the text, tokenizing it, and augmenting the data if necessary.
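A minimal sketch of the normalization step, using only the Python standard library, could look like the following. The function name `normalize_text` and the specific rules (lowercasing, removing URLs, collapsing whitespace) are assumptions for illustration, not the project's confirmed pipeline. Lowercasing in particular can discard stylistic cues, so it may not suit every model.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
WHITESPACE_PATTERN = re.compile(r"\s+")


def normalize_text(text):
    """Lowercase the text, drop URLs, and collapse runs of whitespace."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = WHITESPACE_PATTERN.sub(" ", text)
    return text.strip()


cleaned = normalize_text("  Check THIS out:   https://example.com/page  LOL ")
```

The cleaned string would then be passed to whichever tokenizer the model expects.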
Deep learning models are used for gender and age prediction.
Training the model involves feeding the preprocessed data into the model and iteratively adjusting model weights.
Evaluate the model's performance using appropriate metrics such as accuracy, loss, or mean squared error.
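To make the two metrics concrete, the sketch below computes accuracy for gender labels and mean squared error for predicted ages in plain Python. The function names and the sample values are made up for illustration and are not results from this project.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that exactly match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and targets."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Illustrative values only.
gender_accuracy = accuracy(
    ["female", "male", "male", "female"],
    ["female", "male", "female", "female"],
)
age_mse = mean_squared_error([25.0, 40.0, 31.0], [27.0, 40.0, 30.0])
```

Accuracy suits the gender task because it is a classification problem, while mean squared error suits age when it is predicted as a number.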
After training, use the model to make predictions on new textual data.
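The sketch below shows one way raw model outputs might be turned into a readable prediction. It assumes, purely for illustration, that the model returns two gender scores (logits) and a single numeric age estimate; the label order in `GENDER_LABELS` and the helper `decode_prediction` are hypothetical and not taken from the project's code.

```python
import math

# Hypothetical label order; the real model may use a different mapping.
GENDER_LABELS = ("female", "male")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract peak for stability
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(gender_logits, age_output):
    """Map assumed model outputs to a label, a confidence, and a rounded age."""
    probs = softmax(gender_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "gender": GENDER_LABELS[best],
        "confidence": probs[best],
        "age": round(age_output),
    }


prediction = decode_prediction([0.2, 1.7], 28.6)
```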
Visualize the results and demographic distributions using the project's web interface.
Ensure you have the following dependencies installed:
- Python 3.x
- PyTorch
- Pandas
- Transformers
- Hugging Face Accelerate
- Flask
- Next.js
- Other necessary libraries
- Clone the repository:
git clone https://github.com/Anjali7070/Hack-o-masters.git
cd Hack-o-masters
- Download the Lingualytics model:
git lfs install
git clone https://huggingface.co/saurabhalp/linguilytics
Then move model.pth into the backend directory.
- Set up your environment and install dependencies:
pip install -r requirements.txt
- Install the frontend dependencies and start the development server:
cd frontend
npm i
npm run dev
- Start the backend server:
cd ..
cd backend
python server.py
PPT - https://docs.google.com/presentation/d/1TZgp6ghtxH3RwJkShzKPj-YgZ2uxTDF0/edit?usp=drive_link&ouid=110831135010972549841&rtpof=true&sd=true
Video - https://drive.google.com/file/d/1DszJadp0Xqj69Cb6pohi0HBBv-hZTObN/view?usp=drive_link