This project classifies human activities using data collected from smartphone sensors. Its main components are data preprocessing, feature analysis, model training, and evaluation. It includes a training script for data preparation and model training, and a Streamlit dashboard for model evaluation.
Dataset: https://www.kaggle.com/datasets/uciml/human-activity-recognition-with-smartphones
- train.py: Script for data preprocessing, feature analysis, and model training.
- app.py: Streamlit app for evaluating the trained model and visualizing test set results.
Ensure you have the following libraries installed:
- pandas
- numpy
- plotly
- scikit-learn
- torch
- streamlit
Install the required packages using:
pip install pandas numpy plotly scikit-learn torch streamlit
The dataset is expected to be in the data/ directory with the following files:
- train.csv
- test.csv
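A minimal sketch of loading the two files and separating features from labels. It assumes the Kaggle CSVs contain an "Activity" label column and a "subject" ID column alongside the sensor features; the function names load_data and split_features_labels are hypothetical and are not taken from train.py.

```python
from pathlib import Path

import pandas as pd


def load_data(data_dir="data"):
    """Load train.csv and test.csv from data_dir and print their shapes."""
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / "train.csv")
    test = pd.read_csv(data_dir / "test.csv")
    print(f"train: {train.shape}, test: {test.shape}")
    return train, test


def split_features_labels(df, label_col="Activity", drop_cols=("subject",)):
    """Split a DataFrame into a feature matrix X and a label Series y.

    The column names "Activity" and "subject" are assumptions based on the
    Kaggle version of the dataset; adjust them if your CSVs differ.
    """
    to_drop = [label_col] + [c for c in drop_cols if c in df.columns]
    X = df.drop(columns=to_drop)
    y = df[label_col]
    return X, y
```

Usage: train, test = load_data("data"), then X_train, y_train = split_features_labels(train).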
train.py handles the following tasks:
- Load and Inspect Data: Load training and test datasets, and perform basic data inspection.
- Data Cleaning: Check for duplicates and missing values.
- Exploratory Data Analysis: Generate visualizations for activity distribution and feature analysis.
- Feature Selection: Identify and select highly correlated features.
- Model Training: Train a Random Forest classifier and a CNN model, evaluate their performance, and identify important features.
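The cleaning and feature-selection steps above could look roughly like the sketch below. The README does not define "highly correlated", so this assumes one common reading: flag features whose absolute pairwise Pearson correlation with another feature exceeds a threshold, making them candidates to drop as redundant. The threshold of 0.95 and the function names are hypothetical choices, not values from train.py.

```python
import numpy as np
import pandas as pd


def report_quality(df):
    """Return (number of duplicate rows, total number of missing values)."""
    return int(df.duplicated().sum()), int(df.isna().sum().sum())


def highly_correlated_features(X, threshold=0.95):
    """Return columns whose absolute correlation with an earlier column exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # NaN entries (lower triangle and diagonal) compare as False, so they are ignored.
    return [col for col in upper.columns if (upper[col] > threshold).any()]
```

Dropping the returned columns, e.g. X.drop(columns=highly_correlated_features(X)), keeps the first feature of each highly correlated pair.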
Run the script using:
python train.py
Key features:
- Data loading and inspection
- Exploratory data analysis (EDA) using Plotly for visualizations
- Feature selection based on correlation
- Random Forest and CNN model training
- Model evaluation and feature importance analysis
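A minimal scikit-learn sketch of the Random Forest training and feature importance steps. The hyperparameters (n_estimators=100, random_state=42) and the helper name train_random_forest are illustrative assumptions; the script's actual settings are not stated here, and the CNN part is not covered.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def train_random_forest(X_train, y_train, X_test, y_test,
                        n_estimators=100, top_k=10, random_state=42):
    """Fit a Random Forest, report test accuracy, and return the top_k most important features."""
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state, n_jobs=-1)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    # Impurity-based importances, one per input column, sorted from most to least important.
    importances = pd.Series(model.feature_importances_, index=X_train.columns)
    return model, acc, importances.sort_values(ascending=False).head(top_k)
```

Impurity-based importances are cheap to compute but can favour high-cardinality features; permutation importance is an alternative if that matters.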
app.py is a Streamlit app for evaluating the trained model and visualizing test set results. It includes:
- Model Evaluation: Evaluate the model on the test set and display loss and accuracy.
- Activity Distribution: Visualize the distribution of activities in the test set.
- Individual Instance Evaluation: Select and evaluate individual instances from the test set.
- Feature Exploration: Interactive feature distribution visualization.
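To make the reported loss and accuracy concrete, here is a NumPy sketch that computes both from raw model outputs (logits) and integer class labels. It assumes the loss shown in the app is mean cross-entropy, as produced by PyTorch's nn.CrossEntropyLoss with its default reduction; it is an illustration, not the code in app.py.

```python
import numpy as np


def cross_entropy_and_accuracy(logits, labels):
    """Return (mean cross-entropy loss, accuracy) for logits of shape (n, classes)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Numerically stable log-softmax: subtract each row's max before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of the true class, averaged over samples.
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    accuracy = (logits.argmax(axis=1) == labels).mean()
    return float(loss), float(accuracy)
```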
Run the app using:
streamlit run app.py
Key features:
- Displays test set summary including shape, loss, and accuracy
- Visualizes activity distribution in the test set
- Allows evaluation of individual test instances
- Provides interactive feature exploration
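The activity distribution view needs per-class counts. A minimal pandas sketch follows, assuming the label column is named "Activity"; the helper name activity_distribution is hypothetical.

```python
import pandas as pd


def activity_distribution(df, label_col="Activity"):
    """Return a DataFrame with the count and percentage of each activity, most frequent first."""
    counts = df[label_col].value_counts()  # sorted in descending order by default
    dist = pd.DataFrame({label_col: counts.index, "count": counts.values})
    dist["percent"] = 100 * dist["count"] / dist["count"].sum()
    return dist
```

In a Streamlit app, the result could be plotted with plotly.express.bar(dist, x="Activity", y="count") and displayed with st.plotly_chart; whether app.py does exactly this is an assumption.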
Project structure:

.
├── data/
│ ├── train.csv
│ └── test.csv
├── train.py
├── app.py
└── README.md
Notes:
- Ensure the dataset files are in the data/ directory.
- The trained model and scaler should be saved as model.pth and scaler.pkl, respectively, for use in the Streamlit app.
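The README does not say how scaler.pkl is produced. A sketch assuming a scikit-learn StandardScaler persisted with the standard-library pickle module (joblib is a common alternative) is shown below; the function names are hypothetical. For model.pth, a typical PyTorch approach is torch.save(model.state_dict(), "model.pth") in training and model.load_state_dict(torch.load("model.pth")) in the app, though the project may save the model differently.

```python
import pickle

from sklearn.preprocessing import StandardScaler


def fit_and_save_scaler(X_train, path="scaler.pkl"):
    """Fit a StandardScaler on the training features and pickle it to path."""
    scaler = StandardScaler().fit(X_train)
    with open(path, "wb") as f:
        pickle.dump(scaler, f)
    return scaler


def load_scaler(path="scaler.pkl"):
    """Load a previously pickled scaler (only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Fitting the scaler on the training set only, then reusing it on the test set in the app, avoids leaking test statistics into preprocessing.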
This project uses the Human Activity Recognition with Smartphones dataset from the UCI Machine Learning Repository, as published on Kaggle.