This repository contains the assignments and solutions for the CSE 472: Machine Learning Sessional course. Each offline session focuses on key machine learning concepts and algorithms, with hands-on implementations and experimentation.
- Offline 1: Data Preprocessing and Feature Engineering
- Offline 2: Ensemble Learning with Logistic Regression
- Offline 3: Neural Network and Backpropagation
- Offline 4: PCA and Expectation-Maximization
### Offline 1: Data Preprocessing and Feature Engineering

This assignment covers data preprocessing and feature engineering for machine learning models. It includes tasks such as cleaning raw data, handling missing values, normalizing datasets, and selecting features.
- Import and preprocess the "IBM HR Analytics Employee Attrition & Performance" dataset.
- Handle missing values, redundancy, and data normalization.
- Convert categorical variables into numerical representations.
- Perform correlation analysis to identify important features.
- Prepare the dataset for a machine learning pipeline.
- Source: IBM HR Analytics Dataset
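Below is a minimal pandas sketch of the preprocessing steps listed above. It runs on a small hypothetical DataFrame instead of the real IBM HR file, and the values are invented for illustration. The specific choices are assumptions, not necessarily what the notebook does: dropping single-valued columns as redundant, mean imputation, one-hot encoding, min-max scaling, and ranking features by absolute Pearson correlation with `Attrition`. The function name `preprocess` is likewise hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real notebook would load the IBM HR CSV instead.
df = pd.DataFrame({
    "Age": [41, 49, np.nan, 33, 27],
    "MonthlyIncome": [5993, 5130, 2090, np.nan, 3468],
    "BusinessTravel": ["Travel_Rarely", "Travel_Frequently", "Travel_Rarely",
                       "Non-Travel", "Travel_Rarely"],
    "EmployeeCount": [1, 1, 1, 1, 1],  # constant column, carries no information
    "Attrition": ["Yes", "No", "Yes", "No", "No"],
})


def preprocess(df: pd.DataFrame, target: str):
    out = df.drop_duplicates().copy()
    # Drop redundant columns that hold a single value.
    constant = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    out = out.drop(columns=constant)

    # Impute missing numeric values with the column mean.
    num_cols = list(out.select_dtypes(include="number").columns)
    out[num_cols] = out[num_cols].fillna(out[num_cols].mean())

    # Encode the Yes/No target as 0/1, then one-hot encode the other categoricals.
    out[target] = (out[target] == "Yes").astype(int)
    cat_cols = [c for c in out.columns if c not in num_cols and c != target]
    out = pd.get_dummies(out, columns=cat_cols, dtype=int)

    # Min-max scale numeric features to [0, 1]; guard against a zero range.
    lo, hi = out[num_cols].min(), out[num_cols].max()
    out[num_cols] = (out[num_cols] - lo) / (hi - lo).replace(0, 1)

    # Rank features by absolute Pearson correlation with the target.
    corr = out.corr()[target].drop(target).abs().sort_values(ascending=False)
    return out, corr


clean, corr = preprocess(df, target="Attrition")
print(corr)
```

On a real train/test split, compute the imputation means and scaling bounds on the training data only, then reuse them for the other splits to avoid leakage.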
### Offline 2: Ensemble Learning with Logistic Regression

This assignment focuses on implementing Logistic Regression (LR) from scratch and using it as the base classifier for ensemble learning with bagging and stacking.
- Preprocess datasets to standardize input formats.
- Implement Logistic Regression (LR) as the base learner.
- Implement Bagging with 9 LR models and Stacking with LR as the meta-classifier.
- Create a simple majority voting-based ensemble for comparison.
- Evaluate model performance using metrics and violin plots.
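The tasks above could be sketched in NumPy as follows. The sketch assumes binary labels, batch gradient descent on the cross-entropy loss, and synthetic two-class data in place of the course datasets. Nine LR models are trained on bootstrap samples (bagging). Their outputs are then combined in one of two ways: by majority vote, or by a meta-level LR fitted on a held-out validation split (stacking). Feeding the meta-classifier the base models' predicted probabilities, centred at 0.5, is one design choice among several. The class and function names are hypothetical.

```python
import numpy as np


class LogisticRegression:
    """Binary logistic regression fitted by batch gradient descent."""

    def __init__(self, lr=0.1, epochs=1000):
        self.lr, self.epochs = lr, epochs

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

    def fit(self, X, y):
        n, d = X.shape
        self.w, self.b = np.zeros(d), 0.0
        for _ in range(self.epochs):
            err = self._sigmoid(X @ self.w + self.b) - y
            # Gradients of the mean binary cross-entropy loss.
            self.w -= self.lr * (X.T @ err) / n
            self.b -= self.lr * err.mean()
        return self

    def predict_proba(self, X):
        return self._sigmoid(X @ self.w + self.b)

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)


def bagging(X, y, n_models=9, seed=0):
    """Fit n_models LRs, each on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(LogisticRegression().fit(X[idx], y[idx]))
    return models


def majority_vote(models, X):
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


def stack_features(models, X):
    # Base-model probabilities, centred at 0.5, become the meta-classifier's inputs.
    return np.column_stack([m.predict_proba(X) for m in models]) - 0.5


# Hypothetical data: two overlapping Gaussian blobs, split train/validation/test.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.repeat([0, 1], 200)
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
X_tr, y_tr = X[:240], y[:240]
X_val, y_val = X[240:320], y[240:320]
X_te, y_te = X[320:], y[320:]

models = bagging(X_tr, y_tr)
meta = LogisticRegression().fit(stack_features(models, X_val), y_val)

vote_acc = (majority_vote(models, X_te) == y_te).mean()
stack_acc = (meta.predict(stack_features(models, X_te)) == y_te).mean()
print(f"majority vote: {vote_acc:.3f}, stacking: {stack_acc:.3f}")
```

With an odd number of base models (9 here), a two-class majority vote can never tie.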
### Offline 3: Neural Network and Backpropagation

This assignment involves implementing a Feed-Forward Neural Network (FNN) from scratch for apparel classification.
- Dense Layer: Fully connected layer.
- Batch Normalization: Normalizes each layer's inputs using mini-batch statistics.
- ReLU Activation: Activation function for hidden layers.
- Dropout: Regularization to prevent overfitting.
- Adam Optimizer: Adaptive moment estimation for weight updates.
- Softmax Regression: For multi-class classification.
- Modularize the implementation to allow flexibility in architecture.
- Implement backpropagation and mini-batch gradient descent for training.
- Train and evaluate the FNN using the provided dataset.
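The following compact NumPy sketch shows how these pieces could fit together. Each layer exposes `forward`/`backward`, backpropagation walks the layers in reverse, and Adam updates every parameter after each mini-batch. The sketch assumes He initialisation, inverted dropout, and a small synthetic 2-D, 3-class dataset in place of the apparel images. Batch normalization is omitted for brevity. All names (`Dense`, `ReLU`, `Dropout`, `Adam`, `train`) are illustrative rather than taken from the assignment code.

```python
import numpy as np

class Dense:
    def __init__(self, n_in, n_out, rng):
        # He initialisation, a common choice in front of ReLU.
        self.params = {"W": rng.normal(0, np.sqrt(2 / n_in), (n_in, n_out)),
                       "b": np.zeros(n_out)}
        self.grads = {}
    def forward(self, x, training):
        self.x = x
        return x @ self.params["W"] + self.params["b"]
    def backward(self, grad):
        self.grads["W"] = self.x.T @ grad
        self.grads["b"] = grad.sum(axis=0)
        return grad @ self.params["W"].T

class ReLU:
    def forward(self, x, training):
        self.mask = x > 0
        return x * self.mask
    def backward(self, grad):
        return grad * self.mask

class Dropout:
    def __init__(self, p, rng):
        self.p, self.rng = p, rng
    def forward(self, x, training):
        # Inverted dropout: rescale during training, identity at inference.
        self.mask = (self.rng.random(x.shape) >= self.p) / (1 - self.p) if training else 1.0
        return x * self.mask
    def backward(self, grad):
        return grad * self.mask

def softmax_cross_entropy(logits, y):
    """Mean cross-entropy and its gradient with respect to the logits."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(y)
    loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
    probs[np.arange(n), y] -= 1
    return loss, probs / n

class Adam:
    def __init__(self, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m, self.v, self.t = {}, {}, 0
    def step(self, layers):
        self.t += 1
        for layer in layers:
            for name, p in getattr(layer, "params", {}).items():
                g, key = layer.grads[name], (id(layer), name)
                self.m[key] = self.b1 * self.m.get(key, 0) + (1 - self.b1) * g
                self.v[key] = self.b2 * self.v.get(key, 0) + (1 - self.b2) * g**2
                m_hat = self.m[key] / (1 - self.b1**self.t)
                v_hat = self.v[key] / (1 - self.b2**self.t)
                p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)  # in-place update

def forward(layers, x, training):
    for layer in layers:
        x = layer.forward(x, training)
    return x

def train(layers, X, y, epochs=30, batch_size=32, lr=1e-2, seed=0):
    rng, opt, history = np.random.default_rng(seed), Adam(lr), []
    for _ in range(epochs):
        order, total = rng.permutation(len(X)), 0.0
        for start in range(0, len(X), batch_size):  # mini-batch gradient descent
            idx = order[start:start + batch_size]
            loss, grad = softmax_cross_entropy(forward(layers, X[idx], True), y[idx])
            for layer in reversed(layers):  # backpropagation
                grad = layer.backward(grad)
            opt.step(layers)
            total += loss * len(idx)
        history.append(total / len(X))
    return history

# Hypothetical toy data: three Gaussian clusters standing in for apparel images.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([rng.normal(c, 1.0, (100, 2)) for c in centers])
y = np.repeat(np.arange(3), 100)

layers = [Dense(2, 32, rng), ReLU(), Dropout(0.1, rng), Dense(32, 3, rng)]
history = train(layers, X, y)
accuracy = (forward(layers, X, False).argmax(axis=1) == y).mean()
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, train accuracy {accuracy:.3f}")
```

Every layer shares the same `forward(x, training)`/`backward(grad)` interface. Changing the architecture therefore only means editing the `layers` list, which is one way to meet the modularity requirement.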
### Offline 4: PCA and Expectation-Maximization

- PCA dataset: 1000 rows and 500 columns, i.e., 1000 sample points with 500 features each.
- Perform PCA for dimensionality reduction.
- Project data along the two eigenvectors corresponding to the highest eigenvalues.
- Create a 2D scatter plot for visualization.
- Generate UMAP and t-SNE plots using library functions for comparison.
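PCA via an eigendecomposition of the covariance matrix can be sketched in NumPy as below. Synthetic 1000 × 500 data with two dominant directions stands in for the provided dataset, and `pca_project` is a hypothetical name. The projected coordinates have variances equal to the selected eigenvalues, which makes a handy sanity check.

```python
import numpy as np


def pca_project(X, k=2):
    """Project X (n_samples, n_features) onto its top-k principal axes."""
    Xc = X - X.mean(axis=0)                    # centre every feature
    cov = np.cov(Xc, rowvar=False)             # (n_features, n_features), ddof=1
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric input, ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    return Xc @ eigvecs[:, top], eigvals[top]


# Hypothetical stand-in for the 1000 x 500 dataset: two dominant directions plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2)) * np.array([10.0, 5.0])
X = latent @ rng.normal(size=(2, 500)) + rng.normal(scale=0.5, size=(1000, 500))

Z, top_eigvals = pca_project(X, k=2)
print(Z.shape, top_eigvals)
```

The two columns of `Z` can be scattered with Matplotlib (`plt.scatter(Z[:, 0], Z[:, 1])`). Eigenvector signs are arbitrary, so the plot may appear mirrored relative to other PCA implementations. For the comparison plots, `sklearn.manifold.TSNE(n_components=2).fit_transform(X)` and, from the `umap-learn` package, `umap.UMAP(n_components=2).fit_transform(X)` both return 2-D embeddings of the same shape.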
- EM dataset: the number of children in each of 1000 families, some of which were given family planning advice.
- Implement the EM algorithm for Poisson mixture models.
- Estimate:
  - Mean number of children in families with and without family planning.
  - Proportion of families with and without family planning.
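The EM updates for a two-component Poisson mixture have closed forms, sketched below in NumPy. The E-step computes each family's responsibility γᵢ for component 1. The M-step then sets π = mean(γᵢ), λ₁ = Σγᵢxᵢ / Σγᵢ, and λ₂ = Σ(1 − γᵢ)xᵢ / Σ(1 − γᵢ). The data are synthetic. Treating component 1 (initialised with the smaller mean) as the "with family planning" group is an assumption, and `em_poisson_mixture` is a hypothetical name.

```python
import math
import numpy as np


def em_poisson_mixture(x, lam=(1.0, 4.0), pi=0.5, max_iter=500, tol=1e-8):
    """Fit a two-component Poisson mixture by EM.

    pi is the weight of component 1; returns (lam1, lam2, pi, log_likelihoods).
    """
    x = np.asarray(x, dtype=float)
    lam1, lam2 = lam
    log_fact = np.array([math.lgamma(k + 1) for k in x])  # log(x!)
    history = []
    for _ in range(max_iter):
        # E-step: joint log-probabilities and responsibilities of component 1.
        log_p1 = np.log(pi) + x * np.log(lam1) - lam1 - log_fact
        log_p2 = np.log(1 - pi) + x * np.log(lam2) - lam2 - log_fact
        log_px = np.logaddexp(log_p1, log_p2)
        gamma = np.exp(log_p1 - log_px)
        history.append(log_px.sum())
        # M-step: closed-form updates for the weight and the two Poisson means.
        pi = gamma.mean()
        lam1 = (gamma * x).sum() / gamma.sum()
        lam2 = ((1 - gamma) * x).sum() / (1 - gamma).sum()
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return lam1, lam2, pi, history


# Hypothetical synthetic data: 300 families with planning (mean 1), 700 without (mean 5).
rng = np.random.default_rng(0)
x = np.concatenate([rng.poisson(1.0, 300), rng.poisson(5.0, 700)])

lam1, lam2, pi, history = em_poisson_mixture(x)
print(f"lambda_planned={lam1:.2f}, lambda_unplanned={lam2:.2f}, "
      f"pi_planned={pi:.2f}, pi_unplanned={1 - pi:.2f}")
```

The log-likelihood recorded in `history` should never decrease between iterations. This is a standard check that the EM updates are implemented correctly.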
### Clone the repository

```bash
git clone <repository_url>
cd <repository_directory>
```
### Run the relevant `.ipynb` notebooks in each assignment folder