This is the final project for the machine learning course.
In this project, I implemented the following:
- Exploratory data analysis (EDA)
- Data cleansing (filling nulls and handling outliers)
- Feature engineering
- Memory-usage optimization (the data has 20 million entries)
- A data preparation pipeline that takes the raw data and returns preprocessed data ready for the ML model, handling all variable types (numeric and text)
- Model training with decision trees, neural networks, and linear regression
- Cross-validation
- A standalone Colab notebook that loads the saved model and makes predictions on new data
Data: https://www.kaggle.com/c/ashrae-energy-prediction/data
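One common approach to the memory optimization step is to downcast each column to the smallest dtype that can hold its values. The sketch below is a minimal illustration of that idea and not necessarily the exact code used in this project. The `reduce_memory` helper is hypothetical, the 50% uniqueness threshold for turning text columns into `category` is an assumed value, and the toy column names are only modeled on the ASHRAE files.

```python
import numpy as np
import pandas as pd


def reduce_memory(df: pd.DataFrame, cat_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes (hypothetical helper).

    - integer columns -> smallest signed integer type that fits the values
    - float columns   -> float32 where pandas accepts the cast (never below float32)
    - text columns    -> 'category' when distinct values < cat_ratio * rows
    """
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if pd.api.types.is_integer_dtype(dtype):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(dtype):
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_object_dtype(dtype) or pd.api.types.is_string_dtype(dtype):
            if out[col].nunique() < cat_ratio * len(out):
                out[col] = out[col].astype("category")
    return out


# Toy data (assumed values; column names resemble the competition files).
df = pd.DataFrame({
    "building_id": np.arange(1000, dtype="int64") % 100,
    "meter_reading": np.random.default_rng(0).random(1000),
    "primary_use": ["Education", "Office"] * 500,
})
small = reduce_memory(df)
print(df.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
print(small.dtypes)
```

Downcasting floats to `float32` trades some precision for half the memory. That is usually acceptable for sensor readings, but it is worth checking that the evaluation metric does not change noticeably on the full dataset.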
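A data preparation pipeline that handles both numeric and text columns and is evaluated with cross-validation could look like the scikit-learn sketch below. It is an assumption, not the project's actual code. The feature names (`square_feet`, `air_temperature`, `primary_use`) are modeled on the ASHRAE files, the data is synthetic, the `build_pipeline` helper is hypothetical, and text columns are treated as categories and one-hot encoded. For brevity it covers only the linear-regression and decision-tree models.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Assumed column groups; names are modeled on the ASHRAE files.
NUMERIC = ["square_feet", "air_temperature"]
CATEGORICAL = ["primary_use"]


def build_pipeline(model) -> Pipeline:
    """Hypothetical helper: raw DataFrame in, preprocessing, then the model."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill numeric nulls
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),  # unseen labels -> all zeros
    ])
    prep = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    return Pipeline([("prep", prep), ("model", model)])


# Synthetic stand-in for the real data.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "square_feet": rng.integers(1_000, 100_000, n).astype(float),
    "air_temperature": rng.normal(15, 8, n),
    "primary_use": rng.choice(["Education", "Office", "Lodging"], n),
})
X.loc[::10, "air_temperature"] = np.nan  # simulate missing weather readings
y = np.log1p(X["square_feet"] / 100 + rng.random(n))

# 5-fold cross-validation; each fold refits the whole pipeline.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, model in [("linear", LinearRegression()),
                    ("tree", DecisionTreeRegressor(max_depth=5, random_state=0))]:
    scores = cross_val_score(build_pipeline(model), X, y, cv=cv,
                             scoring="neg_root_mean_squared_error")
    print(f"{name}: RMSE {-scores.mean():.3f} +/- {scores.std():.3f}")
```

Keeping the preprocessing inside the `Pipeline` means each cross-validation fold fits the imputers, scaler, and encoder on its own training split only, so statistics from the held-out fold do not leak into training.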
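Saving a trained model in one notebook and reloading it in a standalone one can be done with `joblib`, which is installed alongside scikit-learn. This is a minimal sketch that assumes a scikit-learn model and joblib serialization; the project may have saved its models differently (a neural network built with a deep learning framework would typically use that framework's own save and load functions). The file name `energy_model.joblib` is hypothetical, and in Colab the path would point to wherever the model file was uploaded or mounted.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Training notebook: fit a model (toy data) and persist it to disk.
X_train = np.array([[1.0], [2.0], [3.0]])
y_train = np.array([2.0, 4.0, 6.0])
model = LinearRegression().fit(X_train, y_train)

model_path = Path(tempfile.gettempdir()) / "energy_model.joblib"  # hypothetical file name
joblib.dump(model, model_path)

# Standalone prediction notebook: load the saved model and predict on new data.
loaded = joblib.load(model_path)
print(loaded.predict(np.array([[4.0]])))
```

Because joblib relies on pickle, the loading notebook needs compatible library versions installed, and model files should only be loaded from trusted sources.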