This is a machine learning program made for the course TDT4173 Machine Learning. The task was to predict how much solar power is produced by photovoltaic (PV) systems, which convert sunlight into electricity. The dataset was provided for evaluating day-ahead forecasting methods for solar production. The data provider is ANEO. Information about all weather features is available here. All data was collected in Trondheim from 2019-01-01 to 2023-07-31, at 15-minute intervals, from 3 different locations. The locations were not equal: the power output of location A was 6 times larger than that of B and C, and the solar panels at location A were angled differently than those at B and C, which made it much trickier for a model to learn. The data was also noisy, with outages, night-time periods with no sunlight where solar production was still reported, and daytime periods with no power production due to external factors.
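The kind of noise described above can be illustrated with a small sketch. It is not the cleaning code used in this project: the record layout, the irradiance field, and the `bright` threshold are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical records: (timestamp, irradiance in W/m^2, measured PV power).
# The layout and the values are made up for this example.
readings = [
    (datetime(2021, 1, 10, 2, 0), 0.0, 15.2),     # night, yet power is reported
    (datetime(2021, 1, 10, 2, 15), 0.0, 0.0),     # night, no power: plausible
    (datetime(2021, 6, 10, 12, 0), 650.0, 0.0),   # bright midday, no power
    (datetime(2021, 6, 10, 12, 15), 640.0, 410.0),
]


def flag_suspicious(irradiance, power, bright=100.0):
    """Return a label when a reading contradicts the weather data, else None.

    `bright` is an assumed threshold for "clearly daylight".
    """
    if irradiance == 0.0 and power > 0.0:
        return "production_without_sun"
    if irradiance >= bright and power == 0.0:
        return "no_production_in_sun"
    return None


# Label every reading, then keep only the ones that were not flagged.
flags = [flag_suspicious(irr, pv) for _, irr, pv in readings]
clean = [row for row, flag in zip(readings, flags) if flag is None]
```

Whether flagged rows should be dropped, repaired, or kept is a modelling decision; the sketch only shows how such rows could be found.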
After cloning the project, look at the final_submission folder to see the feature engineering and the model training. The final model is saved in the models folder. To use the model, run the following code in the root directory of the project.
The task was set up as a competition against machine learning algorithms made by the professor Ruslan Khalitov.
NB: Click the image to see the video of Goslightning talking to students at the start of the semester.
This has been a great task and we have learned a lot. We have learned how to use machine learning to solve a real-world problem. We tried many things, worked many late nights, and had a lot of fun and many frustrations. In the end we managed to beat all the bots, and we are very proud of our work. This gave us the best possible grade: A. The two bots that were the hardest to beat were Ryleena and Shao-RyKhan. This may seem strange, as Goslightning is the best bot in the entire tournament, but for most of the project we made our predictions on Kaggle (where we were graded) with bugs in the way we prepared the test data. This made us think that the bots were better than they actually were, and we worked hard on data that was flawed. It is impressive that we climbed so high with several flaws in our test data. After fixing them, we defeated Ryleena.
We learned that simpler models are often better models, as we had models so complex that they had to run for more than 24 hours before completing. We also learned that the data is the most important part of the project: we spent much of our time on feature engineering and data cleaning. We learned that it is important to have a good workflow and a good project structure. Finally, we learned that cloud computing is very powerful and quite easy to set up.
We have beaten the following bots:
The Gosborg 2049 VT guessed randomly between 0 and the maximum PV measurement.
The Kenshi VT used Linear Regression, with no feature engineering or other preprocessing.
Quan Gos Chill used the average for each location at the specified hour.
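A baseline in the spirit of Quan Gos Chill can be sketched in a few lines. This is our own illustration of "average for each location at the specified hour", not the bot's actual code; the row layout and the function names `fit_hourly_average` and `predict` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical training rows: (location, timestamp, PV measurement).
train = [
    ("A", datetime(2022, 6, 1, 12, 0), 600.0),
    ("A", datetime(2022, 6, 2, 12, 0), 400.0),
    ("A", datetime(2022, 6, 1, 0, 0), 0.0),
    ("B", datetime(2022, 6, 1, 12, 0), 100.0),
    ("B", datetime(2022, 6, 2, 12, 0), 80.0),
]


def fit_hourly_average(rows):
    """Build a lookup table: (location, hour of day) -> mean PV measurement."""
    grouped = defaultdict(list)
    for location, timestamp, pv in rows:
        grouped[(location, timestamp.hour)].append(pv)
    return {key: mean(values) for key, values in grouped.items()}


def predict(table, location, timestamp, default=0.0):
    """Look up the historical average; fall back to `default` for unseen keys."""
    return table.get((location, timestamp.hour), default)


table = fit_hourly_average(train)
noon_a = predict(table, "A", datetime(2023, 7, 1, 12, 0))
noon_b = predict(table, "B", datetime(2023, 7, 1, 12, 0))
unseen = predict(table, "C", datetime(2023, 7, 1, 12, 0))
```

A baseline like this ignores the weather entirely, which makes it a useful sanity check for any model that does use weather features.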
Gospion used Random Forest with minimal feature engineering.
Frostling used an AutoML solution based on H2O. The VT had some feature engineering and a random split.
Frostling used CatBoost with good feature engineering and good hyperparameters.
La La Lizard was the average of two teaching assistants' models.
Keno used a single LightGBM with a changed target and an extensive hyperparameter search. It used one model for all 3 locations.
Shao TyKhan was made by taking the best teaching assistants' models and then averaging 10 different CatBoost models with great hyperparameters and good feature engineering. Unlike the other Virtual Teams, it used one model for each location.
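The idea of one ensemble per location, averaged at prediction time, can be sketched as follows. The "models" here are plain stand-in functions rather than CatBoost models, and every name and number is an assumption made for illustration.

```python
from statistics import fmean


def make_model(scale):
    """Return a stand-in model: prediction = scale * irradiance (hypothetical)."""
    return lambda features: scale * features["irradiance"]


# One list of ensemble members per location, so each location gets its own models.
models_by_location = {
    "A": [make_model(s) for s in (0.9, 1.0, 1.1)],
    "B": [make_model(s) for s in (0.15, 0.17)],
}


def predict_location(location, features):
    """Average the predictions of all ensemble members for one location."""
    members = models_by_location[location]
    return fmean([model(features) for model in members])


pred_a = predict_location("A", {"irradiance": 500.0})
pred_b = predict_location("B", {"irradiance": 500.0})
```

Keeping the models separate per location lets each ensemble adapt to that site's scale and panel orientation, at the cost of having less training data per model.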
Goslightning was the best model that the professor made. This model was given extended time to finish. It used the geometric mean of 10 models from the best teaching assistants, 1 model averaged from other teaching assistants' solutions, and 2 LightGBM models fine-tuned by the professor. This was the hardest bot in the competition.
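For readers unfamiliar with geometric-mean blending, here is a minimal sketch. It is not Goslightning's implementation: the prediction values are made up, and returning 0 when any member predicts 0 is our own choice for handling night-time predictions.

```python
import math


def geometric_mean(values):
    """Geometric mean of non-negative numbers; 0 if any value is 0."""
    if any(v < 0 for v in values):
        raise ValueError("geometric mean is undefined for negative values")
    if any(v == 0 for v in values):
        return 0.0
    # exp(mean(log(x))) avoids overflow from multiplying many values together.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def blend(predictions_per_model):
    """Blend equally long prediction lists, one list per model, element-wise."""
    return [geometric_mean(column) for column in zip(*predictions_per_model)]


# Two hypothetical models predicting three time steps each.
preds = [
    [100.0, 0.0, 4.0],
    [400.0, 10.0, 9.0],
]
blended = blend(preds)
```

Compared with the arithmetic mean, the geometric mean is pulled towards the lower predictions, so a single member predicting close to zero has a strong effect on the blend.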
To install Power Predictor, you need to have all the prerequisites installed and set up, and then follow the setup guide. The following sections will guide you through the process.
- Ensure Python 3.9 or newer is installed on your machine. Download Python
- Jupyter Notebook
git clone https://github.com/SverreNystad/power-predictor.git
cd power-predictor
🚀 A better way to set up repositories
A virtual environment in Python is a self-contained directory that contains a Python installation for a particular version of Python, plus a number of additional packages. Using a virtual environment for your project ensures that the project's dependencies are isolated from the system-wide Python and other Python projects. This is especially useful when working on multiple projects with differing dependencies, as it prevents potential conflicts between packages and allows for easy management of requirements.
- To set up and use a virtual environment for Power Predictor: First, install the virtualenv package using pip. This tool helps create isolated Python environments.
pip install virtualenv
- Create virtual environment: Next, create a new virtual environment in the project directory. This environment is a directory containing a complete Python environment (interpreter and other necessary files).
python -m venv venv
- Activate virtual environment: To activate the environment, run the following command:
- For Windows:
source ./venv/Scripts/activate
- For Linux / MacOS:
source venv/bin/activate
- With the virtual environment activated, install the project dependencies:
pip install -r requirements.txt
The requirements.txt file contains a list of packages necessary to run Power Predictor. Installing them in an activated virtual environment ensures they are available to the project without affecting other Python projects or system settings.
To run all the tests, run the following command in the root directory of the project:
pytest
Licensed under the MIT License, because sharing is caring.
- data/raw: Original, immutable data dump.
- data/processed: Cleaned and pre-processed data used for modeling.
- data/interim: Intermediate data that has been transformed.
- results/figures: Generated analysis as HTML, PNG, PDF, LaTeX, etc.
- results/output: Contains different solutions generated by models.
- src/data: Scripts to download or generate data, and to load data from data/raw or data/processed into objects that can be worked with.
- src/features: Scripts to turn raw data into features for modeling.
- src/models: Scripts to train models and then use trained models to make predictions.
- src/visualization: Scripts to create exploratory and results oriented visualizations.
- final_submission: Contains the notebooks Short_notebook_1 and Short_notebook_2, which are our two allowed attempts at the private leaderboard.
Three brave students who applied their knowledge of machine learning to beat the bots.
Gunnar Nystad | Peter Skoland | Sverre Nystad
- Thanks to Ruslan Khalitov for the task and the bots. This task has been amazing and we have learned a lot.
- Thanks to the group members for the great work and the good collaboration.
- Thanks to our amazing Professor Zhirong Yang for great lectures.