This project explores the idea of using different machine learning techniques to determine different stats in NHL games. Research and testing of different techniques has previously mainly been done in this project.
- A database containing about 5000 odds for “Shots on goal” in NHL games from different betting sites
- A way to add more new odds by copy-pasting them from sites into text files
- A way to request data from NHL’s own database to populate your own internal database (in order to use less internet and have a faster running program)
- Produce a CSV file containing all data related to a prediction, this includes player statistics, players teams statistics, and enemy teams statistics
- Load and select features from CSV file to be used in different ML techniques (Data preprocessing)
- Create a pipeline for both using and testing and evaluating ML techniques
- Add more data (maybe xG for both player and teams)
- Create a better way of scraping the web for data
- Create a better way of scraping the web for previous bets and their outcomes
- Create a user interface either as an app or on the web
- Create a more robust framework for different sports with different types of data
- Live predictions
In order to perform different ML techniques, data is needed, for now we use NHL's own (free and very detailed) database to gather all our data. We take this data and store it in our own sqlite database to be used and updated when one wants/needs. The reasoning behind having our own database is quite simple, it’s a lot of data to ask for each time we want to do a prediction. This combined with the fact that we in the end want this process to be done once each day in the season the number of times we ask for a specific game in the database gets quickly out of control.
In order to use the NHL database this incredible documentation was used. Since we know we want all game data for every game x seasons back all we had to do was loop through each day for that period and request all games that occurred on that day. We then took that data and put it in our own database in order to be used later. To update the database is then to only request data for games that have not yet (according to our database) been played, by doing it this way we don’t have to keep requesting data that never changes.
While working on the project a fair bit of different techniques has been tested, these include:
- Logicstic regression
- SVM
- Gaussian Naive Bayes
- Random trees and forests Tests have also been done using a simple Nural net in order to broaden our knowledge in the subject. (Under construction)
To evaluate how well the different algorithms both predict the outcome but also stack up against bookes two different techniques has been implemented. (Under construction)
Side comment: Make sure you have atleast python 3.9 installed, if for some reason "python3" does not work, try using "python" instead.
git clone git@github.com:RasmusRynell/Predicting-NHL.git
Navigate into to project and create an environment
cd Predicting-NHL python3 -m venv env
On Windows:
source env\Scripts\activate.bat
On Unix/MacOS:
source env/bin/activate
python3 -m pip install -r requirements.txt
Side comment: The application is currently accessed through a terminal, this terminal can then in later builds be replaced by a more traditional and easy to use UI.
To first start the application make sure you have followed the instructions under "Installation". When that is done simple navigate to the "app" folder and write the following:
python3 main.py
The application is then started, to then do certain things just enter in a command.
-
"help (h) Prints all currently available commands
-
"exit" (e) Exits the application
-
"eval" (ev) under construction
-
"und" Refreshes/Updates the local NHL database
-
"and" Add nicknames to the database
-
"ubd" Add "old" bets (from bookies but that's located on a local file) to database
-
"gen" Generate a CSV file containing all information for a player going back to 2017/09/15
-
"pre" Preprocess a csv file according to a configuration file