
Leveraging ETL pipelines and machine learning techniques to analyze Olympics data and develop a predictive model that can identify athletes most likely to win medals in their respective events

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

THE OLYMPICS

Olympic Medals

  • Introduction
  • Objective
  • Problem Statement
  • Data
  • Use Case
  • Scope
  • Progress
  • Languages & Tools
  • License


Introduction

The Olympic Games are the world's most prestigious sporting event, featuring the world's best athletes competing for gold, silver, and bronze medals. With over a century of Olympic data, it is possible to use machine learning algorithms to predict which participant is likely to win a medal in a particular event. In this project, we will use Olympics data from 1896 to 2016 to develop a predictive model that can identify athletes who are most likely to win medals in their respective events.

Objective

The project aims to leverage ETL pipelines and machine learning techniques to analyze Olympics data and develop a predictive model that can identify athletes most likely to win medals in their respective events. The model will use a range of features, such as an athlete's age, gender, height, weight and other relevant factors to predict their chances of winning a medal.
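As a rough illustration of how such features and a medal target might be laid out, here is a minimal sketch using pandas. The column names (`Age`, `Sex`, `Height`, `Weight`, `Sport`, `Medal`), the toy rows, and the helper `build_xy` are all assumptions made for this example; the project's actual schema may differ.

```python
import pandas as pd

# Toy rows standing in for the real data; values and column names are assumptions.
raw = pd.DataFrame(
    {
        "Age": [23, 31, 27, 19],
        "Sex": ["F", "M", "M", "F"],
        "Height": [170.0, 185.0, None, 162.0],
        "Weight": [60.0, 82.0, 75.0, None],
        "Sport": ["Swimming", "Rowing", "Judo", "Gymnastics"],
        "Medal": ["Gold", None, "Bronze", None],
    }
)

FEATURES = ["Age", "Sex", "Height", "Weight", "Sport"]


def build_xy(df):
    """Split a results table into a feature frame and a binary medal target."""
    X = df[FEATURES].copy()
    # 1 if the athlete won any medal (gold, silver, or bronze), otherwise 0.
    y = df["Medal"].notna().astype(int)
    return X, y


X, y = build_xy(raw)
print(y.tolist())  # [1, 0, 1, 0]
```

Collapsing gold, silver, and bronze into a single "won a medal" label is one possible framing; a multi-class target that keeps the medal type would be an equally valid alternative.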

The end goal is a predictive function: when a new athlete's details are plugged in, it returns the likelihood of that athlete winning a medal at the Olympic Games. By analyzing past data, we can identify patterns and trends that help predict future outcomes.
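A predictive function of this kind could look roughly like the sketch below. It is not the project's implementation: the function name `predict_medal_probability`, the feature names, the choice of logistic regression, and the eight training rows are hypothetical and exist only so the example runs on its own.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up training rows for illustration only (not taken from the project data).
train = pd.DataFrame(
    {
        "Age": [22, 24, 35, 19, 28, 31, 26, 40],
        "Sex": ["F", "M", "M", "F", "F", "M", "F", "M"],
        "Height": [172, 188, 176, 160, 169, 181, 175, 170],
        "Weight": [63, 85, 80, 52, 60, 78, 66, 90],
        "won_medal": [1, 1, 0, 0, 1, 0, 1, 0],
    }
)
FEATURES = ["Age", "Sex", "Height", "Weight"]

# One-hot encode Sex, pass the numeric columns through, then fit a classifier.
model = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), ["Sex"]),
        remainder="passthrough",
    ),
    LogisticRegression(max_iter=1000),
)
model.fit(train[FEATURES], train["won_medal"])


def predict_medal_probability(fitted_model, athlete):
    """Return the estimated probability that one athlete wins a medal."""
    row = pd.DataFrame([athlete], columns=FEATURES)
    # Locate the probability column that corresponds to the "medal" class (1).
    medal_column = list(fitted_model.classes_).index(1)
    return float(fitted_model.predict_proba(row)[0, medal_column])


new_athlete = {"Age": 25, "Sex": "F", "Height": 171, "Weight": 62}
probability = predict_medal_probability(model, new_athlete)
print(f"Estimated medal probability: {probability:.2f}")
```

Returning a probability rather than a hard yes/no label leaves the decision threshold to whoever uses the function.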


Problem Statement

The Olympic Games are highly competitive events, with thousands of athletes competing for medals. It is difficult to identify which athlete is most likely to win a medal in a particular event, as there are many factors that can influence an athlete's performance. This project aims to solve this problem by using machine learning algorithms to analyze past Olympics data and predict which athlete is most likely to win a medal in their respective event.


Data


Use Case

The predictive model developed in this project has several potential use cases:

  • Used by athletic scouts to screen and shortlist talent faster.
  • Used by sports analysts, coaches, and athletes to identify areas for improvement and optimize training programs.
  • Used by broadcasters and media companies to provide more accurate predictions and analysis of the Olympic Games.

Scope

'Technical'
From a technical perspective, the project involves data preprocessing and cleaning, feature engineering, model selection, and evaluation. Because the target is whether an athlete wins a medal, the machine learning algorithms used will be classification techniques.
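The preprocessing, model selection, and evaluation steps can be sketched with a scikit-learn pipeline. Everything concrete here is an assumption for illustration: the data is randomly generated (so the scores carry no meaning), the column names are hypothetical, and logistic regression and random forest are simply two common classifiers, not necessarily the ones the project settled on.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data: random features and random labels, no real signal.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame(
    {
        "Age": rng.integers(16, 40, size=n).astype(float),
        "Height": rng.normal(175, 10, size=n),
        "Weight": rng.normal(70, 12, size=n),
        "Sex": rng.choice(["F", "M"], size=n),
    }
)
# Blank out some heights to mimic the missing values found in real records.
X.loc[rng.choice(n, size=20, replace=False), "Height"] = np.nan
y = (rng.random(n) < 0.3).astype(int)

numeric = ["Age", "Height", "Weight"]
categorical = ["Sex"]

# Cleaning and feature engineering: impute and scale numbers, one-hot encode categories.
numeric_steps = Pipeline(
    [("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())]
)
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ]
)

# Model selection: compare candidate classifiers with 5-fold cross-validation.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, estimator in candidates.items():
    pipeline = Pipeline([("preprocess", preprocess), ("model", estimator)])
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean CV accuracy = {scores[name]:.3f}")
```

Keeping the preprocessing inside the pipeline means imputation and scaling are fitted only on each training fold, which avoids leaking information from the held-out fold into the evaluation.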

'Business'
From a business perspective, the predictive model can provide valuable insights for coaches, athletes, and sports analysts, helping them make more informed decisions and optimize performance.


Progress

  1. Data Cleaning ✔️ - Click Here to view
  2. Model Building in SQL ✔️ - Click Here to view
  3. Data Visualization in Python using pandas and Plotly ✔️ - Click Here to view
  4. Predictive Model with scikit-learn and PyCaret ✔️ - Click Here to view
  5. Build a predictive function with the following metric scores ✔️ - Click Here to view
  • Accuracy: 86.4%
  • Precision: 85.2%
  • Recall: 86.4%
  6. Click here to watch the video
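Accuracy, precision, and recall figures like those reported for the predictive function can be computed with scikit-learn's metric helpers. The sketch below uses ten hand-written labels, not project results. The averaging mode is an assumption: the source does not say how precision and recall were averaged, but weighted averaging is one setting under which recall always equals accuracy, which would be consistent with both being reported as 86.4%.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hand-written labels for illustration: 1 = won a medal, 0 = did not.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)
# average="weighted" weights each class's score by its number of true samples.
precision = precision_score(y_true, y_pred, average="weighted")
recall = recall_score(y_true, y_pred, average="weighted")

print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
```

With weighted averaging, each class's recall is multiplied by that class's share of the samples, so the sum reduces to the overall fraction of correct predictions, which is the accuracy.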


Languages & Tools

Microsoft SQL Server, Python, pandas, NumPy, Plotly, Matplotlib, scikit-learn, PyCaret, Power BI


License

Scouty is licensed under the MIT license.

© 2023 Jazmine N