Exploratory Data Analysis (EDA)

This repository contains an Exploratory Data Analysis (EDA) project built with Python and pandas. The project analyzes ultra-marathon race data sourced from Kaggle, with the code written in a Jupyter Notebook.

Overview

The dataset comprises approximately 7 million records of ultra-marathon race data spanning 1798 to 2022. The primary objective is to perform EDA to gain insights into these races, focusing on races held in the USA in 2020.

Data Source

The dataset used for this analysis is sourced from Kaggle. You can find it here: https://www.kaggle.com/datasets/aiaiaidavid/the-big-dataset-of-ultra-marathon-running/discussion/420633

Python Libraries Used

pandas, Seaborn

Project Details: Analysis of 50km and 50mi races held in the USA in 2020
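The subset described above can be sketched as a boolean filter in pandas. This is only an illustration on a tiny made-up table: the column names `year`, `event_country`, and `distance` are assumptions and will not match the Kaggle file's actual headers, so they need to be adapted to the real schema.

```python
import pandas as pd

# Made-up sample rows; column names are hypothetical, not the dataset's real schema.
races = pd.DataFrame({
    "year": [2020, 2020, 2019, 2020, 2020],
    "event_country": ["USA", "USA", "USA", "AUT", "USA"],
    "distance": ["50km", "50mi", "50km", "50km", "100km"],
})

# Build one boolean mask per condition, then combine them with &.
is_2020 = races["year"] == 2020
is_usa = races["event_country"] == "USA"
is_target_distance = races["distance"].isin(["50km", "50mi"])

usa_2020 = races[is_2020 & is_usa & is_target_distance]
print(usa_2020)
```

Keeping each condition in its own named mask makes it easy to change the year, country, or distances later without rewriting the whole filter.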

Difference in Speed for 50km and 50mi Races, Male vs. Female: Compare the average speeds of male and female participants in both 50km and 50mi races to understand the gender differences in speed.
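One possible way to compute this comparison is a `groupby` on distance and gender followed by a mean. The sketch below uses invented numbers and assumed column names (`distance`, `gender`, `avg_speed_kmh`); it shows the shape of the calculation, not the project's actual results.

```python
import pandas as pd

# Invented speeds for illustration only; column names are hypothetical.
results = pd.DataFrame({
    "distance": ["50km", "50km", "50km", "50km", "50mi", "50mi", "50mi", "50mi"],
    "gender": ["M", "M", "F", "F", "M", "M", "F", "F"],
    "avg_speed_kmh": [8.0, 7.0, 7.0, 6.0, 7.0, 6.0, 6.5, 5.5],
})

# Mean speed per (distance, gender), pivoted so each gender becomes a column.
mean_speed = (
    results.groupby(["distance", "gender"])["avg_speed_kmh"]
    .mean()
    .unstack("gender")
)

# Gap between male and female mean speed for each distance.
mean_speed["M_minus_F"] = mean_speed["M"] - mean_speed["F"]
print(mean_speed)
```

The resulting table has one row per distance, which is also a convenient input for a Seaborn bar plot.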

Age Group Performance in 50km Races: Analyze the performance metrics (such as average race times, finishing rates, etc.) across different age groups to determine which age groups perform the best in 50km races.

Age Group Performance in 50mi Races: Similarly, analyze the performance metrics across different age groups to identify which age groups excel in 50mi races.

Slowest Age Groups in 50km and 50mi Races: Identify which age groups show the slowest performance or the lowest finishing rates in 50km and 50mi races.
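The age group comparisons above could be approached by binning ages with `pd.cut` and ranking the groups by mean speed. In this sketch the data is invented, the column names `age` and `avg_speed_kmh` are assumptions, and the bin edges and the `min_runners` threshold are arbitrary choices made for illustration. It uses mean speed as the only performance metric.

```python
import pandas as pd

# Invented runners; column names are hypothetical.
runners = pd.DataFrame({
    "age": [24, 27, 33, 38, 36, 45, 48, 52, 58, 71],
    "avg_speed_kmh": [7.0, 8.0, 8.5, 7.5, 8.0, 7.0, 7.4, 6.5, 6.1, 5.0],
})

# Arbitrary decade-style bins; right=False makes each bin [lower, upper).
bins = [18, 30, 40, 50, 60, 100]
labels = ["18-29", "30-39", "40-49", "50-59", "60+"]
runners["age_group"] = pd.cut(runners["age"], bins=bins, labels=labels, right=False)

# Mean speed and number of runners per age group.
by_age = runners.groupby("age_group", observed=True)["avg_speed_kmh"].agg(["mean", "count"])

# Ignore groups with too few runners so one outlier cannot decide the ranking.
min_runners = 2
ranked = by_age[by_age["count"] >= min_runners].sort_values("mean", ascending=False)

fastest_group = ranked.index[0]
slowest_group = ranked.index[-1]
print(ranked)
print("fastest:", fastest_group, "slowest:", slowest_group)
```

Running the same steps once on the 50km subset and once on the 50mi subset would give the best and worst age group for each distance.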

Hardest Season for Running: Analyze race data across different seasons (spring, summer, fall, winter) to determine which season poses the most challenging conditions for running in terms of race times, participant dropout rates, etc.
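A seasonal comparison needs a season label for each race. A minimal sketch, assuming a hypothetical `event_date` column in ISO format, Northern Hemisphere meteorological seasons (December to February is winter, and so on), and "hardest" taken to mean the lowest mean speed:

```python
import pandas as pd

# Invented events; column names and values are hypothetical.
events = pd.DataFrame({
    "event_date": ["2020-01-18", "2020-02-08", "2020-04-11", "2020-07-04",
                   "2020-08-15", "2020-10-03", "2020-12-05"],
    "avg_speed_kmh": [7.0, 7.2, 7.8, 6.0, 6.4, 7.5, 7.1],
})

month = pd.to_datetime(events["event_date"]).dt.month

# (month % 12) // 3 maps Dec-Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3.
season_names = {0: "Winter", 1: "Spring", 2: "Summer", 3: "Fall"}
events["season"] = ((month % 12) // 3).map(season_names)

# Sort ascending so the season with the lowest mean speed comes first.
season_speed = events.groupby("season")["avg_speed_kmh"].mean().sort_values()
hardest_season = season_speed.index[0]
print(season_speed)
print("hardest season in this sample:", hardest_season)
```

Other definitions of "hardest" are equally defensible, so the metric should be stated alongside whatever answer the analysis reports.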

Together, these analyses give insight into performance trends, gender differences, age group dynamics, and seasonal variation in 50km and 50mi races held in the USA in 2020. The results could help race organizers, coaches, and participants better understand the factors that influence race performance and prepare accordingly.

Questions to answer based on the analysis:

Question 1: What is the difference in speed between male and female runners in the 50km and 50mi races?

Question 2: Which age groups are the best in the 50km races?

Question 3: Which age groups are the best in the 50mi races?

Question 4: Which age groups are the worst in the 50km races?

Question 5: Which age groups are the worst in the 50mi races?

Question 6: Which season is the hardest for running: spring, summer, fall, or winter?