The aim of this project is to perform a comprehensive analysis of weather data using Python. Weather data records temperature, humidity, wind speed, and other atmospheric conditions. The project also addresses a common challenge in data analysis: missing data. The dataset used is "Weather Records.xlsx", which contains 13 columns, including the target column to be predicted, precipitation (in).
To deal with missing data, I used forward fill, backward fill, and interpolation. I dropped rows only if none of these methods could fill them in. From the date column, I created new features such as month, year, and day of the week.
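The filling and date-feature steps can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names `Date` and `Temperature (F)`, the toy values, and the order in which the three methods are applied (interpolate first, then forward fill, then backward fill) are all assumptions.

```python
import numpy as np
import pandas as pd


def fill_missing(df, columns):
    """Fill gaps in the given numeric columns, dropping rows only as a last resort."""
    out = df.copy()
    # Linear interpolation fills gaps that lie between two known values.
    out[columns] = out[columns].interpolate(method="linear", limit_area="inside")
    # Forward fill covers trailing gaps; backward fill covers leading gaps.
    out[columns] = out[columns].ffill().bfill()
    # Rows that are still missing (e.g. a column with no values at all) are dropped.
    return out.dropna(subset=columns)


def add_date_features(df, date_col="Date"):
    """Derive year, month, and day of week (Monday = 0) from a date column."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    out["day_of_week"] = dates.dt.dayofweek
    return out


# Toy data standing in for the real spreadsheet (hypothetical values).
weather = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=6, freq="D"),
    "Temperature (F)": [np.nan, 40.0, np.nan, 44.0, 45.0, np.nan],
})

cleaned = add_date_features(fill_missing(weather, ["Temperature (F)"]))
print(cleaned)
```

Interpolating before forward and backward filling means interior gaps get a value between their neighbours, while the fills only handle the edges of the series where interpolation has nothing to work from.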
The following columns were dropped, for the reasons listed:
- Snow Depth: Too many missing values
- Wind Gust (mph): No clear correlation with precipitation
- Sea Level Pressure (in): Not relevant to precipitation prediction
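A minimal sketch of how such a decision could be checked and applied with pandas. The three dropped column names come from the list above; the helper names, the `Precipitation (in)` column name, and the toy values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Columns named in the list above.
DROPPED_COLUMNS = ["Snow Depth", "Wind Gust (mph)", "Sea Level Pressure (in)"]


def missing_fraction(df):
    """Fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def drop_unused_columns(df, columns=DROPPED_COLUMNS):
    """Drop the listed columns; names that are absent are ignored."""
    return df.drop(columns=columns, errors="ignore")


# Toy frame with hypothetical values, not the project's data.
toy = pd.DataFrame({
    "Snow Depth": [np.nan, np.nan, np.nan, 1.0],
    "Wind Gust (mph)": [10.0, 12.0, np.nan, 9.0],
    "Sea Level Pressure (in)": [30.1, 29.9, 30.0, 30.2],
    "Precipitation (in)": [0.0, 0.2, 0.0, 0.1],
})

print(missing_fraction(toy))
reduced = drop_unused_columns(toy)
print(list(reduced.columns))
```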
Before applying PCA, I encoded categorical features such as the month and day of the week into numerical representations. I then used PCA with 3 components for dimensionality reduction. PCA reduced the dataset to three components that summarize most of the variation in the features used for precipitation prediction.
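The PCA step might look like the sketch below, using scikit-learn. Standardizing the features before PCA is an assumption (the write-up does not say whether scaling was applied), and the feature names and random toy data are hypothetical. The loadings table shows how strongly each original feature contributes to each component.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def run_pca(features, n_components=3):
    """Standardize the features, then project them onto n_components principal components."""
    scaled = StandardScaler().fit_transform(features)
    pca = PCA(n_components=n_components)
    projected = pca.fit_transform(scaled)
    names = [f"PC{i + 1}" for i in range(n_components)]
    scores = pd.DataFrame(projected, columns=names, index=features.index)
    # Each column of the loadings table gives one component's weights on the original features.
    loadings = pd.DataFrame(pca.components_.T, index=features.columns, columns=names)
    return scores, loadings, pca.explained_variance_ratio_


# Synthetic stand-in for the cleaned weather features (hypothetical names and values).
rng = np.random.default_rng(0)
dates = pd.Series(pd.date_range("2023-01-01", periods=50, freq="D"))
features = pd.DataFrame({
    "Temperature (F)": rng.normal(50, 10, 50),
    "Humidity (%)": rng.normal(60, 15, 50),
    "Wind Speed (mph)": rng.normal(8, 3, 50),
    "month": dates.dt.month,              # calendar fields encoded as integers
    "day_of_week": dates.dt.dayofweek,
})

scores, loadings, ratio = run_pca(features, n_components=3)
print(loadings.round(2))
print("Explained variance ratio:", ratio.round(3))
```

The explained variance ratio indicates how much of the total variation the three components retain, which is a useful check on whether three components are enough.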
I created scatterplots of the precipitation column against each of the principal components to examine the relationship between them and to identify which component would help in predicting precipitation. The three scatterplots generated are attached below:
Analyzing the scatterplots shows that the first principal component has a strong positive correlation with precipitation.
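A comparison of this kind can be sketched with Matplotlib as shown below. The data here is synthetic and deliberately built so that the first component correlates with the target; it does not reproduce the project's results. The function name and the `Precipitation (in)` label are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_components_vs_target(scores, target, target_name="Precipitation (in)"):
    """Draw one scatterplot per principal component against the target column."""
    n = scores.shape[1]
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 4), sharey=True, squeeze=False)
    for ax, name in zip(axes[0], scores.columns):
        # Pearson correlation between this component and the target.
        r = np.corrcoef(scores[name], target)[0, 1]
        ax.scatter(scores[name], target, s=12, alpha=0.7)
        ax.set_xlabel(name)
        ax.set_title(f"{name} (r = {r:.2f})")
    axes[0][0].set_ylabel(target_name)
    fig.tight_layout()
    return fig


# Synthetic component scores and a synthetic target tied to PC1 (illustration only).
rng = np.random.default_rng(1)
scores = pd.DataFrame(rng.normal(size=(100, 3)), columns=["PC1", "PC2", "PC3"])
target = 0.5 * scores["PC1"] + rng.normal(scale=0.1, size=100)

correlations = scores.corrwith(target)
print(correlations.round(2))

fig = plot_components_vs_target(scores, target)
# Call plt.show() or fig.savefig("components.png") to view the figure.
```

Printing the correlation coefficients alongside the plots gives a numeric check on what the scatterplots suggest visually.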
In conclusion, I addressed the challenge of missing data in weather data analysis by filling in missing values with several methods and dropping columns that were irrelevant or had too many missing values. I gained insights into the data through visualization and dimensionality reduction, and identified the principal component most strongly related to precipitation.
Libraries used:
- NumPy
- pandas
- Matplotlib
- scikit-learn
- seaborn
- Muhammad Ahmed Suhail
- This project was completed as an assignment for Data Analysis at FAST - NUCES Islamabad.