This is a solution to the airplane crashes project of the Data Analysis in Power BI track of NG 30 Days of Learning.
The goal of analyzing this data is to get insights into airplane crashes that occurred between 1908 and 2009.
The data used was obtained from this GitHub repository.
The data had 13 columns:
- Date
- Time
- Location
- Operator
- Flight #
- Route
- Type
- Registration
- cn/In
- Aboard
- Fatalities
- Ground
- Summary

Of these columns, only the Date column had no null values. The data was cleaned using Python and Excel before modelling in Power BI.
The data was loaded into Python and the following was done:
- A column was created to store the country code for the location of each crash.
- Text clustering was done on the Summary column to understand the causes of the crashes. Six clusters were obtained.
- Null values in the data were replaced with unknown.
- The data was saved.
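The country code, null replacement and save steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code actually used in the project. The `COUNTRY_CODES` lookup is a hypothetical stand-in for the API that was used, the `Country Code` column name is an assumption, and the sample rows are made up. The text clustering step is not covered here.

```python
import csv
import io

# Hypothetical lookup standing in for the country code API used in the project;
# the real service is not named in this README.
COUNTRY_CODES = {"France": "FR", "Brazil": "BR"}


def country_code(location):
    """Return a country code based on the last comma-separated part of a location."""
    if not location:
        return "unknown"
    country = location.split(",")[-1].strip()
    return COUNTRY_CODES.get(country, "unknown")


def clean_rows(rows):
    """Add a Country Code column and replace empty cells with 'unknown'."""
    cleaned = []
    for row in rows:
        new_row = {
            key: (value.strip() if value and value.strip() else "unknown")
            for key, value in row.items()
        }
        # "Country Code" is an assumed column name.
        new_row["Country Code"] = country_code(row.get("Location"))
        cleaned.append(new_row)
    return cleaned


# Tiny made-up sample, not rows from the real dataset.
sample_csv = (
    "Date,Location,Operator,Fatalities\n"
    '01/01/2000,"Paris, France",Example Air,3\n'
    "02/01/2000,,,\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
cleaned = clean_rows(rows)

# Save the cleaned data. An in-memory buffer is used here;
# pass a file opened with open(path, "w", newline="") to write to disk.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(cleaned[0].keys()))
writer.writeheader()
writer.writerows(cleaned)
```

Locations that the lookup cannot resolve keep the value unknown, which is what makes a later manual check possible.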
The data was opened in Excel and the following was done:
- Because some countries, such as the USSR and Yugoslavia, no longer exist, the API used to obtain the country code from the location returned many unknown values. These were checked manually and replaced with the current country code of each place.
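The manual correction was done by hand in Excel, but the same idea can be expressed as a small lookup table, which may be easier to repeat if the data is refreshed. The sketch below is only an illustration: the `MANUAL_CORRECTIONS` entries, the `Country Code` column name and the sample rows are assumptions, not values from the actual spreadsheet.

```python
# Hypothetical correction table: location text -> current ISO 3166-1 alpha-2 code.
# These entries are illustrative and not taken from the project's spreadsheet.
MANUAL_CORRECTIONS = {
    "Kiev, USSR": "UA",            # Kyiv is now in Ukraine
    "Belgrade, Yugoslavia": "RS",  # Belgrade is now in Serbia
}


def apply_corrections(rows, corrections):
    """Fill in 'unknown' country codes from a manually curated table."""
    fixed = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left unchanged
        if row["Country Code"] == "unknown":
            row["Country Code"] = corrections.get(row["Location"], "unknown")
        fixed.append(row)
    return fixed


# Made-up sample rows for illustration.
rows = [
    {"Location": "Kiev, USSR", "Country Code": "unknown"},
    {"Location": "Paris, France", "Country Code": "FR"},
    {"Location": "Somewhere, Yugoslavia", "Country Code": "unknown"},
]
fixed = apply_corrections(rows, MANUAL_CORRECTIONS)
```

Rows whose location is not in the table stay unknown, so they remain visible for another round of manual checking.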
Further cleaning and modelling were done in Power BI.
- Total number of crashes between 1908 and 2009 is 5,236.
- Compared with the previous year to date, the total number of fatalities (the sum of the Fatalities and Ground columns) and the fatality rate increased, while the number of crashes decreased.
- 1972 is the year with the highest number of crashes (104).
- The 9/11 attack is the only one that resulted in more ground casualties.
- December has the most crashes, closely followed by January and August.
- Weather forecasts should be checked properly before flights take off.
- Airline operators should carry out the necessary checks to ensure each plane is in good condition, so as to prevent crashes due to engine failure.
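The crash counts and fatality totals listed above were produced in Power BI. As a rough cross-check outside Power BI, the same kind of aggregation can be sketched in Python. The sample rows below are made up, the `%m/%d/%Y` date format is an assumption about the source file, and treating non-numeric cells as zero is a simplification chosen for this sketch.

```python
from collections import Counter
from datetime import datetime

# Made-up sample rows; the real dataset contains 5,236 crashes.
rows = [
    {"Date": "12/24/1972", "Fatalities": "10", "Ground": "0"},
    {"Date": "12/29/1972", "Fatalities": "5", "Ground": "2"},
    {"Date": "01/15/1973", "Fatalities": "unknown", "Ground": "unknown"},
]


def to_int(value):
    """Convert a cell to an integer, treating non-numeric cells as zero."""
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return 0


# Assumed date format: month/day/year.
dates = [datetime.strptime(row["Date"], "%m/%d/%Y") for row in rows]

crashes_per_year = Counter(d.year for d in dates)
crashes_per_month = Counter(d.month for d in dates)  # 1 = January, 12 = December

# Total fatalities combine the Fatalities and Ground columns.
total_fatalities = sum(
    to_int(row["Fatalities"]) + to_int(row["Ground"]) for row in rows
)

busiest_year, year_count = crashes_per_year.most_common(1)[0]
busiest_month, month_count = crashes_per_month.most_common(1)[0]
```

On the full dataset, `busiest_year` and `busiest_month` could be compared against the year and month reported in the insights.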
The Summary column, which contains a description of each crash, did not contain uniform data. Some entries had detailed descriptions, while others had little or nothing.
Two dashboards were created using the 5W approach: a static one and an interactive one. Both are in the images folder.
- Ugochukwu
- Twitter - @_EightKing