World Airplane Crashes Analysis(1908-2009)

Cleaning and Analyzing data in Power query and Power BI

Introduction

This project is my capstone project for #30daysoflearning organized by Olanrewaju Oyinbooke. I analyzed a dataset from kaggle about airplane crashes and fatalities around the world from a period of 1908-2009. The dataset contained 5268 cases of crashes and 13 columns which included information about date of crash, number of people aboard, fatalities, location, its operator/airline,type of aircraft and a summary of the cause of the crash.

Raw data

Data Cleaning

After importing my data into power query, I checked for duplicates(found none),the validity of my data using column distribution, quality and profiling and removed columns not neccessary for my analysis. I also made sure my columns were in the right data type.

data cleaning

Data Preprocessing

I noticed the location column contained cities, states and countries so i split them using the "split by delimiter function"(since i needed the countries). In the newly created column, i discovered there were variations of spellings and USA was represented by its states. I cleaned this up using Fuzzy Clustering in Power query and replacing the states with country. You can find the power query script here.

Data Processing

To get the causes of crashes from the Summary column, I extracted the keywords(see script here) from the column and used conditional column to group them into:

  • Mechanical Failure: involves crashes due to engine, propeller or wing failure.
  • Weather condition: includes crashes affected by any form of weather e.g rain, storm, fog etc.
  • Pilot error: involves loss of control.
  • Struck an Object: plane crashed into/or struck a mountain, tree etc.
  • Shot down: plane crashing due to being shot at
  • Fire Outbreak: plane crashed due to explosion or sudden fire.
  • Collision: airplanes colliding.
  • Fuel: crash caused by low fuel
  • Unknown causes

Data Processing 2

Lastly, I made a column to group each operator into Civil or Military Aviation

  • Civil Aviation: this includes all aircrafts used for personal and business purposes rather than for military Purposes.
  • Military Aviation: this includes all aircrafts for the purposes of conducting or enabling warfare.

Data Analysis & Visualizations.

  1. Crashes over the years: crashes kept increasing from 1908 as more aircrafts were produced and airlines came into the industry. In recent years, there is a decrease in the crashes which can be drawn to technological advancements. For Military Aviation, the highest year of crash was in 1945(marked the end of World War II).

Total Crash

  1. Aeroflot and The US Air force had the top most crashes. Aeroflot is Russia largest airline and one of the oldest airlines. Also, The US Airforce accounted for 22.54% of total Military aviation crashes.

crash by airline

  1. The most Crashed aircraft was the Douglas DC 3.

type crash

4.The most fatal crash occurred on March 27, 1977 at Tenerife, Canary islands when 2 Airplanes(a KLM Boeing 747 and a Pan American 747) collided killing a total of 583 people out of 644 people on board.

fatalities

View dashboard here