
This project involves the Exploratory Data Analysis (EDA) of an airline dataset to uncover insights about passenger demographics, travel patterns, and airport usage. The analysis includes data cleaning, visualization, and interpretation of key metrics such as gender distribution, age distribution, nationality distribution, and airport analysis.

The dataset contains details about passengers, flights, airports, and flight statuses. Below is a walkthrough of the analysis performed.

Table of Contents

  1. Importing Libraries
  2. Loading the Dataset
  3. Initial Data Exploration
  4. Data Cleaning
  5. Exploratory Data Analysis

Importing Libraries

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns 
import missingno as msno
import plotly.express as px
import folium 
from folium.plugins import HeatMap
import warnings

Loading the Dataset

import os
# List every file available under the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
# Read the airline CSV into a DataFrame
df = pd.read_csv('/kaggle/input/airline-dataset/Airline Dataset Updated - v2.csv')

Initial Data Exploration

  • The dataset contains 98,619 entries and 15 columns.
  • Checking the datatypes and null values:
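The checks listed above can be sketched as follows. This is a minimal illustration on a small hypothetical frame (`sample`, with invented rows and only three of the columns), not the actual dataset; on the real data the same calls would be made on `df`.

```python
import pandas as pd

# Hypothetical stand-in for the airline data; the rows are invented for illustration.
sample = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', None],
    'Age': [34, 51, 27, 62],
    'Nationality': ['Japan', 'Brazil', None, 'Japan'],
})

# Number of rows and columns.
n_rows, n_cols = sample.shape

# Data type of each column.
dtypes = sample.dtypes

# Count of missing values per column.
null_counts = sample.isnull().sum()

print(n_rows, n_cols)
print(dtypes)
print(null_counts)
```

`df.info()` prints the same information (dtypes and non-null counts) in a single summary.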

Data Cleaning

  • Dropping irrelevant columns: First Name, Last Name, Passenger ID
df = df.drop(['First Name', 'Last Name', 'Passenger ID'], axis=1)
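As a small illustration of the drop step, the sketch below applies it to a hypothetical one-row frame (`sample`, invented for this example) and checks which columns remain. It uses the `columns=` keyword, which is equivalent to passing `axis=1`.

```python
import pandas as pd

# Hypothetical one-row frame containing the identifying columns plus one kept column.
sample = pd.DataFrame({
    'Passenger ID': ['A1'],
    'First Name': ['Ana'],
    'Last Name': ['Silva'],
    'Age': [41],
})

# Drop the identifying columns; `columns=` is equivalent to `axis=1`.
cleaned = sample.drop(columns=['First Name', 'Last Name', 'Passenger ID'])

remaining = list(cleaned.columns)
print(remaining)
```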

Exploratory Data Analysis

Gender Distribution

  • Unique values in Gender column:
  • Count of each gender:
# Name the columns explicitly ('Gender', 'count') so the code works across pandas versions
data = df['Gender'].value_counts().rename_axis('Gender').reset_index(name='count')
  • Visualization:
fig = px.bar(data, x='Gender', y='count', color='Gender', color_discrete_sequence=px.colors.sequential.Agsunset, template='plotly_dark')
fig.update_layout(title_text='Number of Males & Females', xaxis_title='GENDER', yaxis_title='COUNT')
fig.show()
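The column names produced by `value_counts().reset_index()` depend on the pandas version: before 2.0 they are `index` and the original column name, while from 2.0 onward they are the original column name and `count`. A minimal sketch of a version-independent count table follows; the helper name `count_table` and the `sample` frame are hypothetical and exist only for this illustration.

```python
import pandas as pd

def count_table(frame, column):
    """Return a frame with columns [column, 'count'], most frequent value first."""
    return (
        frame[column]
        .value_counts()
        .rename_axis(column)          # name the index after the source column
        .reset_index(name='count')    # name the value column 'count'
    )

# Hypothetical rows, invented for illustration.
sample = pd.DataFrame({'Gender': ['Male', 'Female', 'Female']})

# Unique values in the column.
genders = sample['Gender'].unique()

# Count of each value, with predictable column names.
counts = count_table(sample, 'Gender')
print(genders)
print(counts)
```

With the columns named this way, a plotting call can refer to `x='Gender'` and `y='count'` regardless of the installed pandas version.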

Age Distribution

  • KDE plot for age distribution:
sns.kdeplot(data=df, x='Age', hue='Gender')
plt.show()

Nationality Distribution

  • Unique nationalities and their count:
  • Top 10 nationalities:
# Name the columns explicitly ('Nationality', 'count') so the code works across pandas versions
nation_count = df['Nationality'].value_counts().rename_axis('Nationality').reset_index(name='count')
top_10_countries = nation_count.nlargest(10, 'count')
px.bar(top_10_countries, x='Nationality', y='count', color='Nationality', color_discrete_sequence=px.colors.sequential.Agsunset, template='plotly_dark')
  • Lowest 10 nationalities:
lowest_10_countries = nation_count.nsmallest(10, 'count')
px.bar(lowest_10_countries, x='Nationality', y='count', color='Nationality', color_discrete_sequence=px.colors.sequential.Agsunset, template='plotly_dark')
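The steps in this subsection (number of unique nationalities, then the most and least frequent ones) can be sketched on a small hypothetical frame. The rows in `sample` are invented, and the example takes the top and bottom 2 rather than 10 only because the sample is tiny.

```python
import pandas as pd

# Hypothetical rows, invented for illustration.
sample = pd.DataFrame({
    'Nationality': ['Japan', 'Japan', 'Japan', 'Brazil', 'Brazil', 'Peru'],
})

# Number of distinct nationalities.
n_unique = sample['Nationality'].nunique()

# Count table with explicit column names.
nation_count = (
    sample['Nationality']
    .value_counts()
    .rename_axis('Nationality')
    .reset_index(name='count')
)

# Rank on the numeric 'count' column, not on the nationality strings.
top_2 = nation_count.nlargest(2, 'count')
lowest_2 = nation_count.nsmallest(2, 'count')

print(n_unique)
print(top_2)
print(lowest_2)
```

Note that `nsmallest` returns rows in ascending order of count, so the least frequent nationality comes first.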

Airport Analysis

  • Unique airports and their count:
df['Airport Name'].unique()
df['Airport Name'].nunique()
  • Top 10 airports with the highest number of passengers:
# Name the columns explicitly ('Airport Name', 'count') so the code works across pandas versions
airport_name = df['Airport Name'].value_counts().rename_axis('Airport Name').reset_index(name='count')
# Rank by passenger count, not by the airport name strings
top10_airport = airport_name.nlargest(10, 'count')
px.bar(top10_airport, x='Airport Name', y='count', color='Airport Name', color_discrete_sequence=px.colors.sequential.Agsunset, template='plotly_dark')
  • The 10 airports with the fewest passengers:
bottom10_airport = airport_name.nsmallest(10, 'count')
px.bar(bottom10_airport, x='Airport Name', y='count', color='Airport Name', color_discrete_sequence=px.colors.sequential.Agsunset, template='plotly_dark')


This exploratory data analysis provides insights into the demographics and travel patterns of airline passengers. The visualizations show the distribution of genders, ages, and nationalities, as well as the most and least frequented airports.