No-Show-Medical-Appointments

This is a data analysis project performed on a dataset from Kaggle that holds information on medical appointments in the year 2016. It is part of the data analysis nanodegree on Udacity. The source of the data, along with an explanation of the variables, can be found here. Note that some column names in the dataset description on Kaggle differ from the actual column names in the downloadable file. Moreover, the domains of some columns contradict the Kaggle description; one example is the domain of Handicap, which is a categorical variable.
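
Because the described and actual column names can disagree, it helps to standardize the names and inspect each column's real domain before trusting the description. The sketch below is illustrative only: the misspelled names `Hipertension` and `Handcap` and their target spellings are assumptions about the downloaded file, and `standardize_columns` and `column_domain` are hypothetical helpers, not code from the notebook.

```python
import pandas as pd

# Assumed misspelled column names in the downloaded CSV, mapped to assumed
# corrected spellings. Verify both against df.columns before relying on them.
COLUMN_FIXES = {
    "Hipertension": "Hypertension",
    "Handcap": "Handicap",
}


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the known misspelled columns renamed."""
    # rename() skips keys that are not present, so rerunning it is harmless.
    return df.rename(columns=COLUMN_FIXES)


def column_domain(df: pd.DataFrame, column: str) -> list:
    """Sorted distinct values actually present in a column."""
    return sorted(df[column].unique().tolist())


# Tiny made-up frame for illustration only; it is not taken from the dataset.
sample = pd.DataFrame({
    "Age": [34, 61, 7],
    "Hipertension": [0, 1, 0],
    "Handcap": [0, 2, 1],
})

df = standardize_columns(sample)
print(list(df.columns))                # ['Age', 'Hypertension', 'Handicap']
print(column_domain(df, "Handicap"))   # [0, 1, 2]
```

Checking the observed domain this way shows whether a column described as binary actually holds more than two categories.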

The programming language used throughout the project is Python.

The code and analysis for this project are contained entirely in one Jupyter Notebook.

Primarily, this project focuses on deriving insights from the data, which was provided as a .csv file. However, because the data was unclean, a simple data wrangling process had to be carried out first.
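
A wrangling pass of this kind might look like the following minimal sketch. It assumes the file contains `ScheduledDay`, `AppointmentDay`, `Age`, and a Yes/No `No-show` column; those names, the hypothetical `clean` helper, and the specific rules (parse timestamps, drop negative ages, drop duplicates) are illustrative assumptions, not the notebook's actual 15 fixes.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Minimal cleaning pass; the column names and rules are assumptions."""
    out = df.copy()
    # Parse the ISO-8601 timestamp strings into timezone-aware datetimes.
    for col in ("ScheduledDay", "AppointmentDay"):
        out[col] = pd.to_datetime(out[col], utc=True)
    # Encode the Yes/No target as booleans so it can be averaged later.
    out["No-show"] = out["No-show"].map({"Yes": True, "No": False})
    # Drop rows with an impossible (negative) age.
    out = out[out["Age"] >= 0]
    # Remove exact duplicate rows and renumber the index.
    return out.drop_duplicates().reset_index(drop=True)


# Made-up rows for illustration: one negative age and one duplicate row.
raw = pd.DataFrame({
    "ScheduledDay": ["2016-05-02T08:00:00Z", "2016-05-02T09:30:00Z",
                     "2016-05-04T11:15:00Z", "2016-05-04T11:15:00Z"],
    "AppointmentDay": ["2016-05-03T00:00:00Z", "2016-05-02T00:00:00Z",
                       "2016-05-06T00:00:00Z", "2016-05-06T00:00:00Z"],
    "Age": [30, -1, 45, 45],
    "No-show": ["No", "Yes", "Yes", "Yes"],
})

cleaned = clean(raw)
print(cleaned)
```

The target is mapped before filtering so every column assignment happens on an explicit copy, which avoids pandas' chained-assignment warnings.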

  • The data was thoroughly described and explained in markdown cells.
  • Three questions were posed, to be explored after the dataset was cleaned.
  • The data was assessed visually in a Jupyter Notebook and in an Excel sheet, and was later assessed programmatically in the Jupyter Notebook only.
  • A total of 15 quality issues were identified and fixed.
  • Data exploration and visualization were carried out using statistics and charts (donut charts, vertical/horizontal bar charts).
  • A summary of the conclusions drawn from this dataset was given.
  • A summary of the dataset's limitations was given.
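
The statistics behind charts like these are typically proportions. The sketch below computes the overall show/no-show split (the slices of a donut chart) and a per-group no-show rate (the heights of a bar chart). It assumes a boolean `No-show` column and uses `SMS_received` as an example grouping column; both names and the hypothetical `no_show_rate` helper are assumptions, and plotting is left out so the sketch runs with pandas alone.

```python
import pandas as pd


def no_show_rate(df: pd.DataFrame, by: str) -> pd.Series:
    """Proportion of missed appointments within each category of `by`."""
    # The mean of a boolean column is the share of True values.
    return df.groupby(by)["No-show"].mean()


# Made-up rows for illustration; "No-show" is assumed to be boolean already.
sample = pd.DataFrame({
    "SMS_received": [0, 0, 1, 1, 1],
    "No-show": [False, True, False, False, True],
})

# Overall split, e.g. the two slices of a donut chart.
overall = sample["No-show"].value_counts(normalize=True)
# Per-group rates, e.g. the bar heights of a bar chart.
by_sms = no_show_rate(sample, "SMS_received")
print(overall)
print(by_sms)
```

Comparing rates rather than raw counts keeps groups of very different sizes on the same scale.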

The necessary libraries for this project are:

  1. numpy
  2. pandas
  3. math
  4. matplotlib.pyplot
  5. seaborn
  6. matplotlib.patches