
Primary language: Jupyter Notebook. License: MIT.

Medical Appointment No Shows

Image Source - afootdoctorsjournal.wordpress.com

Kaggle Notebook: Why Patients DO NOT Show-up? Wrangling, EDA & Viz

Google Colaboratory: Medical-Appointment-No-Shows.ipynb

Problem Statement

Many patients book an appointment with a doctor and then fail to attend it. The average no-show rate is 20%, which lowers clinical efficiency and costs the Brazilian economy 20 million every year.

Objective

To investigate the reason why some patients do not show up to their scheduled appointments.

Source of dataset

The data comes from Kaggle's Medical Appointment No Shows dataset and was loaded into Google Colaboratory for analysis.
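A minimal loading sketch with pandas, assuming the Kaggle CSV keeps its original header (including the misspelled `Hipertension` and `Handcap` columns, as we recall them). The two sample rows are made up so the snippet runs without the download, and the file name in the final comment is an assumption.

```python
import io

import pandas as pd

# Header of the Kaggle CSV as we understand it (an assumption, not verified here).
EXPECTED_COLUMNS = [
    "PatientId", "AppointmentID", "Gender", "ScheduledDay", "AppointmentDay",
    "Age", "Neighbourhood", "Scholarship", "Hipertension", "Diabetes",
    "Alcoholism", "Handcap", "SMS_received", "No-show",
]


def load_appointments(source) -> pd.DataFrame:
    """Read the CSV from a path or file-like object and check its header."""
    df = pd.read_csv(source)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


# Two made-up rows standing in for the real file.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1001.0,5001,F,2016-04-29T18:38:08Z,2016-04-29T00:00:00Z,62,A,0,1,0,0,0,0,No\n"
    "1002.0,5002,M,2016-04-27T08:10:00Z,2016-05-02T00:00:00Z,8,B,0,0,0,0,0,1,Yes\n"
)
df = load_appointments(sample)
print(df.shape)
print(df["PatientId"].dtype)  # a float ID column, as noted under Important Points

# In Colab, point it at the uploaded file instead (file name is hypothetical):
# df = load_appointments("KaggleV2-May-2016.csv")
```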

Key Insights

Male and female patients make up 35% and 65% of the dataset, respectively, so female patients clearly outnumber male patients.
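Proportions like these can be read off with `value_counts(normalize=True)`. A minimal sketch on made-up rows, assuming the `Gender` column has been renamed to `gender`:

```python
import pandas as pd

# Made-up toy data, not the real dataset.
df = pd.DataFrame({"gender": ["F", "F", "M", "F", "M", "F", "F", "M", "F", "F"]})

# normalize=True turns the counts into proportions that sum to 1.
proportions = df["gender"].value_counts(normalize=True)
print((proportions * 100).round(1))
```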

Diabetic males and females both have a higher show-up rate than non-diabetic patients, which suggests that diabetic patients are more likely to attend their scheduled appointments.
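A sketch of the diabetic versus non-diabetic comparison by gender, assuming snake_case columns `gender`, `diabetes` and `no_show`, and following the Kaggle convention (as we understand it) that `no_show == "No"` means the patient attended. The rows are invented for illustration.

```python
import pandas as pd

# Invented rows; no_show == "No" is taken to mean the patient showed up.
df = pd.DataFrame({
    "gender":   ["F", "F", "F", "F", "M", "M", "M", "M"],
    "diabetes": [1, 1, 0, 0, 1, 1, 0, 0],
    "no_show":  ["No", "No", "No", "Yes", "No", "No", "Yes", "Yes"],
})
df["showed_up"] = df["no_show"].eq("No")

# Mean of a boolean column = show-up rate; unstack puts diabetes (0/1) in columns.
rates = df.groupby(["gender", "diabetes"])["showed_up"].mean().unstack()
print(rates)
```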

There is no strong correlation between showing up and the other features. Diabetes and hypertension are positively correlated (0.43). Receiving an SMS reminder is negatively correlated with the show-up rate, so patients who receive appointment reminders may be more likely to miss the appointment.
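Correlation figures like these come from a correlation matrix. A minimal sketch, assuming the indicator columns are 0/1 integers and a hypothetical `show_up` flag (1 = attended); for two binary variables Pearson's r equals the phi coefficient. The values are made up.

```python
import pandas as pd

# Made-up 0/1 indicators; show_up = 1 when the patient attended.
df = pd.DataFrame({
    "hypertension": [1, 1, 0, 0, 1, 0, 0, 1],
    "diabetes":     [1, 0, 0, 0, 1, 0, 0, 1],
    "sms_received": [0, 1, 1, 0, 0, 1, 0, 0],
    "show_up":      [1, 0, 0, 1, 1, 0, 1, 1],
})

# Pairwise Pearson correlations between all columns.
corr = df.corr()
print(corr.round(2))
# seaborn.heatmap(corr, annot=True) would draw the same matrix as a heatmap.
```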

Saturday has a slightly higher no-show rate than the other days. However, because there are far fewer appointments on Saturdays, we cannot conclude that Saturday appointments are more likely to be missed.
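A sketch of a per-weekday breakdown that keeps the appointment count next to the rate, which is what makes the Saturday caveat visible. Column names and rows are illustrative assumptions; 2016-05-02, 2016-05-03 and 2016-05-07 fall on a Monday, Tuesday and Saturday.

```python
import pandas as pd

# Made-up appointments: three on Monday, three on Tuesday, one on Saturday.
df = pd.DataFrame({
    "appointment_day": pd.to_datetime([
        "2016-05-02", "2016-05-02", "2016-05-02",
        "2016-05-03", "2016-05-03", "2016-05-03",
        "2016-05-07",
    ]),
    "no_show": ["No", "Yes", "No", "No", "No", "Yes", "Yes"],
})

# Named aggregation: how many appointments per weekday, and what share were missed.
summary = (
    df.assign(weekday=df["appointment_day"].dt.day_name(),
              missed=df["no_show"].eq("Yes"))
      .groupby("weekday")["missed"]
      .agg(appointments="count", no_show_rate="mean")
)
print(summary)
```

Here Saturday's rate is 100%, but it rests on a single appointment, which is the same reason the rate above is not treated as conclusive.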

An outlier is defined as a data point located outside the whiskers of the box plot. In the age box plot, the right (upper) bound is 110 and there is an outlier at 115.
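Box-plot whiskers conventionally stop at the fences Q1 - 1.5·IQR and Q3 + 1.5·IQR (matplotlib and seaborn default to `whis=1.5`), and points beyond them are drawn as outliers. A sketch of that fence calculation, on made-up ages rather than the dataset:

```python
import pandas as pd


def iqr_fences(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up ages with one extreme value.
ages = pd.Series([18, 20, 25, 30, 35, 40, 45, 50, 55, 115])
low, high = iqr_fences(ages)
outliers = ages[(ages < low) | (ages > high)]
print(low, high, outliers.tolist())
```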

The neighborhoods with the highest show-up rates are JARDIM CAMBURI (81%), followed by MARIA ORTIZ (79%).

JARDIM CAMBURI also had the highest number of appointments (7717).
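A sketch of the neighbourhood ranking, assuming snake_case columns `neighbourhood` and `no_show`; the names A, B and C are placeholders, not dataset values. Keeping the appointment count beside the rate guards against reading too much into a neighbourhood with only a handful of appointments.

```python
import pandas as pd

# Made-up rows; no_show == "No" is taken to mean the patient attended.
df = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "no_show":       ["No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes"],
})

by_area = (
    df.assign(showed_up=df["no_show"].eq("No"))
      .groupby("neighbourhood")["showed_up"]
      .agg(appointments="count", show_up_rate="mean")
      .sort_values("show_up_rate", ascending=False)
)
print(by_area)
print(by_area["appointments"].idxmax())  # neighbourhood with the most appointments
```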

It is evident from the graph that patients who received SMS reminders, regardless of gender, had a lower show-up rate. This suggests that forgetting the appointment is not what drives no-shows.
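The SMS comparison can be written as a cross-tabulation of mean show-up rates by gender and reminder status. A sketch on invented rows, assuming a 0/1 `sms_received` column and the same `no_show` convention as above:

```python
import pandas as pd

# Invented rows; sms_received is 1 when a reminder was sent.
df = pd.DataFrame({
    "gender":       ["F", "F", "F", "F", "M", "M", "M", "M"],
    "sms_received": [1, 1, 0, 0, 1, 1, 0, 0],
    "no_show":      ["Yes", "No", "No", "No", "Yes", "Yes", "No", "No"],
})

# Each cell is the mean of the 0/1 "showed up" flag, i.e. the show-up rate.
table = pd.crosstab(
    df["gender"], df["sms_received"],
    values=df["no_show"].eq("No").astype(int), aggfunc="mean",
)
print(table)
```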

Important Points

• The dataset has more than 100K records (rows).

• Most of the data-wrangling time was devoted to assessing and cleaning the data, which was both dirty and messy, with issues in its content.

• Cleaning invalid data, such as the float datatype of PatientID and AppointmentID and the impossible negative values in the age column.

• Removing irrelevant data, such as the appointment time, which was 00:00:00 (HH:MM:SS) in every row, and records whose appointment day falls before the scheduled day.

• Transforming messy data: ScheduledDay and AppointmentDay each packed multiple variables into a single date-time column (dd-mmm-yyyy HH:MM:SS). They were split into separate columns so that each column holds one variable.

• Renaming columns to snake case so that each column can be accessed with dot notation, e.g. df.column_name.

• Summarizing features and computing descriptive statistics, such as a five-number summary of the age column.

• Handling outliers in the age column using the 68–95–99.7 rule.

• Undertaking exploratory data analysis (EDA) to find the important features behind no-shows.

• Using libraries such as matplotlib and seaborn to support the analysis with clean, uncluttered, easy-to-interpret visualizations.

• Both categorical and quantitative variables were used for visualization.
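A condensed sketch of the wrangling steps listed above, written as a single `clean()` function plus a 3-sigma filter. The helper names (`to_snake_case`, `clean`, `drop_3sigma_outliers`), the derived column names and the toy rows are all hypothetical, and the notebook may implement the steps differently. The toy timestamps are written in ISO 8601, which `pd.to_datetime` parses; the real file's format may differ.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """'AppointmentDay' -> 'appointment_day', 'No-show' -> 'no_show'."""
    name = re.sub(r"[-\s]+", "_", name)
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return name.lower()


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.rename(columns=to_snake_case)
    # IDs arrive as floats but are identifiers, so store them as integers.
    df["patient_id"] = df["patient_id"].astype("int64")
    df["appointment_id"] = df["appointment_id"].astype("int64")
    # A negative age is impossible, so those rows are dropped.
    df = df[df["age"] >= 0].copy()
    # One variable per column: split the scheduled timestamp into date and time.
    # The appointment time is always 00:00:00, so only its date is kept.
    scheduled = pd.to_datetime(df["scheduled_day"])
    appointment = pd.to_datetime(df["appointment_day"])
    df["scheduled_date"] = scheduled.dt.normalize()
    df["scheduled_time"] = scheduled.dt.time
    df["appointment_date"] = appointment.dt.normalize()
    df = df.drop(columns=["scheduled_day", "appointment_day"])
    # Compare dates, not timestamps: a same-day appointment has time 00:00:00
    # and would otherwise look earlier than its booking time.
    valid = df["appointment_date"] >= df["scheduled_date"]
    return df[valid].reset_index(drop=True)


def drop_3sigma_outliers(values: pd.Series) -> pd.Series:
    """Keep values within mean +/- 3 standard deviations (68-95-99.7 rule)."""
    mean, std = values.mean(), values.std()
    return values[values.between(mean - 3 * std, mean + 3 * std)]


# Made-up raw rows: 1002 has a negative age, 1003 is booked after its appointment day.
raw = pd.DataFrame({
    "PatientId": [1001.0, 1002.0, 1003.0, 1004.0],
    "AppointmentID": [5001, 5002, 5003, 5004],
    "Age": [62, -1, 8, 35],
    "ScheduledDay": ["2016-04-29T18:38:08Z", "2016-04-27T08:10:00Z",
                     "2016-05-03T09:00:00Z", "2016-05-02T10:15:00Z"],
    "AppointmentDay": ["2016-04-29T00:00:00Z", "2016-05-02T00:00:00Z",
                       "2016-05-02T00:00:00Z", "2016-05-09T00:00:00Z"],
    "No-show": ["No", "Yes", "No", "Yes"],
})
tidy = clean(raw)
print(tidy[["patient_id", "scheduled_date", "appointment_date"]])
print(tidy["age"].describe()[["min", "25%", "50%", "75%", "max"]])  # five-number summary

ages = pd.Series(list(range(20, 40)) + [115])  # made-up ages
print(drop_3sigma_outliers(ages).max())  # 115 lies beyond mean + 3 * std
```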

Conclusion

• The important features for predicting no-shows are age, hypertension, diabetes, neighborhood, and scholarship.

• The show-up rate is higher in older age groups than in younger ones, perhaps because people become more concerned about their health as they age.

• Hypertension is an important characteristic of patients with a higher attendance frequency.

• The show-up rates for men and women are similar.

• There is no relationship between showing up and alcoholism or handicaps.

• Enrolment in the Bolsa Familia program (the scholarship feature) is an important feature for identifying patients with a higher attendance frequency.

• There is no preference for a specific weekday over the others when it comes to attendance frequency.

• There is no direct relationship between no-shows and the number of waiting days between the scheduled day and the appointment day.
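The age-group and waiting-day conclusions rest on two derived features. A sketch of how they could be computed, assuming cleaned date-only columns named `scheduled_date` and `appointment_date` (hypothetical names) and illustrative age-band edges that are not taken from the notebook. The rows are made up.

```python
import pandas as pd

# Made-up rows; the date columns are assumed to be already parsed and date-only.
df = pd.DataFrame({
    "age": [5, 15, 25, 45, 65, 75],
    "scheduled_date": pd.to_datetime(["2016-04-29", "2016-04-25", "2016-05-02",
                                      "2016-04-20", "2016-05-01", "2016-05-03"]),
    "appointment_date": pd.to_datetime(["2016-04-29", "2016-05-02", "2016-05-02",
                                        "2016-05-06", "2016-05-03", "2016-05-10"]),
    "no_show": ["Yes", "Yes", "No", "No", "No", "No"],
})
df["showed_up"] = df["no_show"].eq("No")

# Waiting days between booking and appointment (0 = same-day appointment).
df["waiting_days"] = (df["appointment_date"] - df["scheduled_date"]).dt.days

# Illustrative age bands [0, 18), [18, 40), [40, 60), [60, 120); not the notebook's bins.
df["age_group"] = pd.cut(df["age"], bins=[0, 18, 40, 60, 120], right=False)

rates_by_age = df.groupby("age_group", observed=True)["showed_up"].mean()
print(rates_by_age)

# Pearson correlation between waiting time and attendance (1 = attended).
print(df["waiting_days"].corr(df["showed_up"].astype(int)))
```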

Improvement opportunities at administration level

• There were no appointments on Sundays, and Saturday appointments were significantly fewer than weekday appointments. The administration should spread appointments across months and across all days of the week, weekends included.

• The administration should avoid scheduling multiple appointments on the same day for a single patient unless there is an emergency. Guaranteeing everyone a second appointment after a missed first one reassures patients that they will get another appointment anyway, which makes missing the first one seem less costly.

• The administration should charge part of the fee in advance when a patient schedules an appointment. This may reduce the no-show rate.

• The administration should take a systematic approach to sending reminders to patients while closely monitoring the cost of follow-up reminders.

Limitations

• The appointment time is the same in every row (00:00:00). Had the appointment time been recorded properly, we could have found the times of day at which people tend to miss scheduled appointments.

• The Handicap column has five distinct values (0, 1, 2, 3, 4), but the dataset's author on Kaggle describes it as a boolean column where 0 means no handicap and 1 means handicap. We assumed that 1, 2, 3, and 4 all indicate a handicap and changed them all to 1, while 0 still represents a person with no handicap.

• The distance from each neighbourhood to the hospital is not given; it could have been very useful for identifying neighbourhoods that are far from the hospital.

• The dataset only covers April, May, and June. With data for other months, we could have found the months with the lowest show-up rates.
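The Handicap recoding described in these limitations fits in one line. A minimal sketch, assuming the column is named `handcap` after snake_case renaming (Kaggle spells it `Handcap`, as we recall); the values are made up.

```python
import pandas as pd

# Made-up values covering the five codes seen in the column.
df = pd.DataFrame({"handcap": [0, 1, 2, 0, 3, 4, 1]})

# Any code from 1 to 4 is treated as "has a handicap" (1); 0 stays 0.
df["handcap"] = (df["handcap"] > 0).astype(int)
print(df["handcap"].value_counts().sort_index())
```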