This is a repository for Udacity Data Analyst Project 1 (Investigate a Dataset). The dataset used in the project is also included in this repository.
The libraries used in this project include:
- Pandas – For storing and manipulating structured data; it is built on NumPy (upgrade Pandas to version 0.25.1)
- NumPy – For multi-dimensional array and matrix data structures, and for performing mathematical operations
- Matplotlib – For visualizations
- Seaborn – For visualizations
- All libraries and their versions are listed in the environment.yaml file
I analyzed a dataset containing information on about 100,000 medical appointments in Brazil, focusing on whether patients showed up for their appointments. The analysis focuses on answering the following questions:
- Which gender is more likely to miss an appointment?
- What is the age group of patients who missed their medical appointments?
- Is being on the Scholarship a factor in showing up for appointments?
- Did receiving an SMS lead to patients showing up for their appointments?
- What is the time difference between the scheduled day and the appointment day for patients who showed up and for those who missed their appointments?
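As a rough illustration of how questions like these can be approached with Pandas, the sketch below computes the share of missed appointments per group and the average wait between scheduling and appointment. The column names (`gender`, `sms_received`, `scheduled_day`, `appointment_day`, `missed`), the helper `miss_rate_by`, and the six toy rows are all assumptions made for this example; they are not taken from the project's dataset or notebook.

```python
import pandas as pd

# Toy data with hypothetical column names; the real dataset's columns may differ.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "sms_received": [1, 0, 0, 1, 0, 1],
    "scheduled_day": ["2016-04-29", "2016-04-27", "2016-05-02",
                      "2016-05-02", "2016-04-20", "2016-05-10"],
    "appointment_day": ["2016-05-04", "2016-04-29", "2016-05-02",
                        "2016-05-12", "2016-05-05", "2016-05-10"],
    "missed": [True, False, False, True, True, False],
})


def miss_rate_by(frame, column):
    """Share of missed appointments within each value of `column` (hypothetical helper)."""
    return frame.groupby(column)["missed"].mean()


# Parse the date strings and derive the wait, in whole days, for each appointment.
df["scheduled_day"] = pd.to_datetime(df["scheduled_day"])
df["appointment_day"] = pd.to_datetime(df["appointment_day"])
df["waiting_days"] = (df["appointment_day"] - df["scheduled_day"]).dt.days

gender_rates = miss_rate_by(df, "gender")
sms_rates = miss_rate_by(df, "sms_received")

# Average wait for missed appointments versus attended ones.
mean_wait_missed = df.loc[df["missed"], "waiting_days"].mean()
mean_wait_showed = df.loc[~df["missed"], "waiting_days"].mean()

print(gender_rates, sms_rates, sep="\n\n")
print(f"mean wait (missed): {mean_wait_missed:.1f} days")
print(f"mean wait (showed): {mean_wait_showed:.1f} days")
```

The same helper could be pointed at a scholarship column in the same way. Note that a difference in rates between groups only describes the data; it does not by itself show that, for example, an SMS caused patients to show up or not.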
The project is organized into the following sections:
- Data Wrangling
- Exploratory Analysis
- Conclusions/Results
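A minimal sketch of what a data wrangling step feeding into the exploratory analysis might look like is shown here. It assumes hypothetical `Age` and `No-show` columns, assumes that a `No-show` value of "Yes" means the appointment was missed, and uses age bins chosen purely for illustration; the function `wrangle` and the toy rows are not from the project itself.

```python
import pandas as pd

# Toy rows with hypothetical column names; not taken from the project's dataset.
raw = pd.DataFrame({
    "Age": [34, -1, 7, 62, 34, 19],
    "No-show": ["No", "No", "Yes", "No", "No", "Yes"],
})


def wrangle(frame):
    """Return a cleaned copy: tidy names, valid ages, boolean target, age groups."""
    out = frame.copy()
    # Lowercase the column names and replace hyphens so they are easy to reference.
    out.columns = [c.strip().lower().replace("-", "_") for c in out.columns]
    # Drop exact duplicate rows, then rows with impossible (negative) ages.
    out = out.drop_duplicates()
    out = out[out["age"] >= 0].reset_index(drop=True)
    # Assumption: "Yes" in the no_show column means the appointment was missed.
    out["missed"] = out["no_show"].eq("Yes")
    # Illustrative age bins; left-closed, so 18 falls in "18-39".
    bins = [0, 18, 40, 65, 120]
    labels = ["0-17", "18-39", "40-64", "65+"]
    out["age_group"] = pd.cut(out["age"], bins=bins, labels=labels, right=False)
    return out


clean = wrangle(raw)
# Share of missed appointments per age group (only groups present in the data).
age_group_rates = clean.groupby("age_group", observed=True)["missed"].mean()
print(clean)
print(age_group_rates)
```

Keeping the cleaning in one function that returns a copy leaves the raw data untouched, which makes it easier to rerun the exploratory analysis after changing a cleaning rule.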