You know the story. Data is everywhere: texts, images, news, and spreadsheets. It affects our habits and defines our future. The amount of data is growing by the second. How can one stay afloat in this great sea of data? Data analysis is required in any line of business. In this project, you will conduct a comprehensive study with pandas. You will load datasets, handle missing and incorrectly filled values, find the main statistical characteristics, and visualize your data.
Conduct a comprehensive data study using the pandas library: from loading data and correcting errors in the CSV files to simple data visualization.
Stage 1: Load data from CSV files into the program.
Stage 2: Merge several CSV files into a single dataset.
Stage 3: Clean the dataset, which may be inconsistent and contain errors.
Stage 4: Use pandas statistics tools to gain insights from the data.
Stage 5: Use pandas visualization tools to present the data succinctly.
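As a rough illustration of Stages 1 and 2, the sketch below reads several CSV sources with `pd.read_csv` and stacks them into one DataFrame with `pd.concat`. It is not the code in code.py: the inline CSV strings are hypothetical miniature stand-ins for the real files (which have 15 columns each), and `load_and_merge` is a hypothetical helper name.

```python
import io

import pandas as pd

# Hypothetical miniature stand-ins for general.csv, prenatal.csv and sports.csv.
GENERAL_CSV = """Unnamed: 0,hospital,gender,age,diagnosis
0,general,man,33,cold
1,general,woman,48,stomach
"""
PRENATAL_CSV = """Unnamed: 0,hospital,gender,age,diagnosis
0,prenatal,,26,pregnancy
"""
SPORTS_CSV = """Unnamed: 0,hospital,gender,age,diagnosis
0,sports,female,20,sprain
"""


def load_and_merge(sources):
    """Stage 1: read each CSV source; Stage 2: stack them into one DataFrame."""
    frames = [pd.read_csv(src) for src in sources]
    # ignore_index=True builds a fresh 0..n-1 index instead of repeating each file's own.
    merged = pd.concat(frames, ignore_index=True)
    # The leftover index column from the CSV export carries no information.
    return merged.drop(columns=["Unnamed: 0"])


df = load_and_merge(io.StringIO(text) for text in (GENERAL_CSV, PRENATAL_CSV, SPORTS_CSV))
print(df.shape)  # (4, 4)
```

With the real data you would pass the three file paths instead of `io.StringIO` objects; where those files sit relative to your working directory is an assumption you should adjust. Concatenating only lines up cleanly when the files share the same column names.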
To learn more about this project, please visit HyperSkill Website - Data Analysis for Hospitals.
This project's difficulty is labelled Hard. HyperSkill describes its four difficulty levels as follows:
- Easy Projects - if you're just starting
- Medium Projects - to build upon the basics
- Hard Projects - to practice all the basic concepts and learn new ones
- Challenging Projects - to perfect your knowledge with challenging tasks
This repository contains one .py file and one folder:
code.py - Contains the code used to complete the data analysis requirements
Data repository - Contains the three .csv files with the data: general.csv, prenatal.csv, and sports.csv
The project was built using Python 3.11.3.
All three datasets contain the following 15 columns:
- Unnamed: 0 - Contains the indexes of the tables
- hospital
- gender
- age
- height
- weight
- bmi
- diagnosis - Includes values such as 'pregnancy', 'cold', 'dislocation', etc.
- blood_test
- ecg
- ultrasound
- mri
- xray
- children
- months
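Stages 3 and 4 can be sketched on a few of the columns listed above. The data is hypothetical, and the cleaning rules (dropping rows where every field is empty, mapping gender spellings to 'm'/'f', marking every prenatal patient as 'f', and filling missing test results with 0) are assumptions chosen for illustration, not necessarily the exact rules stated in the project's docstrings.

```python
import numpy as np
import pandas as pd

# Hypothetical merged data using a subset of the 15 columns.
df = pd.DataFrame(
    {
        "hospital": ["general", "prenatal", "sports", "general", np.nan],
        "gender": ["man", np.nan, "female", "woman", np.nan],
        "age": [33, 26, 20, 61, np.nan],
        "diagnosis": ["cold", "pregnancy", "sprain", "cold", np.nan],
        "blood_test": ["t", np.nan, "f", np.nan, np.nan],
    }
)

# Stage 3 (assumed rules): drop rows in which every field is missing.
df = df.dropna(how="all").copy()
# Unify the different gender spellings into single letters.
df["gender"] = df["gender"].replace({"man": "m", "male": "m", "woman": "f", "female": "f"})
# Assumption: every prenatal patient is female, so fill that hospital's gender with "f".
df.loc[df["hospital"] == "prenatal", "gender"] = "f"
# Assumption: a missing test result means the test was not performed.
df["blood_test"] = df["blood_test"].fillna(0)

# Stage 4: simple statistics on the cleaned data.
busiest = df["hospital"].value_counts().idxmax()
cold_share = (df.loc[df["hospital"] == "general", "diagnosis"] == "cold").mean()
print(busiest, cold_share)  # general 1.0
```

Taking the mean of a boolean Series gives the share of `True` values, which is a compact way to answer "what fraction of patients in hospital X have diagnosis Y" questions.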
Download the files to your local machine, open the project in your IDE of choice, and run code.py. The data frames and the answers to each stage's questions are printed to the console, and the required plots are displayed, as specified in each stage's docstring. Please read each stage's docstring for its requirements.