Discover insights from data via Python and SQL
You'll need to install:
- Python 3.x
- Jupyter Notebook
- NumPy
- pandas
- Matplotlib
- Seaborn

Additional libraries are listed in each project.

Recommended:
- Anaconda
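The core libraries above can be installed in one step, for example with pip (with Anaconda, most of them ship preinstalled):

```shell
# Install the shared dependencies; per-project extras are installed separately
pip install jupyter numpy pandas matplotlib seaborn
```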
This chapter covered the data analysis process as a whole: gathering, assessing, cleaning and wrangling data, then exploring and visualizing it, all embedded in a programming workflow and finished by communicating the results.
This project therefore included all steps of the typical data analysis process:
- posing questions
- gathering, wrangling and cleaning data
- communicating answers to those questions, supported by visualizations and statistics
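The steps above can be sketched with a tiny, purely illustrative pandas example (the dataset and column names are made up, not the project data):

```python
import pandas as pd

# Hypothetical raw data standing in for a gathered dataset
raw = pd.DataFrame({
    "year": [2014, 2015, 2015, 2016, None],
    "sales": ["10", "12", "12", "8", "5"],
})

# Wrangle/clean: drop missing rows, remove duplicates, fix dtypes
clean = (
    raw.dropna(subset=["year"])
       .drop_duplicates()
       .astype({"year": int, "sales": int})
)

# Explore: answer a posed question ("how do sales change by year?")
sales_by_year = clean.groupby("year")["sales"].sum()
print(sales_by_year)
```

Communicating the answer would then mean turning `sales_by_year` into a chart and a short written conclusion.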
According to the line chart of firearm sales from 1997 to 2016, firearm purchases trend upward, with a sudden spike in 2015 and a drop in 2016 that is partly explained by only nine months of data being collected that year.
This chapter was a deep dive into the data wrangling part of the data analysis process. We learned about the difference between messy and dirty data, what tidy data should look like, and the assess-define-clean-test cycle. Moreover, we covered many different file types and methods of gathering data.
In this project we had to deal with the reality of dirty and messy data (again). We gathered data from several sources (for example the Twitter API) and identified tidiness and quality issues in the dataset. We then resolved these issues, documenting each step, and finished the project by exploring the cleaned data.
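The assess-define-clean-test cycle might look like this in pandas (the tweet IDs and the combined `rating` column are invented stand-ins, not the real project data):

```python
import pandas as pd

# Hypothetical messy tweet data standing in for the gathered dataset
df = pd.DataFrame({
    "tweet_id": ["1", "2", "2", "3"],
    "rating": ["10/10", "12/10", "12/10", "13/10"],
})

# Assess: spot quality issues (here: a duplicated row, a combined column)
assert df.duplicated().any()

# Define + clean: drop duplicates, split the combined rating column
df = df.drop_duplicates().copy()
df[["rating_numerator", "rating_denominator"]] = (
    df["rating"].str.split("/", expand=True).astype(int)
)
df = df.drop(columns="rating")

# Test: verify each fix took effect
assert not df.duplicated().any()
assert df["rating_denominator"].eq(10).all()
```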
The final chapter focused on proper data visualization. We learned about chart junk, univariate, bivariate and multivariate visualization, the use of color, the data-ink ratio, the lie factor, and other encodings.
The task of the final project was to analyze and visualize real-world data. I chose the Ford GoBike dataset.