Discover insights from data via Python and SQL
You'll need to install:
- Python 3.x
- Jupyter Notebook
- NumPy
- pandas
- Matplotlib
- Seaborn

Additional libraries are listed in each project.

Recommended:
- Anaconda
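The core libraries above can be installed in one step, for example with pip (with Anaconda, most of them ship preinstalled):

```shell
# Install the shared dependencies; per-project extras are installed separately
pip install jupyter numpy pandas matplotlib seaborn
```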
This chapter covered the data analysis process as a whole: gathering, assessing, cleaning and wrangling data, then exploring and visualizing it, all embedded in a programming workflow and finished by communicating the results.
This project therefore included all steps of the typical data analysis process:
- posing questions
- gathering, wrangling and cleaning data
- communicating answers to those questions, supported by visualizations and statistics
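The steps above can be sketched with a tiny, purely illustrative pandas example (the dataset and column names are made up, not the project data):

```python
import pandas as pd

# Hypothetical raw data standing in for a gathered dataset
raw = pd.DataFrame({
    "year": [2014, 2015, 2015, 2016, None],
    "sales": ["10", "12", "12", "8", "5"],
})

# Wrangle/clean: drop missing rows, remove duplicates, fix dtypes
clean = (
    raw.dropna(subset=["year"])
       .drop_duplicates()
       .astype({"year": int, "sales": int})
)

# Explore: answer a posed question ("how do sales change by year?")
sales_by_year = clean.groupby("year")["sales"].sum()
print(sales_by_year)
```

Communicating the answer would then mean turning `sales_by_year` into a chart and a short written conclusion.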
According to the line chart of firearm sales from 1997 to 2016, firearm purchases trend upward, with a sudden spike in 2015 and a drop in 2016 that is partly explained by only nine months of data being collected that year.
This chapter was a deep dive into the data wrangling part of the data analysis process. We learned about the difference between messy and dirty data, what tidy data should look like, and the assess-define-clean-test cycle. Moreover, we covered many different file types and methods of gathering data.
In this project we had to deal with the reality of dirty and messy data (again). We gathered data from several sources (for example the Twitter API) and identified tidiness and quality issues in the dataset. We then resolved these issues, documenting each step, and finished the project by exploring the cleaned data.
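The assess-define-clean-test cycle might look like this in pandas (the tweet IDs and the combined `rating` column are invented stand-ins, not the real project data):

```python
import pandas as pd

# Hypothetical messy tweet data standing in for the gathered dataset
df = pd.DataFrame({
    "tweet_id": ["1", "2", "2", "3"],
    "rating": ["10/10", "12/10", "12/10", "13/10"],
})

# Assess: spot quality issues (here: a duplicated row, a combined column)
assert df.duplicated().any()

# Define + clean: drop duplicates, split the combined rating column
df = df.drop_duplicates().copy()
df[["rating_numerator", "rating_denominator"]] = (
    df["rating"].str.split("/", expand=True).astype(int)
)
df = df.drop(columns="rating")

# Test: verify each fix took effect
assert not df.duplicated().any()
assert df["rating_denominator"].eq(10).all()
```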
The final chapter focused on proper data visualization. We learned about chart junk, univariate, bivariate and multivariate visualization, the use of color, the data-ink ratio, the lie factor, and other encodings.
The task of the final project was to analyze and visualize real-world data. I chose the Ford GoBike dataset.