Repository where I collect all the assignments for the Udacity Data Analytics Nanodegree.
Statistics -- Check repo
• Analyzed the Stroop effect using descriptive statistics to provide an intuition about the data, and inferential statistics to draw a conclusion based on the results.
Skills: Python, IPython, Pandas.
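As a rough illustration of the inferential step, here is a minimal sketch of a paired-samples t statistic. It assumes each participant is timed under both a congruent and an incongruent condition. The function name `paired_t_statistic` and the timing values are made up for this example and are not taken from the project.

```python
import math
import statistics


def paired_t_statistic(congruent, incongruent):
    """Return (t, degrees_of_freedom) for a paired-samples t-test."""
    if len(congruent) != len(incongruent):
        raise ValueError("paired samples must have the same length")
    # Work on per-participant differences, since the samples are dependent.
    diffs = [b - a for a, b in zip(congruent, incongruent)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up completion times in seconds, for illustration only.
congruent = [12.0, 14.5, 11.0, 13.2, 15.1]
incongruent = [19.5, 22.0, 17.3, 21.4, 24.0]
t, df = paired_t_statistic(congruent, incongruent)
print(f"t = {t:.2f}, df = {df}")
```

The resulting t value would then be compared against the critical value of a t distribution with `df` degrees of freedom.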
Exploratory Analysis with Pandas -- Check repo
• Which variables are related to surviving the Titanic? I posed this question of the Titanic data set and used descriptive and modelling strategies to uncover these relationships.
Skills: Python, IPython, Pandas.
Data gathering and wrangling with SQL and Pandas -- Check repo
• Parsed a 140 MB XML document to obtain relevant data.
• Cleaned, audited and corrected more than 2500 records.
• Stored the cleaned data in a SQL database, performed queries and generated plots. Created map plots to inspect georeferenced data.
Skills: Python, SQL, XML parsing, regular expressions, Pandas, Basemap, GeoPandas.
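The parse, clean and store steps can be sketched roughly as follows, using only the standard library. This is an assumption-laden illustration: the element name `record`, the `street` attribute, the abbreviation rule and the table layout are all hypothetical stand-ins, not the project's actual schema.

```python
import io
import re
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in for the real document; element and attribute names are assumed.
SAMPLE_XML = b"""<records>
  <record id="1" street="Main St." />
  <record id="2" street="Oak Ave" />
  <record id="3" street="Elm St" />
</records>"""

# Hypothetical cleaning rule: expand a trailing "St" or "St." to "Street".
STREET_ABBREV = re.compile(r"\bSt\.?$")


def iter_records(source):
    """Stream <record> elements so a large file never sits fully in memory."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            street = STREET_ABBREV.sub("Street", elem.get("street"))
            yield int(elem.get("id")), street
            elem.clear()  # release the element once it has been processed


def load(source, conn):
    """Create the table if needed and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, street TEXT)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?)", iter_records(source))
    conn.commit()


conn = sqlite3.connect(":memory:")
load(io.BytesIO(SAMPLE_XML), conn)
rows = conn.execute("SELECT id, street FROM records ORDER BY id").fetchall()
print(rows)
```

Streaming with `iterparse` rather than loading the whole tree is one reasonable choice for a document of this size. The real project may have used a different approach.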
Exploratory Analysis with R -- Check repo
• Cleaned, merged and analyzed data on the consequences of earthquakes around the world since the 1900s.
• Created a notebook with clear steps for getting, cleaning and merging the data from different sources. Created a codebook with all the variables included in the final dataset.
• Created more than 20 visualizations to understand the data. Analyzed earthquake deaths conditional on magnitude and on regime type/GDP per capita.
Skills: R, RStudio, ggplot2, Python, Pandas, GeoPandas.
Machine Learning - Enron Case -- Check repo
• Identified which Enron employees were more likely to have committed fraud using machine learning and public Enron financial and email data.
• Trained and tested different algorithms and used feature selection techniques.
• Tuned the algorithms’ parameters to improve on the original results.
Skills: Python, Scikit-learn, Pandas, machine learning.
Data visualization - Earthquake project -- Check repo
• Developed a visualization where users can fully interact with geographical and temporal features of earthquakes.
• Successfully integrated D3.js and Leaflet to produce animations and transitions.
• Project featured by Data Science Weekly.
Skills: D3.js, Leaflet, GeoPandas, Pandas, Python.
A/B testing -- Check repo
• Designed an A/B test, including which metrics to measure and how long the test should be run. I also analyzed the results of an A/B test that was run by Udacity, recommended a decision, and proposed a follow-up experiment.
Skills: Pandas, IPython.
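Deciding how long a test should run usually comes down to a sample-size estimate. The sketch below uses the standard two-proportion z-test approximation. The helper names and all input numbers (baseline rate, minimum detectable effect, daily traffic) are hypothetical and are not the values from the Udacity experiment.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, min_detectable_effect,
                          alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * pooled * (1 - pooled))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def days_needed(n_per_group, daily_units, traffic_fraction=1.0, groups=2):
    """Days required to collect enough units across all groups."""
    return math.ceil(groups * n_per_group / (daily_units * traffic_fraction))


# Hypothetical inputs: 10% baseline rate, 2 percentage point minimum effect.
n = sample_size_per_group(0.10, 0.02)
print(n, days_needed(n, daily_units=1000, traffic_fraction=0.5))
```

Lowering `traffic_fraction` reduces exposure to a risky change at the cost of a longer test, which is the usual trade-off when choosing a duration.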