Introduction to Data Science @ FGV
Instructor: Renato Rocha Souza
This is the code repository for the "Introduction to Data Science" course.
This class is about the Data Science process, in which we seek to make useful predictions and gain insights from data. Through real-world examples and code snippets, we introduce methods for:
- data munging, scraping, sampling and cleaning to obtain an informative, manageable data set;
- data storage and management to keep data accessible, even at big-data scale;
- exploratory data analysis (EDA) to generate hypotheses and intuition about the data;
- prediction based on statistical learning tools;
- communication of results through visualization, stories, and interpretable summaries.
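The steps listed above can be illustrated end to end in a few lines. The following is a minimal sketch in plain Python (standard library only), assuming a tiny hypothetical data set of hours studied versus exam scores that is invented purely for illustration; the helper names `load_and_clean` and `fit_line` are likewise hypothetical and not part of the course code. It drops incomplete rows (cleaning), prints a quick summary (EDA), fits an ordinary least-squares line (prediction), and reports the slope in words (communication).

```python
import csv
import io
import statistics

# Hypothetical toy data (invented for illustration): hours studied vs. exam score.
# The second row has a missing score on purpose.
RAW = """hours,score
1,52
2,
3,61
4,70
5,74
"""


def load_and_clean(text):
    """Munging/cleaning: parse CSV text and drop rows with a missing field."""
    rows = csv.DictReader(io.StringIO(text))
    return [(float(r["hours"]), float(r["score"]))
            for r in rows if r["hours"] and r["score"]]


def fit_line(points):
    """Prediction: ordinary least-squares fit of score = a + b * hours."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


data = load_and_clean(RAW)
# EDA: a quick numeric summary before modeling.
print("rows kept:", len(data), "| mean score:", statistics.mean(y for _, y in data))
a, b = fit_line(data)
# Communication: an interpretable one-sentence summary of the fitted model.
print(f"Each extra hour of study is associated with about {b:.2f} more points.")
```

In practice, dedicated libraries (for example pandas and scikit-learn) usually replace hand-written helpers like these; the standard-library version only keeps the sketch self-contained.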
Detailed Syllabus:
Data Science Concepts and Methodologies
- Data Science, Statistics, AI and Machine Learning ref1, ref2, ref3, ref4
- Data Science Process
Feature Engineering
- Numeric Data
- Discrete/Categorical Data
- Textual Data ref1, ref2
Oversampling and Undersampling ref1
Machine Learning Algorithms ref1, ref2, ref3, ref4, ref5
Unsupervised
- Dimensionality reduction
- Clustering ref1, ref2, ref3
- Topic Modeling ref1, ref2
- Unsupervised Deep Learning
Supervised
Linear Models
- Polynomial Regression ref1, ref2
- Stepwise Regression
- Ridge Regression
- Lasso Regression
- ElasticNet Regression
Bayesian Models
Neural Networks and Deep Learning ref1, ref2, ref3, ref4, ref5, video, meme
- Convolutional Neural Networks ref1, ref2, ref3
- Sequence Models
- Word2vec ref1, trained-models
- Generative Adversarial Networks
- Neural Network Concepts
Data Science Tasks
Data Science and Visualization Tools
- Versioning Tools
- Exploratory Data Analysis Tools
- Machine Learning Tools
- NLP Tools
- Visualization Tools
- Big Data and Distributed Computing
- MapReduce
- Spark
- Analytical Pipelines
- Other Tools
- Relational databases and SQL
- NoSQL Databases
- Graph Databases
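To make one syllabus item, Oversampling and Undersampling, more concrete, here is a minimal sketch of its two simplest variants in plain Python: random oversampling (duplicating randomly chosen minority-class rows) and random undersampling (keeping a random subset of each class). It does not attempt more elaborate methods such as SMOTE, and it is not necessarily the approach taken in the referenced material; the function names `random_oversample` and `random_undersample` and the tiny imbalanced data set are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller class
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Hypothetical helper: keep a random subset of each class, sized to
    match the smallest class, preserving the original row order."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical imbalanced data set: five rows of class 0, one row of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]

Xo, yo = random_oversample(X, y)
print(Counter(yo))  # Counter({0: 5, 1: 5})
Xu, yu = random_undersample(X, y)
print(Counter(yu))  # Counter({0: 1, 1: 1})
```

Resampling of this kind is normally applied only to the training split, so that evaluation still reflects the original class balance. Libraries such as imbalanced-learn provide tested implementations of these and related methods.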