SME0828_DataScience: A Jupyter Notebook repository from brenoslivio

Introduction to Data Science

Projects made for the course SME0828 Introduction to Data Science, covering concepts such as data processing, Exploratory Data Analysis (EDA) and classification.

Authors:
- Aline Fernanda (alinef29)
- Breno Lívio (brenoslivio)
- Matheus Victal (matheusvictal)


About The Project
Getting Started
- Prerequisites
- Installation
License
Acknowledgements

About The Project

The projects were made for the course SME0828 - Introduction to Data Science, at ICMC - USP, in the 2nd semester of 2020. The topics covered were:

'Introduction to Data Science. Introduction to machine learning and data mining. Introduction to Python language. Data acquisition processes. Methods of data aggregation, transformation and cleaning. Search and text processing. Data sampling methods. Learning concepts and data summarization. Case studies.'

For more information about the course check here.

The projects are Jupyter Notebooks divided into three topics studied throughout the course:

Data processing

This project studied the Iris and BostonHouse datasets, applying sampling techniques, creating histograms, and detecting outliers using the interquartile range.
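As an illustration of the interquartile-range approach mentioned above, here is a minimal sketch using only the standard library. It is not the notebooks' code: the function name `iqr_outliers`, the sample values, and the conventional 1.5 multiplier for the fences are assumptions made for this example.

```python
import statistics

def iqr_outliers(values, factor=1.5):
    """Return the values lying outside the IQR fences.

    `factor=1.5` is the conventional choice and an assumption here;
    the notebooks may use a different threshold.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - factor * iqr
    upper_fence = q3 + factor * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical measurements with one obviously extreme value.
sample = [4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 9.9]
print(iqr_outliers(sample))
```

The quartile estimate depends on the interpolation method, so libraries such as NumPy or pandas may place the fences slightly differently on small samples.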

Exploratory Data Analysis

Reusing earlier datasets such as Iris, the EDA consisted of plotting boxplots and violin plots, comparing Pearson and Spearman correlations, and exploring Anscombe's quartet.
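To make the Pearson versus Spearman comparison concrete, the sketch below computes both coefficients in plain Python. It is an illustrative example rather than the notebooks' implementation: the helper names `pearson`, `ranks` and `spearman` and the sample data are assumptions. Spearman is computed here as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import math

def pearson(x, y):
    """Pearson correlation: measures the strength of a linear relationship."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(pearson(x, y), spearman(x, y))
```

On monotonic but non-linear data like this, Spearman reaches 1 while Pearson stays below it, which is the kind of difference the comparison is meant to expose.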

Classification

The last project for the course covered data classification. It used several different datasets to properly understand the classification process. We studied classifiers such as parametric and non-parametric Bayes, Naive Bayes, Logistic Regression, k-NN, Random Forest and SVM, along with techniques such as feature selection. Notable datasets included the Star dataset for predicting star types, some UCI datasets and the famous Titanic dataset.
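Of the classifiers listed above, k-NN is the simplest to sketch from scratch. The example below is a minimal illustration in plain Python, not the code used in the notebooks: the function name `knn_predict`, the Euclidean distance, the default `k=3` and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest points and return the most common.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature data forming two well-separated groups.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.0, 5.1)))
```

In practice a library implementation (for example the one in sklearn) would be the usual choice, since it handles feature scaling pipelines, tie-breaking and efficient neighbor search.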

Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

Python 3.8 or greater and Jupyter Notebook. You may also need to install some libraries that the notebooks import, such as sklearn and matplotlib.
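A quick way to check these prerequisites before opening the notebooks is sketched below. The helper names `python_is_supported` and `missing_packages` are hypothetical, and the package list only covers the two libraries named above; individual notebooks may import others.

```python
import importlib.util
import sys

def python_is_supported(minimum=(3, 8)):
    """True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum

def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Python 3.8+:", python_is_supported())
print("Missing packages:", missing_packages(["sklearn", "matplotlib"]))
```

Note that the import name `sklearn` corresponds to the `scikit-learn` package when installing.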

Installation

Clone the repo

git clone https://github.com/brenoslivio/SME0828_DataScience.git

Simply run Jupyter Notebook to open the projects.

License

Distributed under the MIT License. See LICENSE for more information.
