/SME0828_DataScience

Projects made for the course SME0828 Introduction to Data Science at ICMC USP, 2nd semester of 2020. Concepts like data processing, Exploratory Data Analysis (EDA), and classification are used.



Introduction to Data Science

Projects made for the course SME0828 Introduction to Data Science using concepts like data processing, Exploratory Data Analysis (EDA) and classification.

Authors: Aline Fernanda, alinef29
Breno Lívio, brenoslivio
Matheus Victal, matheusvictal


Table of Contents

  1. About The Project
  2. Getting Started
  3. License
  4. Acknowledgements

About The Project

The projects were developed for the course SME0828 - Introduction to Data Science, at ICMC - USP, in the 2nd semester of 2020. The topics covered were:

'Introduction to Data Science. Introduction to machine learning and data mining. Introduction to Python language. Data acquisition processes. Methods of data aggregation, transformation and cleaning. Search and text processing. Data sampling methods. Learning concepts and data summarization. Case studies.'

For more information about the course check here.

The projects are Jupyter Notebooks divided into three subjects studied throughout the course:

The first project studied the Iris and BostonHouse datasets, applying sampling techniques, creating histograms, and detecting outliers using the interquartile range.
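The interquartile-range rule mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from the notebooks; the function name `iqr_outliers`, the multiplier `k=1.5`, the choice of the "inclusive" quantile method and the sample values are all assumptions made for the example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up measurements with one obviously extreme value.
sample = [4.9, 5.0, 5.1, 5.2, 5.4, 5.5, 5.8, 9.9]
print(iqr_outliers(sample))
```

Other quartile conventions give slightly different fences, so borderline points may be flagged by one method and not by another.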

Reusing earlier datasets such as Iris, the EDA project plotted box plots and violin plots, compared Pearson and Spearman correlations, and explored Anscombe's quartet.
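To illustrate why the two correlation measures can disagree, here is a small sketch in pure Python. It is not taken from the notebooks; the helper names `pearson`, `ranks` and `spearman` and the toy data are assumptions for the example. Spearman correlation is computed here as the Pearson correlation of the ranks, so a monotonic but non-linear relation scores 1 for Spearman and less than 1 for Pearson.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print("Pearson: ", pearson(x, y))
print("Spearman:", spearman(x, y))
```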

The last project of the course dealt with data classification. It used many different datasets to properly understand the classification process. We studied classifiers such as parametric and non-parametric Bayes, Naive Bayes, Logistic Regression, k-NN, Random Forest and SVM, and applied techniques such as feature selection. Notable datasets included the Star dataset for predicting star types, some UCI datasets and the famous Titanic dataset.
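As a rough idea of how one of the listed classifiers works, the sketch below implements k-NN by majority vote in pure Python. It is only an illustration and not the approach used in the notebooks, which may rely on a library such as sklearn; the function name `knn_predict`, the default `k=3` and the toy points and labels are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_X, train_y)
    )
    votes = Counter(label for _dist, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up two-dimensional points forming two well-separated groups.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.0, 5.1)))
```

An odd `k` avoids tied votes when there are only two classes.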

Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

Python 3.8 or greater and Jupyter Notebook. You may also need to install some libraries that the notebooks import, such as sklearn and matplotlib.
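One way to check these prerequisites before opening the notebooks is the short script below. It is a sketch under stated assumptions: the `REQUIRED` list only contains the two libraries named above and is not a complete list of what the notebooks import, and the helper names `python_ok` and `missing_packages` are hypothetical.

```python
import importlib.util
import sys

# Assumption: only the libraries named in this README; the notebooks may need more.
REQUIRED = ["sklearn", "matplotlib"]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def missing_packages(names):
    """Return the names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version OK:", python_ok())
print("Missing packages:", missing_packages(REQUIRED))
```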

Installation

  1. Clone the repo
    git clone https://github.com/brenoslivio/SME0828_DataScience.git
  2. Simply run Jupyter Notebook to open the projects.

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgements