Machine Learning Data Prep Presentation and Notebook

Primary language: Jupyter Notebook. License: MIT.

Machine Learning Data Prep: Handling outliers using Python

Summary

This presentation covers data preparation for machine learning. It begins with a general introduction to machine learning, then briefly explains why data preparation matters, using outlier handling in Python as a specific example.

The presentation was delivered twice to Orlando-area organizations.
First, to the Orlando Machine Learning and Data Science Meetup Group (OMLDS) on October 26, 2021, during an online Lunch and Learn.
Second, as part of the machine learning track of SQL Saturday Orlando 2022 on October 8, 2022.

Data preparation is essential for machine learning, and outliers are a fundamental concept in both machine learning and statistics. This presentation uses EPA fuel efficiency data as a real-world example of how outliers can lead us to identify new and interesting groups or subsets within the data.

Small snippets of Python code are highlighted in the presentation slides. The full code is in the notebook within this repository. For those looking to build their Python plotting skills, the code shows a method that overlays a scatter plot on a Seaborn regplot to highlight outliers in a different color, something the Seaborn regplot does not do on its own.
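The overlay idea can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code. The data is synthetic rather than the EPA data set, the helper names `flag_outliers_iqr` and `plot_with_outliers` are hypothetical, and the 1.5 × IQR rule is only one common way to flag outliers, assumed here for the example.

```python
import numpy as np
import pandas as pd
import seaborn as sns


def flag_outliers_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside the Tukey IQR fences."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def plot_with_outliers(df: pd.DataFrame, x: str, y: str, mask: pd.Series):
    """Draw a regplot of all points, then overlay the flagged points in red."""
    # regplot has no hue parameter, so every point gets the same color here.
    ax = sns.regplot(data=df, x=x, y=y, color="steelblue",
                     scatter_kws={"alpha": 0.5})
    # A second scatter layer on the same Axes recolors only the outliers.
    sns.scatterplot(data=df[mask], x=x, y=y, color="red",
                    label="outlier", ax=ax)
    return ax


# Synthetic stand-in data (an assumption, NOT the EPA fuel efficiency data).
rng = np.random.default_rng(0)
displacement = rng.uniform(1.0, 6.0, 200)
mpg = 45 - 5 * displacement + rng.normal(0, 2, 200)
mpg[:5] = [110, 115, 120, 105, 118]  # a few deliberately extreme values
cars = pd.DataFrame({"displacement": displacement, "mpg": mpg})

cars["is_outlier"] = flag_outliers_iqr(cars["mpg"])
ax = plot_with_outliers(cars, "displacement", "mpg", cars["is_outlier"])
ax.set_title("Regplot with outliers overlaid in red")
```

In a notebook the figure renders automatically. In a script, call `matplotlib.pyplot.show()` afterwards. Note that the regression line is still fitted to all points, including the flagged ones. To fit without them, pass `df[~mask]` to `regplot` instead.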


Presentation Slides

Slides 1 through 32 (slide images).