
Multivariate Analysis (Correlation, PCA) and Data Visualization with Matplotlib and Seaborn


Multivariate Analysis and Data Visualization with Matplotlib and Seaborn

The main goals of this lecture:

Part one: class organization

  1. Introduce the new students to the class (yes, new students join every class).
  2. All students report on the progress of their research proposal writing (due next week).

Part two: programming

For all students

Multivariate Analysis with Visualization
  1. Correlation Heatmap
  2. PCA (Principal Component Analysis)
  3. LDA (Linear Discriminant Analysis, not Latent Dirichlet Allocation, the text-mining one)
  4. Introduction to F1, recall, and precision: common metrics for machine learning
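
The four topics above can be tied together in one short pipeline. The following is a minimal sketch, not the lecture notebook itself: it assumes NumPy and scikit-learn are installed, uses the bundled Iris dataset as a stand-in for your own data, and the function name `multivariate_demo` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def multivariate_demo(seed=0):
    """Run correlation, PCA, LDA, and metrics on Iris (assumed stand-in data)."""
    X, y = load_iris(return_X_y=True)

    # 1. Correlation matrix between the four features (columns are variables).
    corr = np.corrcoef(X, rowvar=False)

    # 2. PCA: standardize first so no feature dominates because of its scale.
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X_scaled)

    # 3. LDA is supervised, so it is fit on labeled training data only.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    lda = LinearDiscriminantAnalysis()
    lda.fit(X_train, y_train)
    y_pred = lda.predict(X_test)

    # 4. Precision, recall, and F1, macro-averaged over the three classes.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro"
    )

    return {
        "corr": corr,
        "X_pca": X_pca,
        "explained_variance_ratio": pca.explained_variance_ratio_,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


results = multivariate_demo()
print("Correlation matrix shape:", results["corr"].shape)
print("Variance explained by 2 components:", results["explained_variance_ratio"].sum())
print("Macro precision / recall / F1:",
      results["precision"], results["recall"], results["f1"])
```

To draw the heatmap itself, the `corr` array can be passed to `seaborn.heatmap(corr, annot=True)`. Plotting is left out of the sketch so that it runs without a display.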
Visualization with Flask

Build your own data analytics web app!
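
As a starting point for such an app, here is a minimal sketch that assumes Flask is installed. The `/summary` route, the `SCORES` data, and the function name are hypothetical placeholders; a real project would load its own dataset and likely render a chart rather than return raw JSON.

```python
from statistics import mean

from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory data; a real app would load a dataset instead.
SCORES = {"math": [88, 92, 79], "reading": [75, 85, 95]}


@app.route("/summary")
def summary():
    """Return the mean of each column as JSON."""
    return jsonify({name: mean(values) for name, values in SCORES.items()})

# To serve the app locally, call app.run() (development server only).
```

Flask's built-in test client can exercise the route without starting a server, for example `app.test_client().get("/summary").get_json()`.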

For new students only

(a 30-minute makeup session after class, or make an appointment with me)

  1. Terminal operations: launch Jupyter Notebook and learn about 'pip install XXXXX'
  2. Notebook from week 1: intro to Pandas, loading a dataset into Jupyter Notebook, exploratory data analysis, data cleaning
  3. Notebook from week 1: learn about data structures (Lecture_One_Data_Structure.ipynb)
  4. Notebook from week 2: learn about linear regression, the concept of p-values, sklearn, and statistical computation packages (Lecture_Two_Linear_Regression)
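
The loading, exploration, and cleaning steps from the week 1 notebook (item 2) can be sketched as follows. This is an illustrative example only, assuming pandas is installed; the CSV content, column names, and the `load_and_clean` helper are hypothetical and not taken from the course notebooks.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real dataset file on disk.
RAW_CSV = """name,age,score
Ana,23,88.5
Ben,,92.0
Ana,23,88.5
Cy,31,
"""


def load_and_clean(csv_text):
    """Load CSV text, count missing values, then drop duplicates and gaps."""
    df = pd.read_csv(io.StringIO(csv_text))

    # Exploration: how many values are missing in each column?
    missing_counts = df.isna().sum()

    # Cleaning: remove exact duplicate rows, then rows with any missing value.
    cleaned = df.drop_duplicates().dropna().reset_index(drop=True)
    return cleaned, missing_counts


df_clean, missing = load_and_clean(RAW_CSV)
print(missing)
print(df_clean)
```

Dropping every row with a missing value is the bluntest cleaning choice; depending on the research question, filling gaps (for example with `DataFrame.fillna`) may preserve more data.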

Part three: project management

  1. Every student creates an individual project in ColumbiaPython (we have four now).
  2. Finish the research proposal on GitHub as a README file: a. What is your research question? b. Why can your dataset answer the question?
  3. Upload new files to GitHub (reference papers, data, and code).

Prepare for Next Week

For Next Week: Introduction to Machine Learning for Classification

Logistic Regression:
  1. example question, concepts, data analysis
  2. Do we really understand the Log Loss calculation in Logistic Regression? Linear Regression uses Mean Squared Error as its cost function; Logistic Regression instead uses Cross-Entropy, also known as Log Loss.
  3. Binary Classification
  4. One vs. Rest: Multi-class Classification
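
The Log Loss calculation from item 2 can be sketched in a few lines of plain Python. This is a minimal illustration for binary labels, not a library implementation; the label and probability lists are made-up values, and the clipping constant `eps` is an assumption added to avoid taking `log(0)`.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy: -(1/n) * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def mean_squared_error(y_true, y_prob):
    """Mean of squared differences, shown only for comparison."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 1]
good = [0.9, 0.1, 0.8, 0.7]
bad = [0.9, 0.1, 0.8, 0.01]  # last prediction is confidently wrong

for name, probs in [("good", good), ("bad", bad)]:
    print(name, "log loss:", log_loss(y_true, probs),
          "MSE:", mean_squared_error(y_true, probs))
```

A single prediction of 0.01 for a true label of 1 costs about 4.6 under Log Loss but less than 1 under squared error, which shows how much more heavily Log Loss penalizes confident mistakes.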
Naive Bayes:

Why is Naive Bayes naive?

SVM:

Explain how SVM works. What is the difference between SVM and Random Forest?

Reference:

  1. https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html