Intel-MachineLearning

Supervised learning algorithms
Key concepts like under- and over-fitting, regularization, and cross-validation
How to identify the type of problem to be solved, choose the right algorithm, tune parameters, and validate a model

basic data science toolset:

Jupyter Notebook* for interactive coding
NumPy, SciPy, and pandas for numerical computation
Matplotlib and seaborn for data visualization
Scikit-learn* for machine learning

vocabulary of machine learning:

Supervised learning and how it can be applied to regression and classification problems
K-Nearest Neighbor (KNN) algorithm for classification
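
As a rough illustration of KNN classification, the sketch below predicts a label by majority vote among the k closest training points. It uses only the Python standard library; the function name `knn_predict` and the toy data are assumptions made up for this example, not part of the course material.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two small, well-separated clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(points, labels, (1.1, 1.0))
prediction_near_b = knn_predict(points, labels, (4.0, 4.0))
```

In practice k is a hyper-parameter: small values follow the training data closely, while larger values smooth the decision boundary.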

the principles of core model generalization:

The difference between over-fitting and under-fitting a model
Bias-variance tradeoffs
Finding the optimal training and test data set splits, cross-validation, and model complexity versus error
Introduction to the linear regression model for supervised learning
Learn about cost functions, regularization, feature selection, and hyper-parameters
Understand more complex statistical optimization algorithms like gradient descent and its application to linear regression
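
To make the link between cost functions and gradient descent concrete, here is a minimal sketch that fits a one-feature linear regression by minimizing mean squared error with batch gradient descent. The function name, learning rate, step count, and data are illustrative assumptions; a real project would more likely rely on a library implementation.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by minimizing mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current fit.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gradient_descent(xs, ys)
```

The learning rate is a hyper-parameter: too large a value makes the updates diverge, and too small a value makes convergence slow.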

others:

Logistic regression and how it differs from linear regression
Metrics for classification error and scenarios in which they can be used
The basics of probability theory and its application to the Naïve Bayes classifier
The different types of Naïve Bayes classifiers and how to train a model using this algorithm
Support vector machines (SVMs)—a popular algorithm used for classification problems
Examples that show how SVMs resemble logistic regression
How to calculate the cost function of SVMs
Regularization in SVMs and some tips to obtain non-linear classifications with SVMs
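
One common way to write the SVM cost function is the soft-margin form: an L2 penalty on the weights plus C times the sum of hinge losses. The sketch below assumes that formulation, a linear model, and labels encoded as +1 and -1; the course may scale the terms differently, and the function name `svm_cost` is hypothetical.

```python
def svm_cost(weights, bias, features, labels, c=1.0):
    """Soft-margin linear SVM cost: 0.5 * ||w||^2 + C * sum of hinge losses.

    Labels are assumed to be +1 or -1.
    """
    hinge_total = 0.0
    for x, y in zip(features, labels):
        score = sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias
        # Hinge loss is zero once a point is correctly classified with margin >= 1.
        hinge_total += max(0.0, 1.0 - y * score)
    regularization = 0.5 * sum(w_i * w_i for w_i in weights)
    return regularization + c * hinge_total


# Toy data: the third point is on the wrong side of the boundary.
features = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, -1, -1]
cost = svm_cost(weights=(1.0, 0.0), bias=0.0, features=features, labels=labels)
```

In this formulation a larger C penalizes margin violations more heavily, which corresponds to weaker regularization.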

advanced supervised learning algorithms:

Decision trees and how to use them for classification problems
How to identify the best split and the factors for splitting
Strengths and weaknesses of decision trees
Regression trees, which predict continuous values rather than class labels
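
As a hedged illustration of how a best split can be chosen, the sketch below scores every candidate threshold on a single numeric feature by weighted Gini impurity and keeps the lowest. Gini is only one possible criterion (entropy is another), and the helper names and data are assumptions for this example.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    best = (None, float("inf"))
    n = len(values)
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        # Weight each side's impurity by its share of the samples.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data: the classes separate cleanly between 3.0 and 10.0.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_threshold(values, labels)
```

A full decision tree would apply this search recursively to each side of the split and across all features.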

bagging and random forest:

The concepts of bootstrapping and aggregating (commonly known as “bagging”) to reduce variance
The Random Forest algorithm, which further reduces the correlation between the trees in a bagged model
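
The two halves of bagging can be sketched as follows: draw bootstrap samples (sampling with replacement), fit one base learner per sample, then aggregate by majority vote. The base learner here is a deliberately simple 1-nearest-neighbor rule rather than a decision tree, and all names, the seed, and the data are assumptions for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def nearest_neighbor_label(rows, x):
    """Base learner: label of the training row whose feature is closest to x."""
    return min(rows, key=lambda row: abs(row[0] - x))[1]


def bagged_predict(rows, x, n_models=25, seed=0):
    """Train one base learner per bootstrap sample and take a majority vote."""
    rng = random.Random(seed)
    votes = Counter(
        nearest_neighbor_label(bootstrap_sample(rows, rng), x)
        for _ in range(n_models)
    )
    return votes.most_common(1)[0][0]


# Toy data: (feature, label) pairs in two well-separated groups.
rows = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
        (8.0, "high"), (8.5, "high"), (9.0, "high")]

low_prediction = bagged_predict(rows, 1.2)
high_prediction = bagged_predict(rows, 8.7)
```

A random forest follows the same pattern with decision trees as base learners and additionally restricts each split to a random subset of features, which is what lowers the correlation between trees.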

boosting:

The boosting algorithm, which helps reduce variance and bias

unsupervised learning algorithms and how they can be applied to clustering and dimensionality reduction problems.

achieve a reduction in dimensionality:

Dimensionality refers to the number of features in the dataset. In theory, more features should mean better models, but in practice this often fails: too many features can introduce spurious correlations, add noise, and slow down training. Learn algorithms that can be used to reduce dimensionality, such as:

Principal Component Analysis (PCA)
Multidimensional Scaling (MDS)
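
To give a feel for what PCA computes, the sketch below finds the first principal component (the direction of greatest variance) by power iteration on the sample covariance matrix. This is one simple way to obtain it, chosen here to stay within the standard library; library implementations typically use an SVD instead. The function name, iteration count, and data are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return the unit direction of greatest variance via power iteration."""
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Repeatedly multiply by the covariance matrix and renormalize.
    vector = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    return vector


# Toy data lying close to the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]
component = first_principal_component(rows)
```

Projecting the centered data onto this single direction reduces the two features to one while keeping most of the variance.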
