ml-from-scratch

Machine learning from scratch (zero to something), following a structured approach.

Primary language: Jupyter Notebook

Supervised Algorithms

Supervised learning trains a model on labeled examples so that it can predict the target for new, unseen inputs.

Easy

  • Linear Regression (see the sketch after this list)
  • Logistic Regression
  • k-Nearest Neighbors (k-NN)
  • Decision Trees
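
To make the first item concrete, here is a minimal sketch of linear regression fit by batch gradient descent, using NumPy only; the synthetic data, learning rate, and iteration count are illustrative choices, not code taken from this repo's notebooks.

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus a little noise (illustrative values).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 3.0 * X + 2.0 + rng.normal(0, 0.1, size=200)

w, b = 0.0, 0.0   # model parameters: slope and intercept
lr = 0.1          # learning rate

for _ in range(500):                  # batch gradient descent on MSE
    error = (w * X + b) - y
    w -= lr * 2 * np.mean(error * X)  # dMSE/dw
    b -= lr * 2 * np.mean(error)      # dMSE/db

print(f"learned w={w:.2f}, b={b:.2f}")  # should approach 3 and 2
```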

Intermediate

  • Support Vector Machines (SVM)
  • Naive Bayes (see the sketch after this list)
  • Random Forest
  • Gradient Boosting Machines (GBM)
  • AdaBoost
  • XGBoost
  • LightGBM
  • CatBoost
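
As one Intermediate example, below is a minimal Gaussian Naive Bayes sketch; `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical helper names, and the two-blob toy data is illustrative.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and variances (hypothetical helper)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),       # prior P(c)
                     Xc.mean(axis=0),        # per-feature means
                     Xc.var(axis=0) + 1e-9)  # per-feature variances (smoothed)
    return params

def predict_gaussian_nb(params, X):
    """Pick the class with the highest log joint probability."""
    scores = []
    for c, (prior, mu, var) in params.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        scores.append((c, np.log(prior) + log_lik))
    classes, log_posts = zip(*scores)
    return np.array(classes)[np.argmax(np.stack(log_posts), axis=0)]

# Toy usage with two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = fit_gaussian_nb(X, y)
print((predict_gaussian_nb(params, X) == y).mean())  # training accuracy, near 1.0
```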

Advanced

  • Regularization Techniques (L1/L2 Regularization; see the ridge sketch after this list)
  • Ensemble Methods (Bagging, Boosting, Stacking)
  • Bayesian Linear Regression
  • Gaussian Processes
  • Kernel Methods (Kernel SVM, Kernel Regression)
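
For the regularization item, here is a closed-form ridge regression (L2-penalized least squares) sketch; `ridge_fit`, the penalty strength, and the toy data are assumptions for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge: w = (X^T X + lam * I)^-1 X^T y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
true_w = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ true_w + rng.normal(0, 0.1, size=100)
print(ridge_fit(X, y, lam=0.1).round(2))  # close to true_w for a small penalty
```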

Unsupervised Algorithms

Unsupervised learning finds structure in unlabeled data, such as clusters or low-dimensional representations.

Easy

  • K-Means Clustering (see the sketch after this list)
  • Hierarchical Clustering
  • Principal Component Analysis (PCA)
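
A minimal k-means (Lloyd's algorithm) sketch, assuming NumPy and toy 2-D blobs; `kmeans` is a hypothetical helper, not this repo's implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids.round(1))  # roughly (0, 0) and (4, 4)
```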

Intermediate

  • t-Distributed Stochastic Neighbor Embedding (t-SNE)
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
  • Gaussian Mixture Models (GMM; see the sketch after this list)
  • Independent Component Analysis (ICA)
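
For the GMM item, a tiny 1-D expectation-maximization sketch follows; the initialization, iteration count, and smoothing constant are illustrative, and it omits the robustness checks a real implementation needs.

```python
import numpy as np

def gmm_em(X, k=2, n_iter=100, seed=0):
    """Minimal EM for a 1-D Gaussian mixture (illustrative, not robust)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(X, size=k, replace=False)  # init means from data points
    var = np.full(k, X.var())                  # shared initial variance
    pi = np.full(k, 1.0 / k)                   # mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i).
        dens = np.exp(-0.5 * (X[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = r.sum(axis=0)
        pi = nk / len(X)
        mu = (r * X[:, None]).sum(axis=0) / nk
        var = (r * (X[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return pi, mu, var

rng = np.random.default_rng(3)
X = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])
pi, mu, var = gmm_em(X, k=2)
print(mu.round(1))  # one mean near 0 and one near 5 (order may vary)
```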

Advanced

  • Non-Negative Matrix Factorization (NMF; see the sketch after this list)
  • Autoencoders (for Dimensionality Reduction)
  • Self-Organizing Maps (SOM)
  • Spectral Clustering
  • Latent Dirichlet Allocation (LDA) for Topic Modeling
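
A minimal NMF sketch using the classic Lee-Seung multiplicative updates; the rank, iteration count, and random toy matrix are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Lee-Seung multiplicative updates so that V is approximated by W @ H,
    with all entries kept non-negative."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, (V.shape[0], rank))
    H = rng.uniform(0.1, 1.0, (rank, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

V = np.abs(np.random.default_rng(6).normal(size=(20, 10)))
W, H = nmf(V, rank=3)
print(np.linalg.norm(V - W @ H))  # reconstruction error shrinks with rank/iterations
```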

Neural Networks

Neural networks are machine learning models, loosely inspired by the brain, that learn by adjusting the weights of stacked layers.

Easy

  • Perceptron (see the sketch after this list)
  • Multilayer Perceptron (MLP)
  • Feedforward Neural Networks
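
Here is a minimal perceptron sketch (the classic Rosenblatt update rule) on a tiny linearly separable dataset; all values are illustrative.

```python
import numpy as np

def perceptron(X, y, lr=0.1, n_epochs=20):
    """Perceptron rule on labels in {-1, +1}: nudge weights on each mistake."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified: move toward the example
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Tiny linearly separable toy data.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
print(np.sign(X @ w + b))  # matches y
```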

Intermediate

  • Convolutional Neural Networks (CNN; see the convolution sketch after this list)
  • Recurrent Neural Networks (RNN)
  • Long Short-Term Memory Networks (LSTM)
  • Gated Recurrent Units (GRU)
  • Transfer Learning (e.g., using pre-trained models like ResNet, VGG)
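
To illustrate the core operation inside a CNN layer, below is a naive valid-mode 2-D convolution (strictly, cross-correlation, as in most deep-learning libraries); `conv2d` is a hypothetical helper and the edge-detecting kernel is illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel and take dot products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # rises by 1 per column
edge_kernel = np.array([[1.0, -1.0]])             # horizontal gradient filter
print(conv2d(image, edge_kernel))                 # constant -1s everywhere
```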

Advanced

  • Generative Adversarial Networks (GAN)
  • Variational Autoencoders (VAE)
  • Transformer Models (e.g., BERT, GPT)
  • Attention Mechanisms (see the sketch after this list)
  • Capsule Networks
  • Neural Architecture Search (NAS)
  • Reinforcement Learning with Neural Networks (e.g., Deep Q-Learning)
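
For attention mechanisms, a minimal scaled dot-product attention sketch (single head, no masking or batching) follows; the shapes and random data are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(7)
Q = rng.normal(size=(3, 4))  # 3 query vectors of dimension 4
K = rng.normal(size=(5, 4))  # 5 key vectors
V = rng.normal(size=(5, 4))  # 5 value vectors
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```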

General

These are foundational concepts and techniques used across all areas of machine learning.

Easy

  • Gradient Descent (Batch, Mini-Batch, Stochastic)
  • Loss Functions (Mean Squared Error, Cross-Entropy)
  • Overfitting and Underfitting
  • Bias-Variance Tradeoff
  • Feature Scaling and Normalization (Min-Max, Z-Score)
  • Train-Test Split
  • Cross-Validation (k-Fold; see the sketch after this list)
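
Two of the Easy items sketched together: z-score feature scaling and k-fold index generation, assuming NumPy; `zscore` and `kfold_indices` are hypothetical helper names.

```python
import numpy as np

def zscore(X):
    """Z-score standardization: zero mean, unit variance per feature."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold

X = np.random.default_rng(8).normal(5, 2, size=(10, 3))
Xs = zscore(X)
print(Xs.mean(axis=0).round(6), Xs.std(axis=0).round(6))  # ~0 and ~1
for train_idx, val_idx in kfold_indices(len(X), k=5):
    print(len(train_idx), len(val_idx))  # 8 train / 2 validation per fold
```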

Intermediate

  • Hyperparameter Tuning (Grid Search, Random Search)
  • Regularization Techniques (L1/L2, Dropout)
  • Evaluation Metrics (Accuracy, Precision, Recall, F1-Score, ROC-AUC; see the sketch after this list)
  • Dimensionality Reduction (PCA, t-SNE, UMAP)
  • Feature Engineering
  • Data Augmentation
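
A minimal sketch of precision, recall, and F1 computed from raw counts for a binary problem; the toy labels are illustrative.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Binary-classification metrics from confusion counts (positive class = 1)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```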

Advanced

  • Optimization Algorithms (Adam, RMSprop, Adagrad; see the Adam sketch after this list)
  • Bayesian Optimization
  • Advanced Evaluation Metrics (Log Loss, Mean Absolute Error, R²)
  • Time Series Analysis (ARIMA, SARIMA)
  • Anomaly Detection (Isolation Forest, One-Class SVM)
  • Explainable AI (SHAP, LIME)
  • Distributed Machine Learning (Apache Spark, Horovod)
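
For the optimizer item, a single-parameter Adam sketch minimizing a toy quadratic; `adam_step` is a hypothetical helper and the hyperparameters are the commonly used defaults.

```python
import numpy as np

def adam_step(grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    return -lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(x) = (x - 3)^2; its gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 3001):
    step, m, v = adam_step(2 * (x - 3.0), m, v, t, lr=0.01)
    x += step
print(round(x, 3))  # close to 3.0
```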

Additional Topics

These are advanced or specialized topics that are useful for specific applications.

Advanced

  • Reinforcement Learning (Q-Learning, Policy Gradients; see the Q-learning sketch after this list)
  • Natural Language Processing (NLP) Techniques (Tokenization, Word Embeddings)
  • Graph Neural Networks (GNN)
  • Federated Learning
  • Meta-Learning (Learning to Learn)
  • Self-Supervised Learning
  • Few-Shot Learning
  • Zero-Shot Learning
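
To ground the reinforcement learning item, a tabular Q-learning sketch on a toy 1-D corridor follows; the environment, rewards, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

# Toy corridor: states 0..4, reward 1 on reaching state 4 (terminal).
n_states, n_actions = 5, 2           # actions: 0 = step left, 1 = step right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1    # learning rate, discount, exploration rate
rng = np.random.default_rng(9)

for _ in range(300):                 # episodes
    s = 0
    while s != n_states - 1:
        if rng.random() < eps:       # epsilon-greedy action selection
            a = int(rng.integers(n_actions))
        else:                        # greedy, breaking ties randomly
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Bellman update toward the bootstrapped target.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # learned policy: move right (1) in every non-terminal state
```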