A curated list of awesome frameworks, libraries, tools, tutorials, datasets, and research papers in machine learning. This list covers a wide array of topics, from foundational algorithms to modern techniques in supervised, unsupervised, and reinforcement learning.
- Frameworks and Libraries
- Tools and Utilities
- Algorithms and Techniques
- Model Evaluation and Tuning
- Feature Engineering
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Datasets
- Research Papers
- Learning Resources
- Books
- Community
- Contribute
- License
- Scikit-learn - A comprehensive Python library for machine learning with efficient tools for data analysis.
- TensorFlow - An open-source platform for machine learning and deep learning by Google.
- PyTorch - An open-source machine learning framework popular for its dynamic computation graph.
- XGBoost - A scalable, efficient, and widely-used gradient boosting library.
- LightGBM - A fast, distributed, high-performance gradient boosting framework.
- CatBoost - A gradient boosting library with built-in support for categorical features.
- MLflow - An open-source platform for managing the end-to-end machine learning lifecycle.
- Weights & Biases - A tool for experiment tracking, model monitoring, and hyperparameter optimization.
- DVC (Data Version Control) - A version control system for the datasets, models, and pipelines in machine learning projects.
- Optuna - An automatic hyperparameter optimization framework.
- Streamlit - A library for creating interactive machine learning web apps quickly.
- Linear Regression - A simple yet powerful supervised learning algorithm for regression tasks.
- Logistic Regression - A classification algorithm based on the logistic function.
- Decision Trees - A non-parametric supervised learning algorithm used for classification and regression tasks.
- Random Forest - An ensemble learning method using multiple decision trees.
- Gradient Boosting - A technique for building predictive models through an ensemble of weak learners.
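
As a small illustration of the first entry above, here is a minimal sketch of single-feature linear regression fitted by ordinary least squares, using only the Python standard library. The function name and the toy data are hypothetical and chosen for illustration; libraries such as scikit-learn handle many features and numerical edge cases that this sketch ignores.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for one feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x (proportional to its variance).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    # Sum of cross deviations (proportional to the covariance of x and y).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on the line y = 2x + 1 (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Because the toy points lie exactly on a line, the fit recovers that line; with noisy data the same formulas give the least-squares compromise.
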
- Cross-Validation - A resampling method that estimates how well a model generalizes by training and testing it on different splits of the data.
- Confusion Matrix - A tool for evaluating the performance of classification algorithms.
- Precision, Recall, F1 Score - Metrics for evaluating a classification model, especially when classes are imbalanced and accuracy alone is misleading.
- Grid Search - A method for hyperparameter optimization through exhaustive search.
- Bayesian Optimization - A method for optimizing hyperparameters using probabilistic models.
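
To make the confusion matrix and the precision, recall, and F1 entries above concrete, the following is a minimal sketch for the binary case in plain Python. The helper names and the label vectors are hypothetical; in practice a library routine (for example in scikit-learn) would normally be used instead.

```python
def binary_confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative labels only: 4 actual positives, 5 predicted positives.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

tp, fp, fn, tn = binary_confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Returning 0.0 when a denominator is zero is one common convention, not the only one; some tools raise a warning or return NaN instead.
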
- Pandas - A Python library for data manipulation and analysis.
- FeatureTools - An open-source library for automated feature engineering.
- Missingno - A Python library for visualizing missing data.
- Category Encoders - A collection of scikit-learn compatible transformers for encoding categorical features.
- Principal Component Analysis (PCA) - A technique for dimensionality reduction.
- Support Vector Machines (SVM) - A margin-maximizing algorithm for classification and regression tasks.
- K-Nearest Neighbors (KNN) - A simple, instance-based learning algorithm.
- Naive Bayes - A family of probabilistic classifiers based on Bayes' theorem.
- Ensemble Methods - Techniques like bagging and boosting for improving model accuracy.
- Neural Networks - A class of models inspired by the human brain's structure.
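
The K-Nearest Neighbors entry above is simple enough to sketch in a few lines. This is a minimal illustration assuming Euclidean distance and majority voting, with hypothetical function names and toy points; it scans every training point per query, which real libraries avoid with spatial indexes.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Vote among the k closest training points.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters (illustrative values only).
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (6.0, 6.0), (7.0, 7.0), (6.5, 5.5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.2, 1.4)))
print(knn_predict(train_points, train_labels, (6.2, 6.1)))
```

Choosing an odd `k` avoids tied votes in two-class problems; feature scaling matters too, since the distance treats all coordinates equally.
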
- K-Means Clustering - A popular clustering algorithm for partitioning data into K clusters.
- Hierarchical Clustering - A method of cluster analysis that builds a hierarchy of clusters.
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise) - A clustering algorithm that identifies dense regions of data points and labels isolated points as noise.
- Gaussian Mixture Models (GMM) - A probabilistic model for representing normally distributed subpopulations within an overall population.
- Dimensionality Reduction - Techniques like PCA and t-SNE for reducing the number of features.
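
For the K-Means entry above, here is a minimal sketch of Lloyd's algorithm in plain Python. It assumes Euclidean distance and, for reproducibility, initializes the centroids from the first `k` points; that initialization is a simplification for this example (libraries typically use randomized schemes such as k-means++), and the function name and data are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster `points` into k groups with Lloyd's algorithm."""
    # Assumption for this sketch: deterministic init from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # Converged: no point changed cluster.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments


# Two well-separated toy groups (illustrative values only).
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8),
          (8.8, 9.2), (0.8, 1.2), (9.2, 8.8)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

The result depends on the initial centroids, which is why practical implementations usually run several random restarts and keep the best clustering.
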
- Q-Learning - A value-based reinforcement learning algorithm.
- Deep Q-Network (DQN) - An extension of Q-learning that uses a deep neural network to approximate the action-value function.
- Proximal Policy Optimization (PPO) - A policy gradient method for reinforcement learning.
- Actor-Critic Methods - A family of reinforcement learning algorithms that use both policy and value functions.
- OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms, now maintained as Gymnasium by the Farama Foundation.
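
The Q-Learning entry above can be illustrated with a minimal tabular sketch. Everything about the environment here is an assumption made for the example: a hypothetical five-state chain where moving right from the last non-terminal state earns a reward of 1. Actions are sampled uniformly at random, which is acceptable because Q-learning is off-policy; an epsilon-greedy policy is the more common choice in practice.

```python
import random

N_STATES = 5      # States 0..4; state 4 is terminal (hypothetical toy chain).
ACTIONS = (0, 1)  # 0 = move left, 1 = move right.


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0, max_steps=1000):
    """Learn a Q-table for the toy chain with the tabular Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            # Behavior policy: uniformly random exploration.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Q-learning target bootstraps from the best next action.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy policy for the non-terminal states, read off the learned Q-table.
greedy_policy = [
    max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)
]
print(greedy_policy)
```

The learned values reflect discounting: actions that lead toward the rewarding state sooner end up with higher Q-values, so the greedy policy moves right in every state.
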
- UCI Machine Learning Repository - A collection of datasets for machine learning research.
- Kaggle Datasets - A platform for accessing diverse datasets and participating in competitions.
- Google Dataset Search - A search engine for discovering datasets across the web.
- OpenML - An open platform for sharing datasets and machine learning experiments.
- Data.gov - A portal for accessing public datasets.
- A Few Useful Things to Know About Machine Learning (2012) - A paper discussing important concepts in machine learning.
- The Elements of Statistical Learning (2001) - A comprehensive book on statistical learning.
- Greedy Function Approximation: A Gradient Boosting Machine (2001) - Jerome Friedman's paper introducing gradient boosting.
- Coursera: Machine Learning by Andrew Ng - A comprehensive course on machine learning.
- Fast.ai - Free courses and resources for practical machine learning.
- Google Machine Learning Crash Course - A fast-paced introduction to machine learning.
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron - A practical guide to machine learning.
- Pattern Recognition and Machine Learning by Christopher Bishop - A book covering the fundamentals of machine learning.
- Machine Learning Yearning by Andrew Ng - A guide on structuring machine learning projects effectively.
- Reddit: r/MachineLearning - A subreddit for discussions on machine learning.
- Kaggle - A platform for data science competitions and community interaction.
- Scikit-learn Mailing List - A place to discuss issues and features in scikit-learn.
Contributions are welcome!