Curated list of fabulous Python projects for Data Science and Machine Learning. Updated every week (hopefully)!
by Davide Nardini.
Social profile: LinkedIn.
Personal blog: Pulp Learning.
Inspired by awesome-python.
List of topics
- Annotation
- Computer Vision
- Data API
- Data Manipulation and Analysis
- Data Visualization
- Geomap Visualization
- Machine Learning
- Natural Language Processing
- Scientific Computing
- Statistics
- Time Series
- Web App for Data Science and Machine Learning
Projects for Annotation.
- pigeon - 🐦 Quickly annotate data from the comfort of your Jupyter notebook
Projects for Computer Vision.
- OpenCV - Open Source Computer Vision Library
- CleanVision - Automatically find issues in image datasets and practice data-centric computer vision.
- Python Tesseract - A Python wrapper for Google Tesseract
- Face Recognition - The world's simplest facial recognition API for Python and the command line
- EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
Projects for Data API.
- statsbombpy - Easily stream StatsBomb data into Python
Projects for Data Manipulation and Analysis.
- Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
- Polars - Fast multi-threaded, hybrid-out-of-core DataFrame library in Rust | Python | Node.js
- Skimpy - skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.
- Dataprep - Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
- CleverCSV - CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the built-in csv module with improved dialect detection, and comes with a handy command line application for working with CSV files.
Projects for Data Visualization.
- Chartify - Python library that makes it easy for data scientists to create charts
- Matplotlib - plotting with Python
- Plotly - The interactive graphing library for Python
- Seaborn - Statistical data visualization in Python
- Bokeh - Interactive Data Visualization in the browser, from Python
- PyGWalker - PyGWalker: Turn your pandas dataframe into a Tableau-style User Interface for visual analysis
- pyvis - Python package for creating and visualizing interactive network graphs.
Projects for Geomap Visualization.
- folium - Python Data. Leaflet.js Maps.
- leafmap - A Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment
Projects for Machine Learning.
- scikit-learn - scikit-learn: machine learning in Python
- imbalanced-learn - A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
- PyGAD - Source code of PyGAD, a Python 3 library for building the genetic algorithm and training machine learning algorithms (Keras & PyTorch).
- shap - A game theoretic approach to explain the output of any machine learning model.
Projects for Natural Language Processing.
- NLTK - NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing.
- gensim - Topic Modelling for Humans
- spaCy - Industrial-strength Natural Language Processing (NLP) in Python
- TextBlob - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
- textstat - Python package to calculate readability statistics of a text object: paragraphs, sentences, articles.
- texthero - Text preprocessing, representation and visualization from zero to hero.
- clean-text - Python package for text cleaning
- lingua-py - The most accurate natural language detection library for Python, suitable for long and short text alike
- langdetect - Port of Google's language-detection library to Python.
- NeatText - NeatText: a simple NLP package for cleaning textual data and text preprocessing
Projects for Scientific Computing.
- NumPy - The fundamental package for scientific computing with Python.
- SciPy - SciPy library main repository
Projects for Statistics.
- Statsmodels - Statsmodels: statistical modeling and econometrics in Python
- Pingouin - Statistical package in Python based on Pandas
Projects for Time Series.
- Orbit - A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.
- sktime - A unified framework for machine learning with time series
- Darts - A Python library for user-friendly forecasting and anomaly detection on time series.
- Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
- Kats - Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.
Projects for Web App for Data Science and Machine Learning.