
MIT License

Data Science Pokémon

Curated list of fabulous Python Projects for Data Science and Machine Learning. Updated every week (hopefully)!

by Davide Nardini.

Social profile: LinkedIn.

Personal blog: Pulp Learning.

Inspired by awesome-python.

List of topics

Annotation

Projects for Annotation.

  • pigeon - 🐦 Quickly annotate data from the comfort of your Jupyter notebook

Computer Vision

Projects for Computer Vision.

  • OpenCV - Open Source Computer Vision Library
  • CleanVision - Automatically find issues in image datasets and practice data-centric computer vision.
  • Python Tesseract - A Python wrapper for Google Tesseract
  • Face Recognition - The world's simplest facial recognition api for Python and the command line
  • EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

Data API

Projects for streaming data through APIs.

  • statsbombpy - Easily stream StatsBomb data into Python

Data Manipulation and Analysis

Projects for Data Manipulation and Analysis.

  • Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
  • Polars - Fast multi-threaded, hybrid-out-of-core DataFrame library in Rust | Python | Node.js
  • Skimpy - skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.
  • Dataprep - Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
  • CleverCSV - CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

Data Visualization

Projects for Data Visualization.

  • Chartify - Python library that makes it easy for data scientists to create charts
  • Matplotlib - plotting with Python
  • Plotly - The interactive graphing library for Python
  • Seaborn - Statistical data visualization in Python
  • Bokeh - Interactive Data Visualization in the browser, from Python
  • PyGWalker - Turn your pandas dataframe into a Tableau-style User Interface for visual analysis
  • pyvis - Python package for creating and visualizing interactive network graphs.

Geomap Visualization

Projects for Geomap Visualization.

  • folium - Python Data. Leaflet.js Maps.
  • leafmap - A Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment

Machine Learning

Projects for Machine Learning.

  • scikit-learn - scikit-learn: machine learning in Python
  • imbalanced-learn - A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
  • PyGAD - Source code of PyGAD, a Python 3 library for building the genetic algorithm and training machine learning algorithms (Keras & PyTorch).
  • shap - A game theoretic approach to explain the output of any machine learning model.

Natural Language Processing

Projects for Natural Language Processing.

  • NLTK - NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing.
  • gensim - Topic Modelling for Humans
  • spaCy - Industrial-strength Natural Language Processing (NLP) in Python
  • TextBlob - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
  • textstat - Python package to calculate readability statistics of a text object - paragraphs, sentences, articles.
  • texthero - Text preprocessing, representation and visualization from zero to hero.
  • clean-text - Python package for text cleaning
  • lingua-py - The most accurate natural language detection library for Python, suitable for long and short text alike
  • langdetect - Port of Google's language-detection library to Python.
  • NeatText - A simple NLP package for cleaning textual data and text preprocessing

Scientific Computing

Projects for Scientific Computing.

  • NumPy - The fundamental package for scientific computing with Python.
  • SciPy - SciPy library main repository

Statistics

Projects for Statistics.

  • Statsmodels - Statsmodels: statistical modeling and econometrics in Python
  • Pingouin - Statistical package in Python based on Pandas

Time Series

Projects for Time Series.

  • Orbit - A Python package for Bayesian forecasting with object-oriented design and probabilistic models under the hood.
  • sktime - A unified framework for machine learning with time series
  • Darts - A Python library for user-friendly forecasting and anomaly detection on time series.
  • Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
  • Kats - Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.

Web App for Data Science and Machine Learning

Projects for building web apps for Data Science and Machine Learning.

  • gradio - Create UIs for your machine learning model in Python in 3 minutes
  • Dash - Data Apps & Dashboards for Python. No JavaScript Required.
  • Streamlit - Streamlit — The fastest way to build data apps in Python
  • Panel - A high-level app and dashboarding solution for Python