A curated list of Python libraries used for data science.
- Machine Learning Frameworks
- Scientific
- Outlier Detection
- Deep Learning Frameworks
- Deep Learning Tools
- Deep Learning Projects
- Visualization
- AutoML
- Exploration
- Feature Extraction
- Trading
- Misc
- Deployment
- Profiling
- Python Tools
- Data Gathering
- scikit-learn - Machine learning.
- CatBoost - Gradient boosting library with categorical features support.
- LightGBM - Fast, distributed, high performance gradient boosting.
- XGBoost - Scalable, portable, and distributed gradient boosting.
- PyMC - Probabilistic Programming.
- statsmodels - Statistical modeling and econometrics.
- SymPy - A computer algebra system.
- NetworkX - Creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
- dask-ml - Distributed and parallel machine learning.
- imbalanced-learn - Perform undersampling and oversampling.
- lightning - Large-scale linear models.
- scikit-optimize - Sequential model-based optimization with a scipy.optimize interface.
- BayesianOptimization - Global optimization with Gaussian processes.
- gplearn - Genetic Programming.
- python-glmnet - glmnet package for fitting generalized linear models.
- hmmlearn - Hidden Markov Models.
- vecstack - Stacking (machine learning technique).
- modAL - Modular Active Learning framework.
- deap - Evolutionary computation framework.
- pyro - Deep universal probabilistic programming with PyTorch.
- civisml-extensions - scikit-learn-compatible estimators from Civis Analytics.
- hyperopt-sklearn - Hyper-parameter optimization for sklearn.
- scikit-survival - Survival analysis built on top of scikit-learn.
- dstoolbox - Tools that make working with scikit-learn and pandas easier.
- modin - Unify the way you interact with your data.
- pyomo - Python Optimization MOdels.
- BAMBI - BAyesian Model-Building Interface.
- combo - A Python Toolbox for Machine Learning Model Combination.
- fastai - The fast.ai deep learning library, lessons, and tutorials.
- pycaret - Low-code machine learning library in Python.
- river - River is a Python library for online machine learning.
- NumPy - A fundamental package for scientific computing with Python.
- SciPy - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
- Pandas - A library providing high-performance, easy-to-use data structures and data analysis tools.
- Numba - NumPy aware dynamic Python compiler using LLVM.
- blaze - NumPy and Pandas for databases.
- astropy - Astronomy and astrophysics.
- Biopython - Tools for biological computation.
- PyDy - Multibody Dynamics.
- nilearn - NeuroImaging.
- patsy - Describing statistical models using symbolic formulas.
- numexpr - Fast numerical array expression evaluator.
- dask - Parallel computing with task scheduling.
- or-tools - Google's Operations Research tools. Classical CS algorithms.
- cvxpy - Python-embedded modeling language for convex optimization problems.
- PyOD - Versatile Python library for detecting anomalies in multivariate data.
- DeepOD - Deep learning-based outlier/anomaly detection.
- Tensorflow - DL Framework.
- PyTorch - DL Framework.
- Keras - High-level neural networks API.
- tensorlayer - A Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
- mxnet - Apache MXNet: A flexible and efficient library for deep learning.
- TorchDrift - TorchDrift is a data and concept drift library for PyTorch.
- Edward - Probabilistic programming language in TensorFlow.
- pomegranate - Probabilistic modelling.
- skorch - Scikit-learn compatible wrapper for PyTorch.
- DLTK - Deep Learning Toolkit for Medical Image Analysis.
- sonnet - TensorFlow-based neural network library.
- rasa_core - Dialogue engine.
- luminoth - Computer Vision.
- allennlp - NLP Research library.
- spotlight - Pytorch Recommender framework.
- tensorforce - TensorFlow library for applied reinforcement learning.
- tensorboard-pytorch - Tensorboard for pytorch.
- keras-vis - Neural network visualization toolkit for keras.
- hyperas - Keras + Hyperopt.
- spaCy - Natural Language processing.
- tensorboard_logger - Log TensorBoard events without touching TensorFlow.
- foolbox - Python toolbox to create adversarial examples that fool neural networks.
- pytorch/vision - Datasets, Transforms and Models specific to Computer Vision.
- gluon-nlp - NLP made easy.
- pytorch/ignite - High-level library to help with training neural networks in PyTorch.
- Netron - Visualizer for deep learning and machine learning models.
- gpytorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch.
- tensorly - Tensor Learning in Python.
- einops - Deep learning operations reinvented.
- hiddenlayer - Neural network graphs and training metrics for PyTorch, Tensorflow, and Keras.
- segmentation_models.pytorch - Segmentation models with pretrained backbones.
- pytorch-lightning - The lightweight PyTorch wrapper.
- lightly - Lightly is a computer vision framework for self-supervised learning.
- fairseq - Sequence-to-Sequence Toolkit.
- tensorflow-wavenet - DeepMind's WaveNet.
- DeepRecommender - Recommender systems.
- DrQA - Reading Wikipedia to Answer Open-Domain Questions.
- vqa.pytorch - Visual Question Answering in Pytorch.
- Half-Life Regression - Model for spaced repetition practice.
- learning-to-learn - Learning to Learn in Tensorflow.
- capsule-networks - A PyTorch implementation of the NIPS 2017 paper "Dynamic Routing Between Capsules".
- Mask_RCNN - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow.
- lightnet - Bringing pjreddie's DarkNet out of the shadows.
- pytorch-openai-transformer-lm - OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI.
- maskrcnn-benchmark - Fast, modular reference implementation of Semantic Segmentation and Object Detection algorithms in PyTorch.
- LovaszSoftmax - Lovász-Softmax loss.
- ludwig - Ludwig is a toolbox built on top of TensorFlow that lets you train and test deep learning models without writing code.
- Great Tables - Absolutely Delightful Table-making in Python.
- PyGWalker - Turns pandas and polars dataframes into a Tableau-like user interface for visual exploration.
- diagrams - Diagrams lets you draw the cloud system architecture in Python code.
- matplotlib - 2D plotting.
- seaborn - Visualization library.
- bokeh - Interactive web plotting.
- plotly - Collaborative web plotting.
- dash - Interactive Web plotting.
- altair - Declarative statistical visualization.
- folium - Leaflet.js Maps.
- geoplot - High-level geospatial data visualization.
- datashader - Graphics pipeline system.
- mplleaflet - Matplotlib plots from Python into interactive Leaflet web maps.
- matplotlib-venn - Area-weighted venn-diagrams.
- pyLDAvis - Interactive topic model visualization.
- cufflinks - Productivity Tools for Plotly + Pandas.
- scatterText - Visualizations of how language differs among document types.
- plotnine - ggplot for python.
- mizani - scales package.
- bqplot - Plotting library for IPython/Jupyter Notebooks.
- PtitPrince - Raincloud plots.
- joypy - Ridgeline plots.
- dtreeviz - Decision tree visualization and model interpretation.
- ipyvolume - 3d plotting for Python in the Jupyter notebook based on IPython widgets using WebGL.
- Nevergrad - Gradient-free optimization.
- featuretools - Automated feature engineering.
- auto-sklearn - Automated machine learning.
- tpot - Automated machine learning.
- auto_ml - Automated machine learning.
- MLBox - Automated Machine Learning python library.
- devol - Automated deep neural network design via genetic programming.
- skll - SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
- autokeras - Automated machine learning in Keras.
- SMAC3 - Sequential Model-based Algorithm Configuration.
- mlxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries.
- yellowbrick - Visual analysis and diagnostic tools.
- pandas-profiling - Profiling reports for pandas DataFrame objects.
- Skater - Model Agnostic Interpretation.
- Dora - Exploratory data analysis.
- sklearn-evaluation - scikit-learn model evaluation.
- fitter - Simple class to identify the distribution from which a data sample was generated.
- missingno - Missing data visualization.
- hypertools - Gaining geometric insights into high-dimensional data.
- scikit-plot - Plotting functionality to scikit-learn objects.
- elih - Explain Machine Learning.
- kmeans_smote - Oversampling for imbalanced learning based on k-means and SMOTE.
- pyUpSet - UpSet suite of visualisation methods.
- lime - Explaining the predictions of any machine learning classifier.
- pandas-summary - An extension to pandas dataframes describe function.
- SauceCat/PDPbox - Partial dependence plot toolbox.
- shap - A unified approach to explain the output of any machine learning model.
- eli5 - Debug machine learning classifiers and explain their predictions.
- rfpimp - Permutation and drop-column importance for scikit-learn random forests.
- pypeln - Concurrent data pipelines made easy.
- pycm - Multi-class confusion matrix library in Python.
- great_expectations - Always know what to expect from your data.
- alibi - Algorithms for monitoring and explaining machine learning models.
- InterpretML - Fit interpretable models. Explain blackbox machine learning.
- cleanlab - Finding label errors in datasets and learning with noisy labels.
- dtale - Flask/React client for visualizing pandas data structures.
- dabl - Data Analysis Baseline Library.
- XAI - An eXplainability toolbox for machine learning.
- explainerdashboard - This package makes it convenient to quickly deploy a dashboard web app that explains the workings of a (scikit-learn compatible) machine learning model.
- alibi-detect - Open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series.
- sklearn-pandas - Pandas integration with sklearn.
- pdpipe - Easy pipelines for pandas DataFrames.
- engarde - Defensive data analysis.
- datacleaner - Tool that automatically cleans data sets and readies them for analysis.
- categorical-encoding - sklearn compatible categorical variable encoders.
- fancyimpute - Multivariate imputation and matrix completion algorithms.
- raccoon - DataFrame with fast insert and appends.
- kmodes - k-modes and k-prototypes clustering algorithm.
- annoy - Approximate Nearest Neighbors.
- scikit-feature - Filter methods for feature selection.
- mifs - Parallelized Mutual Information based Feature Selection module.
- skggm - Scikit-learn compatible estimation of general graphical models.
- dirty_cat - Encoding methods for dirty categorical variables.
- Impyute - Data imputations library to preprocess datasets with missing data.
- eif - Extended Isolation Forest for Anomaly Detection.
- featexp - Feature exploration for supervised learning.
- feature_engine - Feature engineering package with sklearn like functionality.
- stumpy - STUMPY is a powerful and scalable Python library that can be used for a variety of time series data mining tasks.
- n2 - Lightweight approximate Nearest Neighbor library which runs faster even with large datasets.
- compressio - Compressio provides lossless in-memory compression of pandas DataFrames and Series.
- Merlion - A Machine Learning Library for Time Series.
- Darts - A Python library for easy manipulation and forecasting of time series.
- Greykite - A flexible, intuitive, and fast forecasting library.
- Causality - Causal analysis.
- traces - Unevenly-spaced time series analysis.
- PyFlux - Time series library for Python.
- prophet - Tool for producing high quality forecasts.
- tsfresh - Automatic extraction of relevant features from time series.
- tslearn - Machine learning toolkit dedicated to time-series data.
- pyts - A Python package for time series transformation and classification.
- sktime - A scikit-learn compatible Python toolbox for learning with time series data.
- stumpy - Matrix profiles.
- luminaire - ML driven solutions for monitoring time series data.
- NeuralProphet - A Neural Network based Time-Series model, inspired by Facebook Prophet and AR-Net, built on PyTorch.
- python_speech_features - Speech features.
- speechpy - A Library for Speech Processing and Recognition.
- magenta - Music and Art Generation with Machine Intelligence.
- librosa - Audio and music analysis.
- pydub - Manipulate audio with a simple and easy high level interface.
- pytorch/audio - Simple audio I/O for PyTorch.
- pillow - PIL fork.
- scikit-image - Image processing.
- hmap - Image histogram remapping.
- pyocr - A wrapper for Tesseract and Cuneiform (Optical Character Recognition).
- scikit-video - Video processing.
- moviepy - Video editing.
- OpenCV - Open Source Computer Vision Library.
- SimpleCV - Wrapper around OpenCV.
- label-maker - Data Preparation for Satellite Machine Learning.
- face_recognition - Facial recognition.
- imgaug - Image augmentation.
- pyvips - Fast image processing.
- ImageHash - Image hashing.
- Augmentor - Image augmentation library.
- PyAV - Bindings for FFmpeg.
- imutils - Convenience functions to make basic image processing operations.
- albumentations - Fast image augmentation library.
- geojson - Python bindings for GeoJSON.
- geopy - Python Geocoding Toolbox.
- OSMnx - Street networks.
- reverse-geocoder - A fast, offline reverse geocoder.
- pysal - Spatial Analysis Library.
- geopandas - Tools for geographic data.
- wordfreq - Library for looking up the frequencies of words in many languages, based on many sources of data.
- BlingFire - A lightning fast Finite State machine and REgular expression manipulation library.
- BERT-pytorch - Google AI 2018 BERT pytorch implementation.
- pytorch-pretrained-BERT - PyTorch version of Google AI's BERT model with script to load Google's pre-trained models.
- gensim - Topic Modeling.
- pattern - Web mining module.
- probablepeople - Parsing unstructured western names into name components.
- Expynent - Regular expression patterns.
- mimesis - Generate synthetic data.
- pyenchant - Spell checking.
- parserator - Domain-specific probabilistic parsers.
- scrubadub - Clean personally identifiable information from dirty dirty text.
- usaddress - Parsing unstructured address strings into address components.
- python-phonenumbers - Python port of Google's libphonenumber.
- jellyfish - Approximate and phonetic matching of strings.
- pronouncing - Simple interface for the CMU Pronouncing Dictionary.
- langid - Stand-alone language identification system.
- fuzzywuzzy - Fuzzy String Matching.
- Fuzzy - Soundex, NYSIIS, Double Metaphone.
- snowball - Snowball compiler and stemming algorithms.
- leven - Levenshtein edit distance.
- flashtext - Extract Keywords from sentence or Replace keywords in sentences.
- polyglot - Multilingual text NLP processing toolkit.
- sentencepiece - Unsupervised text tokenizer for Neural Network-based text generation.
- pyfasttext - Binding for fastText.
- python-wordsegment - English word segmentation.
- pyahocorasick - Exact or approximate multi-pattern string search.
- Wordbatch - Parallel text feature extraction for machine learning.
- langdetect - Port of Google's language-detection library.
- translation - Uses web services for text translation.
- nltk - Natural Language Toolkit.
- unidecode - ASCII transliterations of Unicode text.
- pytorch/text - Data loaders and abstractions for text and NLP.
- textdistance - Compute distance between sequences.
- sent2vec - General purpose unsupervised sentence representations.
- pyhunspell - Python bindings for the Hunspell spellchecker engine.
- facebook/fastText - Library for fast text representation and classification.
- textblob - Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
- facebook/InferSent - Sentence embeddings (InferSent) and training code for NLI.
- nmslib - Non-Metric Space Library.
- ftfy - Fixes mojibake and other glitches in Unicode text, after the fact.
- fletcher - Pandas ExtensionDType/Array backed by Apache Arrow.
- textacy - NLP, before and after spaCy.
- hmtl - Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP.
- pytext - A natural language modeling framework based on PyTorch.
- flair - A very simple framework for state-of-the-art Natural Language Processing.
- LASER - Language-Agnostic SEntence Representations.
- transformer-xl - Attentive Language Models Beyond a Fixed-Length Context.
- textstat - Calculate readability statistics of a text object - paragraphs, sentences, articles.
- nlpaug - Augmenting nlp for your machine learning projects.
- sumy - Automatic summarization of text documents and HTML.
- textract - Extract text from any document.
- newspaper - News extraction, article extraction and content curation.
- recommenders - Examples and best practices for building recommendation systems.
- Surprise - Analyzing recommender systems.
- trueskill - TrueSkill rating system.
- LightFM - Hybrid recommendation algorithm.
- implicit - Collaborative Filtering for Implicit Datasets.
- Clairvoyant - Identify and monitor social/historical cues.
- zipline - Algorithmic Trading Library.
- qstrader - Advanced Trading Infrastructure.
- mmh3 - MurmurHash3, a set of fast and robust hash functions.
- fbpca - Fast Randomized PCA/SVD.
- annoy - Approximate Nearest Neighbors.
- pipeline - Standard Runtime For Every Real-Time Machine Learning.
- crayon - A language-agnostic interface to TensorBoard.
- faiss - A library for efficient similarity search and clustering of dense vectors.
- pyod - Comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data.
- evidently - Evidently helps evaluate machine learning models during validation and monitor them in production.
- onnx - Open Neural Network Exchange.
- lore - Lore makes machine learning approachable for Software Engineers and maintainable for Machine Learning Researchers.
- kubeflow - Machine Learning Toolkit for Kubernetes.
- airflow - ETL.
- mlflow - Open source platform for the complete machine learning lifecycle.
- sklearn-porter - Transpile trained scikit-learn estimators.
- sklearn-compiledtrees - Compiled Decision Trees for scikit-learn.
- mem_usage_ui - Measuring and graphing memory usage of local processes.
- viztracer - VizTracer is a low-overhead logging/debugging/profiling tool that can trace and visualize your python code execution.
- py-spy - Sampling profiler for Python programs.
- memory_profiler - Monitoring memory usage of a Python program.
- line_profiler - Line-by-line profiling.
- filprofiler - Fil, a memory profiler designed for data processing applications.
- scalene - High-performance CPU and memory profiler for Python.
- python-flamegraph - Statistical profiler which outputs in format suitable for FlameGraph.
- Typer - Build CLIs with type hints.
- hydra - Framework for elegantly configuring complex applications.
- neurtu - A Python package for parametric benchmarks.
- pyprojroot - Finding project directories in Python.
- datasette - An open source multi-tool for exploring and publishing data.
- delorean - Time Travel Made Easy.
- pip-tools - Keeps dependencies up to date.
- devpi - PyPI server and packaging/testing/release tool.
- Jupyter Notebook - Notebooks are awesome.
- click - CLI package.
- sacredboard - Dashboard for sacred.
- sacred - Reproduce computational experiments.
- magic-wormhole - Get things from one computer to another, safely.
- gain - Web crawling framework based on asyncio.
- MechanicalSoup - A Python library for automating interaction with websites.
- camelot - Camelot: PDF Table Extraction for Humans.
- Pandarallel - Parallel pandas.
- great_expectations - A framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests.
- parse - Parse strings using a specification based on the Python format() syntax.
- CleverCSV - CleverCSV is a Python package for handling messy CSV files.