Calculation Optimization
Tool | Description |
---|---|
https://github.com/rapidsai/cudf | cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. |
https://github.com/rapidsai/cuml | cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. |
https://github.com/cupy/cupy | NumPy-like API accelerated with CUDA |
https://github.com/modin-project/modin | Modin: Speed up your Pandas workflows by changing a single line of code |
https://github.com/numba/numba | A Just-In-Time Compiler for Numerical Functions in Python |
https://github.com/weld-project/weld | High-performance runtime for data analytics applications |
Click Through Rate Prediction
Tool | Description |
---|---|
https://github.com/shenweichen/DeepCTR | Easy-to-use,Modular and Extendible package of deep-learning based CTR models. |
https://github.com/aksnzhy/xlearn | xLearn is a high performance, easy-to-use, and scalable machine learning package that contains linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM), all of which can be used to solve large-scale machine learning problems |
Computer Vision
Tool | Description |
---|---|
https://github.com/kornia/kornia/ | Kornia is a differentiable computer vision library for PyTorch. |
https://github.com/opencv/opencv | Open Source Computer Vision Library |
https://github.com/madmaze/pytesseract | A Python wrapper for Google Tesseract OCR Engine |
https://github.com/sirfz/tesserocr | A simple, Pillow-friendly, wrapper around the tesseract-ocr API for Optical Character Recognition (OCR). |
https://github.com/sightmachine/SimpleCV | SimpleCV is a framework for Open Source Machine Vision, using OpenCV and the Python programming language. |
Explainable AI
Tool | Description |
---|---|
https://github.com/pytorch/captum | Captum is a model interpretability and understanding library for PyTorch. |
https://github.com/yosinski/deep-visualization-toolbox | Deep Visualization toolbox is an open source software tool that lets you probe DNNs by feeding them an image (or a live webcam feed) and watching the reaction of every neuron. |
https://github.com/TeamHG-Memex/eli5 | ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions. |
https://github.com/IBM/AIX360/ | The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models. |
https://github.com/IBM/AIF360 | The AI Fairness 360 toolkit is an extensible open-source library containg techniques developed by the research community to help detect and mitigate bias in machine learning models throughout the AI application lifecycle. |
https://github.com/albermax/innvestigate | This tool provides a common interface and out-of-the-box implementation for many analysis methods. |
https://github.com/raghakot/keras-vis | keras-vis is a high-level toolkit for visualizing and debugging your trained keras neural net models. |
https://github.com/marcotcr/lime | This project is about explaining what machine learning classifiers (or models) are doing and currently support explaining individual predictions for text classifiers or classifiers that act on |
tables (numpy arrays of numerical or categorical data) or images | |
https://github.com/interpretml/interpret | InterpretML is an open-source python package for training interpretable machine learning models and explaining blackbox systems. |
https://github.com/mindsdb/mindsdb | MindsDB is an Explainable AutoML framework for developers built on top of Pytorch that enables you to build, train and test state of the art ML models in as simple as one line of code. |
https://github.com/slundberg/shap | SHAP is a game theoretic approach to explain the output of any machine learning model |
https://github.com/tensorflow/cleverhans | An adversarial example library for constructing attacks, building defenses, and benchmarking both |
https://github.com/tensorflow/lucid | A collection of infrastructure and tools for research in neural network interpretability. |
https://github.com/tensorflow/model-analysis | TensorFlow Model Analysis (TFMA) is a library for evaluating TensorFlow models that allows users to evaluate their models on large amounts of data in a distributed manner, using the same metrics defined in their trainer. |
https://github.com/andosa/treeinterpreter | Package for interpreting scikit-learn's decision tree and random forest predictions. |
Feature Engineering & Auto ML
Tool | Description |
---|---|
https://github.com/blue-yonder/tsfresh | Automatic extraction of relevant features from time series |
https://epistasislab.github.io/tpot/ | TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. |
https://github.com/rsteca/sklearn-deap | It uses evolutionary algorithms instead of gridsearch in scikit-learn. |
https://github.com/minimaxir/automl-gs | Provide an input CSV and a target field to predict, generate a model + code to run it. |
https://automl.github.io/auto-sklearn/master/ | auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning as it leverages recent advantages in Bayesian optimization, meta-learning and ensemble construction. |
Model Deployment
Tool | Description |
---|---|
https://github.com/ucbrise/clipper | A low-latency prediction-serving system |
https://github.com/kubeflow/kubeflow | Machine Learning Toolkit for Kubernetes |
https://github.com/combust/mleap | MLeap allows data scientists and engineers to deploy machine learning pipelines from Spark and Scikit-learn to a portable format and execution engine. |
https://github.com/Microsoft/pai | OpenPAI is an open source platform that provides complete AI model training and resource management capabilities |
https://github.com/SeldonIO/seldon-core | A framework to deploy, manage and scale your production machine learning to thousands of models |
https://github.com/tensorflow/serving | A flexible, high-performance serving system for machine learning models |
https://github.com/jolibrain/deepdetect | Deep Learning API and Server in C++11 support for Caffe, Caffe2, PyTorch,TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE |
Model and Data Versioning
Tool | Description |
---|---|
https://github.com/catalyst-team/catalyst | PyTorch framework for Deep Learning research and development which was developed with a focus on reproducibility, fast experimentation and code/ideas reusing. |
https://github.com/d6t/d6tflow | d6tflow is a python library which makes building complex data science workflows easy, fast and intuitive. |
https://github.com/iterative/dvc | Data Version Control |
https://github.com/quantumblacklabs/kedro/ | A Python library that implements software engineering best-practice for data and ML pipelines. |
https://github.com/mlflow/mlflow | MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. |
https://github.com/VertaAI/modeldb/ | ModelDB is an open-source system to version machine learning models including their ingredients code, data, config, and environment and to track ML metadata across the model lifecycle. |
https://github.com/pachyderm/pachyderm | Pachyderm: Data Versioning, Data Pipelines, and Data Lineage |
https://github.com/polyaxon/polyaxon | A platform for reproducible and scalable machine learning and deep learning on kubernetes |
https://github.com/IDSIA/sacred | Sacred is a tool to help you configure, organize, log and reproduce experiments |
https://github.com/allegroai/trains | TRAINS - Auto-Magical Experiment Manager & Version Control for AI - NOW WITH AUTO-MAGICAL DEVOPS! |
Natural Language Processing
Tool | Description |
---|---|
https://www.nltk.org/ | NLTK is a leading platform for building Python programs to work with human language data. |
https://github.com/clips/pattern | Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization. |
https://github.com/machinalis/quepy | A python framework to transform natural language questions to queries in a database query language. |
https://github.com/sloria/TextBlob/ | It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. |
https://github.com/machinalis/yalign | Yalign is a tool for extracting parallel sentences from comparable corpora. |
https://github.com/columbia-applied-data-science/rosetta | Tools for data science with a focus on text processing. |
https://github.com/proycon/pynlpl | It can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language model. |
https://github.com/sergioburdisso/pyss3 | Python package that implements a novel text classifier (SS3) with visualizations tools for Explainable Artificial Intelligence (XAI) |
https://github.com/explosion/spaCy | spaCy is a library for advanced Natural Language Processing built on the very latest research, and was designed from day one to be used in real products. |
https://github.com/seatgeek/fuzzywuzzy | Fuzzy is a string matching tool that uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package. |
https://github.com/jamesturk/jellyfish | Jellyfish is a python library for doing approximate and phonetic matching of strings. |
https://github.com/chartbeat-labs/textacy | textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. |
https://github.com/aflc/editdistance | Fast implementation of the edit distance (Levenshtein distance). |
https://github.com/dasmith/stanford-corenlp-python | Python wrapper for Stanford University's NLP group's Java-based CoreNLP tools. |
https://github.com/cltk/cltk | The Classical Language Toolkit (CLTK) offers natural language processing (NLP) support for the languages of Ancient, Classical, and Medieval Eurasia. |
https://github.com/RasaHQ/rasa | Rasa is an open source machine learning framework to automate text-and voice-based conversations. |
https://github.com/aboSamoor/polyglot | Polyglot is a natural language pipeline that supports massive multilingual applications. |
https://github.com/facebookresearch/DrQA | DrQA is a system for reading comprehension applied to open-domain question answering which is targeted at the task of "machine reading at scale" (MRS). |
https://github.com/dedupeio/dedupe | A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. |
https://github.com/snipsco/snips-nlu | It is a Python library that allows to extract structured information from sentences written in natural language. |
https://github.com/Franck-Dernoncourt/NeuroNER | NeuroNER is a program that performs named-entity recognition (NER). |
https://github.com/deepmipt/DeepPavlov/ | DeepPavlov is an open-source conversational AI library built on TensorFlow and Keras, designed for the development of production ready chat-bots and complex conversational systems and and support research in the area of NLP and, particularly, of dialog systems. |
https://github.com/bigartm/bigartm | The state-of-the-art platform for topic modeling. |
https://github.com/EducationalTestingService/python-zpar | python-zpar is a python wrapper around the ZPar parser which is a statistical natural language parser, which performs syntactic analysis tasks including word segmentation, part-of-speech tagging and parsing. |
https://github.com/salesforce/ctrl | CTRL, a 1.6 billion-parameter conditional transformer language model, trained to condition on control codes that specify domain, subdomain, entities, relationships between entities, dates, and task-specific behavior. |
https://github.com/facebookresearch/XLM | PyTorch original implementation of Cross-lingual Language Model Pretraining. |
https://github.com/flairNLP/flair | A very simple framework for state-of-the-art Natural Language Processing (NLP) |
https://github.com/github/semantic | semantic is a Haskell library and command line tool for parsing, analyzing, and comparing source code. |
https://github.com/dmlc/gluon-nlp | GluonNLP is a toolkit that enables easy text preprocessing, datasets loading and neural models building to help you speed up your Natural Language Processing (NLP) research. |
https://github.com/gnes-ai/gnes | GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural network. |
https://github.com/rowanz/grover | Grover is a model for Neural Fake News -- both generation and detection. |
https://github.com/BrikerMan/Kashgari | Kashgari is a Production-ready NLP Transfer learning framework for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding. |
https://github.com/explosion/sense2vec | sense2vec is a nice twist on word2vec that lets you learn more interesting and detailed word vectors. |
https://github.com/snorkel-team/snorkel | A system for quickly generating training data with weak supervision |
https://github.com/tensorflow/lingvo | Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models. |
https://github.com/vkcom/youtokentome | Unsupervised text tokenizer focused on computational efficiency |
https://github.com/huggingface/transformers | It provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, CTRL...) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch. |
https://github.com/facebookresearch/wav2letter | wav2letter++ is a fast, open source speech processing toolkit from the Speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition |
Recommendation System
Tool | Description |
---|---|
https://github.com/maciejkula/spotlight | Spotlight uses PyTorch to build both deep and shallow recommender models. |
https://github.com/cheungdaven/DeepRec | An Open-source Toolkit for Deep Learning based Recommendation with Tensorflow. |
https://github.com/NicolasHug/Surprise | A Python scikit for building and analyzing recommender systems |
https://github.com/lyst/lightfm | A Python implementation of LightFM, a hybrid recommendation algorithm. |
https://github.com/ocelma/python-recsys | A python library for implementing a recommender system |
https://github.com/jfkirk/tensorrec | TensorRec is a Python recommendation system that allows you to quickly develop recommendation algorithms and customize them using TensorFlow. |
https://github.com/caserec/CaseRecommender | Case Recommender is a Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback. |
https://github.com/benfred/implicit | Fast Python Collaborative Filtering for Implicit Feedback Datasets |
https://github.com/ibayer/fastFM | fastFM: A Library for Factorization Machines |
Reinforcement Learning
Tool | Description |
---|---|
https://github.com/deepmind/lab | DeepMind Lab provides a suite of challenging 3D navigation and puzzle-solving tasks for learning agents and is generally used as a testbed for research in artificial intelligence, especially deep reinforcement learning. |
https://github.com/openai/gym | A toolkit for developing and comparing reinforcement learning algorithms. |
https://github.com/openai/retro | Gym Retro lets you turn classic video games into Gym environments for reinforcement learning and comes with integrations for ~1000 games. |
https://github.com/NervanaSystems/coach | Coach is a python reinforcement learning framework containing implementation of many state-of-the-art algorithms. |
https://github.com/rlworkgroup/garage | garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementations built using that toolkit. |
https://github.com/rlworkgroup/metaworld | Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. |
Security & Privacy
Tool | Description |
---|---|
https://github.com/OpenMined/PySyft | PySyft is a Python library for secure and private Deep Learning. |
https://github.com/SubstraFoundation/substra | Substra is a framework for traceable ML orchestration on decentralized sensitive data. |
https://github.com/tensorflow/privacy | It is a Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy. |
https://github.com/tf-encrypted/tf-encrypted | TF Encrypted is a framework for encrypted machine learning in TensorFlow that aims to make privacy-preserving machine learning readily available, without requiring expertise in cryptography, distributed systems, or high performance computing. |
Visualization
Tool | Description |
---|---|
https://github.com/bokeh/bokeh | Interactive Data Visualization in the browser, from Python |
https://github.com/andrea-cuttone/geoplotlib | geoplotlib is a python toolbox for visualizing geographical data and making maps |
https://github.com/ResidentMario/missingno | Missing data visualization module for Python. |
https://github.com/finos/perspective | Perspective is an interactive visualization component for large, real-time datasets. |
https://github.com/plotly/dash | Analytical Web Apps for Python, R, and Julia. |
https://github.com/Kozea/pygal | PYthon svg GrAph plotting Library |
https://github.com/mwaskom/seaborn | Statistical data visualization using matplotlib |
https://github.com/streamlit/streamlit | Streamlit — The fastest way to build custom ML tools |
https://github.com/DistrictDataLabs/yellowbrick | Visual analysis and diagnostic tools to facilitate machine learning model selection. |