Awesome-ML-Toolbox

Curated list of tools and technologies

MIT License

My Ultimate Tech Toolbox

Here’s a curated list of the tools and resources that support my tech journey. From development to testing and database management, these are the essentials I rely on to get the job done efficiently. 🙌 Dive in and explore the tools that enhance my workflow.

01. Visual Studio Code

  • anaconda-extension-pack: Set of extensions that enhance the experience of Anaconda customers using Visual Studio Code
  • AREPL for python: Real-time Python scratchpad
  • autocomplete-shell: Autocompletion for Bash scripts in VS Code
  • autodocstring: Quickly generate docstrings for Python functions
  • better-comments: Color-code comments based on TODO, alert, warning, etc.
  • bookmarks: Bookmark lines in code and jump to them
  • code-spell-checker: Catch common spelling errors in the codebase
  • docker: Adds syntax highlighting, commands, hover tips, and linting for Dockerfile and docker-compose files.
  • gc-excelviewer: View Excel and CSV files inside VS Code
  • git history diff: View git history. View diff of committed files. View git blame info. View stash details.
  • gitblame: See git blame information in the status bar.
  • githistory: View git log, file history, compare branches or commits
  • gitignore: Auto create .gitignore files for various languages
  • gitlens: Supercharge the Git capabilities built into Visual Studio Code
  • IntelliCode: Provides AI-assisted development features for Python, TypeScript/JavaScript and Java developers in Visual Studio Code, with insights based on understanding your code context combined with machine learning.
  • guides: Adds various indentation guide lines
  • Kite: AI code completions for all languages, intellisense, code snippets, code signatures, and cursor-following documentation for VS Code
  • live server: Launch a local development server with live reload for static and dynamic pages
  • LaTeX Workshop: Boost LaTeX typesetting efficiency with preview, compile, autocomplete, colorize, and more.
  • local-history: A Visual Studio Code plugin for maintaining a local history of files.
  • Marp: Create and preview slide decks written in Marp Markdown inside VS Code.
  • Material Icon Theme: Material Design Icons for Visual Studio Code
  • mssql: Visual Studio Code SQL Server extension.
  • output-colorizer: Syntax highlighting for log files
  • path-autocomplete: Provides path completion for visual studio code.
  • path-intellisense: Visual Studio Code plugin that autocompletes filenames
  • pdf: Display PDF files in VS Code
  • prettier: Code formatter for various languages
  • Pylance: A performant, feature-rich language server for Python in VS Code
  • python: Linting, Debugging (multi-threaded, remote), Intellisense, code formatting, refactoring, unit tests, snippets, and more.
  • python-extended-snippets: Snippets that make it easy to write Python code by providing completion options along with all arguments.
  • python-extension-pack: All in one package of popular Visual Studio Code extensions for Python
  • rainbow-brackets: Provide rainbow colors for the round brackets, the square brackets and the squiggly brackets.
  • remote-ssh: Open any folder on a remote machine using SSH and take advantage of VS Code's full feature set.
  • REST Client: Send HTTP requests and view the responses directly in Visual Studio Code
  • rewrap: Re-wraps comments and other text to a given line length.
  • settings sync: Sync VS Code settings using GitHub Gists
  • shell-format: Formatting support for shell scripts, Dockerfile, properties, gitignore, dotenv, hosts, jvmoptions... DocumentFormat
  • tabnine-vscode: All-language autocompleter — TabNine uses machine learning to help you write code faster.
  • theme-dracula: Dark theme for VS Code
  • vim: Vim emulation (key bindings) for VS Code
  • vscode-django: Beautiful syntax and scoped snippets for Django
  • vscode-icons: Change file icons in VS Code
  • vscode-markdownlint: Markdown linting and style checking for Visual Studio Code
  • vscode-pull-request-github: Review and manage your GitHub pull requests directly in VS Code
  • File Utils: A convenient way of creating, duplicating, moving, renaming and deleting files and directories.
  • Advanced New File: Create files anywhere in your workspace from the keyboard
  • CodeSnap: Take beautiful screenshots of your code
  • JSON Crack: Seamlessly visualize your JSON data instantly into graphs.

02. Vim / NeoVim

  • nerdtree: Project and file navigation
  • tagbar: An easy way to browse the classes and modules of the current file
  • vim-surround: Add, change, and delete surroundings such as parentheses, brackets, quotes, XML tags, and more
  • vim-commentary: Comment/uncomment lines according to filetype
  • python-mode: Python support in Vim
  • vim-fzf: Fuzzy finder
  • YouCompleteMe: Fast, as-you-type, fuzzy-search code completion engine
  • ctrl-space: Tab, buffer, and file management with fast fuzzy searching
  • syntastic: Syntax checking plugin

03. Python Environment

04. IDE

05. Web Frameworks

06. Testing Framework

07. Database Management System (DBMS)

08. Profiling Tools

09. API/Testing Platform

10. Drawing Tools

11. 3D Avatar

  • Omniverse Audio2Face: Instantly create expressive facial animation from just an audio source using generative AI.
  • MetaHuman: Complete framework that gives any creator the power to use highly realistic human characters in any way imaginable.

12. Vector Database

13. Graph DB

14. MLOps

15. Web Application

16. Server

  • Slurm: Workload manager for scheduling jobs on Linux clusters.

17. Terminal

  • Tmux: Open-source terminal multiplexer for Unix-like operating systems
  • bash: Control the OS from the command line instead of navigating menus, options, and windows in a GUI
  • ZSH: Unix shell that extends the Bourne shell, with many features borrowed from Bash, ksh, and tcsh
  • Emacs: An extensible, customizable, free/libre text editor

18. Paper Reading

  • Mendeley: Reference manager and academic social network that can help you organize your research.
  • Zotero: A free, easy-to-use tool to help you collect, organize, cite, and share research.

19. Linux

  • Symlink: A file that points to another file or folder on your computer or on a connected file system
  • dotfiles: Control the settings and preferences for applications and your system environment
  • GNU Stow: Symlink farm manager which takes distinct packages of software and/or data located in separate directories on the filesystem, and makes them appear to be installed in the same place.
  • vagrant: Tool for building complete development environments
  • plenary
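The symlink, dotfiles, and GNU Stow entries fit together: a dotfiles repository keeps configuration in per-application package directories, and a symlink farm manager links those files into the home directory. Below is a minimal Python sketch of that idea, assuming a Unix-like system where creating symlinks needs no special privileges. link_package is a hypothetical helper, not Stow's interface, and it skips Stow features such as directory folding, unstowing, and conflict reporting.

```python
from pathlib import Path
import tempfile


def link_package(package_dir: Path, target_dir: Path) -> list:
    """Symlink every file in package_dir into target_dir, mirroring subfolders.

    Hypothetical helper illustrating the symlink-farm idea behind GNU Stow;
    it is not Stow's actual algorithm.
    """
    created = []
    for src in package_dir.rglob("*"):
        if src.is_dir():
            continue  # only files are linked; folders are recreated as real dirs
        dest = target_dir / src.relative_to(package_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dest.exists() or dest.is_symlink():
            continue  # never overwrite an existing file or link
        dest.symlink_to(src.resolve())
        created.append(dest)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "dotfiles" / "vim"  # hypothetical package layout
        (pkg / ".vim").mkdir(parents=True)
        (pkg / ".vimrc").write_text("set number\n")
        (pkg / ".vim" / "filetype.vim").write_text('" filetypes\n')
        home = root / "home"
        home.mkdir()
        for link in link_package(pkg, home):
            print(link.relative_to(home), "->", link.resolve().relative_to(root))
```

Skipping existing targets keeps a second run idempotent, which mirrors why Stow refuses to clobber files it does not own.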

20. Obsidian

21. Nvim

22. Python Libraries

  • accelerate: A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed precision
  • Activemq: Open-source, multi-protocol message broker; ActiveMQ is most commonly deployed as a standalone process
  • apscheduler: Advanced Python Scheduler (APScheduler) is a Python library that lets you schedule your Python code to be executed later, either just once or periodically
  • arize-phoenix: ML Observability in a Notebook - Uncover Insights, Surface Problems, Monitor, and Fine Tune your Generative LLM, CV and Tabular Models
  • argparse: Write user-friendly command-line interfaces
  • Beanie: Asynchronous Python object-document mapper (ODM) for MongoDB. Data models are based on Pydantic.
  • beautifulsoup: Pull data out of HTML and XML files
  • bert-as-service: Generate BERT embeddings for production
  • black: Opinionated code formatter for Python code
  • BLOOM: The World’s Largest Open Multilingual Language Model
  • bokeh: Bokeh is a Python library for creating interactive visualizations for modern web browsers
  • boto/boto3: Control AWS services with pure Python code
  • camelot: Extract tables from PDF files
  • Celery: Distributed task queue for distributing work across threads or machines.
  • collections: Specialized container datatypes
  • conda: Package, dependency and environment management
  • concurrent.futures: Launching parallel tasks
  • chime: Python sound notifications made easy.
  • dabl: Data Analysis Baseline Library; quickly builds baseline models and visualizations to start a supervised learning project
  • dask: Scale the Python tools you love
  • datetime: Supplies classes for manipulating dates and times
  • deepctr: Deep-learning based CTR models
  • deepspeed: Deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • deep-translator: A flexible, free, and unlimited tool to translate between different languages in a simple way using multiple translators.
  • django: High-level Python Web framework
  • djongo: Django and MongoDB database connector
  • dlib: A toolkit for making real world machine learning and data analysis applications in C++
  • docx2txt: A pure python-based utility to extract text and images from docx files
  • DPO Trainer: TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization
  • DSPy: The framework for programming—not prompting—foundation models
  • Dynaconf: Dynamic configuration management for Python applications
  • einops: Flexible and powerful tensor operations for readable and reliable code.
  • Embedding projector: Offers three commonly used methods of dimensionality reduction for easier visualization of complex data: PCA, t-SNE, and custom linear projections
  • factscore: Automatic evaluation metric for factual precision in long-form text generation
  • FAISS: A library for efficient similarity search and clustering of dense vectors
  • fastai: fastai makes deep learning with PyTorch faster, more accurate, and easier
  • fastapi: FastAPI framework, high performance, easy to learn, fast to code, ready for production
  • fasttext: Library for efficient text classification and representation learning
  • faster-whisper: Reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.
  • finetune: Scikit-learn style model finetuning for NLP
  • flash-attn: Flash Attention: Fast and Memory-Efficient Exact Attention
  • flask: Lightweight WSGI web application framework
  • flask-restplus: Fully featured framework for fast, easy and documented API development with Flask
  • Flower: Real-time monitoring of Celery clusters using Celery events
  • fairscale: PyTorch extensions for high performance and large scale training.
  • fugue: SQL for Pandas, Spark, and Dask DataFrames
  • gdal: Geospatial Data Abstraction Library
  • gensim: Topic modelling, document indexing and similarity retrieval with large corpora.
  • gpt-index: Earlier name of LlamaIndex; a central interface to connect your LLMs with external data.
  • gpt3-simple-primer: Simple GPT-3 primer using openai.
  • gspread: Python library to interact with Google Sheets
  • gunicorn: Production WSGI HTTP server for Flask and Django apps
  • h2oGPT: Open-source LLM project from H2O.ai
  • hugging face: Build, train and deploy state of the art models powered by the reference open source in machine learning
  • Haystack: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data.
  • hungabunga: Brute-force all scikit-learn models with all parameters using .fit and .predict
  • hydra: Hydra is an open-source Python framework that simplifies the development of research and other complex applications.
  • implicit: Fast Python Collaborative Filtering for Implicit Feedback Datasets
  • interpret: Fit interpretable models. Explain blackbox machine learning.
  • ipython: Productive interactive computing
  • itertools: Functions creating iterators for efficient looping
  • json: Read and write JSON files
  • jupyter: Jupyter notebooks
  • jupyterlab: An extensible environment for interactive and reproducible computing, based on the Jupyter Notebook and Architecture
  • kedro: A Python framework for creating reproducible, maintainable and modular data science code
  • keras: High-level neural networks API, written in Python; Keras 3 runs on top of JAX, TensorFlow, or PyTorch.
  • langchain: Provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications
  • LangSmith: LangSmith is a platform for building production-grade LLM applications.
  • libffm: A Library for Field-aware Factorization Machines
  • libfm: Factorization Machine Library
  • lightfm: A Python implementation of LightFM, a hybrid recommendation algorithm.
  • lime: Local Interpretable Model-Agnostic Explanations for machine learning classifiers
  • LlamaIndex: Provides a central interface to connect your LLMs with external data.
  • loguru: Loguru is a library which aims to bring enjoyable logging in Python.
  • lora: LoRA for Efficient Stable Diffusion Fine-Tuning
  • magic-wormhole: Provides a library and a command-line tool named wormhole, which makes it possible to get arbitrary-sized files and directories (or short pieces of text) from one computer to another
  • Manim: Animation engine for explanatory math videos
  • matchzoo: MatchZoo is a toolkit for text matching
  • matplotlib: Matplotlib strives to produce publication quality 2D graphics
  • memory-profiler: A module for monitoring memory usage of a Python program
  • Modin: Provides an effortless way to speed up your pandas notebooks, scripts, and libraries.
  • mongoengine: MongoEngine is a Python Object-Document Mapper for working with MongoDB.
  • more_itertools: More routines for operating on iterables, beyond itertools
  • multiprocessing-logging: Logger for multiprocessing applications
  • mypy: Optional static type checker for Python that aims to combine the benefits of dynamic (duck) typing and static typing
  • newspaper: Simplified python article discovery & extraction.
  • nlopt: Library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization
  • nltk: Natural Language Toolkit
  • netron: Visualizer for neural network, deep learning and machine learning models
  • numpy: NumPy is the fundamental package for array computing with Python.
  • nvitop: An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
  • openai: Provides convenient access to the OpenAI API from applications written in the Python language
  • openai-playground: Allows users to explore and experiment with OpenAI's artificial intelligence models.
  • OpenAPI: OpenAPI Specification provides a formal standard for describing HTTP APIs.
  • opencv: Wrapper package for OpenCV python bindings.
  • Optimum: 🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
  • pandasql: Allows you to query pandas DataFrames using SQL syntax
  • pandera: A Statistical Data Testing Toolkit
  • pandarallel: An easy to use library to speed up computation (by parallelizing on multi CPUs) with pandas.
  • pandas: Powerful data structures for data analysis, time series, and statistics
  • pandas-profiling: Provides a one-line exploratory data analysis (EDA) experience in a consistent and fast solution
  • Panel: The powerful data exploration & web app framework for Python
  • patsy: Describing statistical models in Python
  • pdf2image: A wrapper around the pdftoppm and pdftocairo command line tools to convert PDF to a PIL Image list.
  • PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware
  • petals: Run 100B+ language models at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
  • pillow: Python Imaging Library (Fork)
  • pipenv: Pipenv is a tool that aims to bring the best of all packaging worlds (bundler, composer, npm, cargo, yarn, etc.) to the Python world.
  • pipreqs: Generate a pip requirements.txt file based on the imports of any project.
  • plotly: An open-source, interactive graphing library for Python
  • poetry: Python packaging and dependency management made easy
  • pre-commit: A framework for managing and maintaining multi-language pre-commit hooks.
  • presidio: Context aware, pluggable and customizable data protection and de-identification SDK for text and images
  • Promptify: Prompt engineering: solve NLP problems with LLMs and easily generate prompts for different NLP tasks for popular generative models like GPT, PaLM, and more
  • prophet: Microframework for analyzing financial markets.
  • pyaudio: Cross-platform audio I/O with PortAudio
  • pydash: The kitchen sink of Python utility libraries for doing "stuff" in a functional way
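  • Pylint: Static code analyser for Python that checks for errors, enforces a coding standard, and looks for code smells
  • Pipe: Chain operations on iterables with the infix | operator to write cleaner Python code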
  • pydantic: Most widely used data validation library for Python.
  • PyMuPDF: PyMuPDF is a Python binding for MuPDF – a lightweight PDF, XPS, and E-book viewer, renderer, and toolkit, which is maintained and developed by Artifex Software, Inc
  • pymongo: Python driver for MongoDB
  • pymysql: Pure Python MySQL Driver
  • pyod: PyOD is the most comprehensive and scalable Python library for detecting outlying objects in multivariate data.
  • pyodbc: pyodbc is an open source Python module that makes accessing ODBC databases simple
  • pypdf2: PDF toolkit
  • pyppeteer: Headless chrome/chromium automation library (unofficial port of puppeteer)
  • pyspark: Apache Spark Python API
  • pyttsx3: Text-to-speech conversion library in Python
  • pytest: Simple, powerful testing with Python
  • python-dotenv: Add .env support to your Django/Flask apps in development and deployment
  • pytorch: Open source machine learning framework
  • pytorch-transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
  • pyyaml: YAML 1.1 parser
  • QLoRA: Efficient finetuning of quantized LLMs
  • RabbitMQ: Open-source message broker; a queue in RabbitMQ is an ordered collection of messages
  • Ragas: Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
  • rasterio: Reads and writes GeoTIFF formats and provides a Python API based on N-D arrays
  • RapidAPI: Discover and connect to thousands of APIs
  • Ray: Effortlessly scale your most complex workloads
  • re: Regular expression matching operations
  • requests: HTTP library for Python
  • schedule: Python job scheduling for humans. Run Python functions (or any other callable) periodically using a friendly syntax
  • scikit-image: Collection of algorithms for image processing
  • scikit-learn: Tools for data mining and data analysis and machine learning in Python
  • scikit-surprise: Python RecommendatIon System Engine
  • scrapy: Framework for extracting the data you need from websites
  • seaborn: Data visualization library based on matplotlib.
  • selenium: Provides a simple API to write functional/acceptance tests using Selenium WebDriver
  • Sentence-Transformers: Python framework for state-of-the-art sentence, text and image embeddings
  • sentry-sdk: Sentry's Python SDK enables automatic reporting of errors and performance data in your application.
  • SFTTrainer (TRL): A wrapper around the transformers.Trainer class that inherits all of its attributes and methods; it takes care of properly initializing the PeftModel when a user passes a PeftConfig object.
  • shap: Explain the output of any machine learning model
  • shutil: Offers a number of high-level operations on files and collections of files
  • sketch: AI code-writing assistant for pandas users that understands the context of your data, greatly improving the relevance of suggestions
  • spacy: Library for advanced Natural Language Processing in Python
  • SpanMarker: SpanMarker for Named Entity Recognition
  • sqlalchemy: Python SQL toolkit
  • StableLM: Stability AI language models
  • Stanford Alpaca: A strong, replicable instruction-following model
  • Supervised Fine-tuning Trainer: Involves adapting a pre-trained Language Model (LLM) to a specific downstream task using labeled data
  • sympy: Python library for symbolic mathematics
  • tabula-py: Python wrapper of tabula-java, which reads tables from PDFs
  • taichi: Productive, portable, and performant GPU programming in Python.
  • tensorflow: Core open source library to develop and train ML models
  • TensorFlow Playground: Interactive neural network playground that runs in the browser
  • tika: An interface that provides the facility to extract content and metadata from any type of document
  • tiktoken: tiktoken is a fast BPE tokeniser for use with OpenAI's models.
  • triton: Development repository for the Triton language and compiler
  • txtai: Build AI-powered semantic search applications
  • tqdm: Displays progress bars for loops and other iterables
  • tracemalloc: Trace memory allocations
  • trafilatura: A Python package & command-line tool to gather text on the Web
  • Trainer: Provides an API for feature-complete training in PyTorch for most standard use cases
  • Trio: A friendly Python library for async concurrency and I/O
  • urllib: Collects several modules for working with URLs
  • vmap: vmap is the vectorizing map; vmap(func) returns a new function that maps func over some dimension of the inputs.
  • vectorhub: Library for easy discovery and consumption of state-of-the-art models to turn data into vectors (text2vec, image2vec, video2vec, graph2vec, bert, inception, etc.)
  • Vertex AI: Train and deploy ML models
  • vaex: A partial pandas replacement that uses lazy evaluation and memory mapping to allow developers to work with large datasets on standard machines
  • vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
  • wandb: MLOps platform that helps AI developers streamline their ML workflow from end to end.
  • Websocket: Real-time, event-driven communication between clients and servers
  • Whisper: Automatic speech recognition model trained on 680,000 hours of multilingual data collected from the web.
  • whisper-jax: Optimised JAX implementation of OpenAI's Whisper model
  • xgboost: Distributed gradient boosting library
  • xlearn: High performance, easy-to-use, and scalable machine learning package
  • xlrd: Extract data from Excel spreadsheets
  • yaml: YAML parser and emitter for Python
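For the argparse entry, a minimal sketch of a training-script CLI. The argument names (data, --epochs, --lr, --device, --verbose) are hypothetical and chosen only to show positional arguments, typed options with defaults, restricted choices, and boolean flags.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical training CLI; the options are illustrative only."""
    parser = argparse.ArgumentParser(description="Train a model (illustrative CLI).")
    parser.add_argument("data", help="path to the training data")  # positional
    parser.add_argument("--epochs", type=int, default=10, help="passes over the data")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--device", choices=["cpu", "cuda"], default="cpu")
    parser.add_argument("-v", "--verbose", action="store_true", help="print extra logs")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()  # reads sys.argv; --help is generated for free
    print(args)
```

Passing an explicit list to parse_args, as in parse_args(["train.csv", "--epochs", "3"]), is a convenient way to test a parser without touching sys.argv.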
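For the collections and itertools entries, a small sketch on a made-up list of (date, level) log records showing the containers and iterator tools that come up most often. The data is hypothetical; note that groupby only merges adjacent equal keys, so the input must already be sorted by the grouping key.

```python
from collections import Counter, defaultdict, deque
from itertools import chain, groupby, islice

# Hypothetical log records, already sorted by date
logs = [
    ("2024-01-01", "INFO"), ("2024-01-01", "ERROR"),
    ("2024-01-02", "INFO"), ("2024-01-02", "INFO"),
]

# Counter: tally hashable items
level_counts = Counter(level for _, level in logs)

# defaultdict: group values without checking for missing keys
by_day = defaultdict(list)
for day, level in logs:
    by_day[day].append(level)

# deque with maxlen: keep only the last N items (a sliding window)
recent = deque(maxlen=2)
for entry in logs:
    recent.append(entry)

# groupby: count consecutive records per date (input must be sorted by the key)
per_day = {day: len(list(group)) for day, group in groupby(logs, key=lambda e: e[0])}

# chain + islice: lazily flatten the groups and take the first three items
first_three = list(islice(chain.from_iterable(by_day.values()), 3))

if __name__ == "__main__":
    print(level_counts.most_common(1))
    print(dict(by_day), list(recent), per_day, first_three)
```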
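For the concurrent.futures entry, a minimal sketch of running I/O-bound work in a thread pool. fake_download is a hypothetical stand-in that sleeps instead of making a real network request; for CPU-bound work, ProcessPoolExecutor is usually the better fit because threads in CPython share the GIL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def fake_download(url: str) -> tuple:
    """Hypothetical I/O-bound task: sleeps instead of doing network I/O."""
    time.sleep(0.1)
    return url, len(url)


def fetch_all(urls: list, max_workers: int = 4) -> dict:
    """Run fake_download for every URL concurrently and collect the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fake_download, u) for u in urls]
        for fut in as_completed(futures):  # yields futures in completion order
            url, size = fut.result()       # re-raises any exception from the task
            results[url] = size
    return results


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(8)]
    start = time.perf_counter()
    sizes = fetch_all(urls)
    print(f"{len(sizes)} tasks in {time.perf_counter() - start:.2f}s")
```

Because as_completed yields results in completion order rather than submission order, the results are keyed by URL instead of being appended to a list.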
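For the json and datetime entries, a sketch of a common snag: json cannot serialize datetime objects by default. The default hook below converts them to ISO 8601 strings, and datetime.fromisoformat parses them back. The record layout and the "created" key are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def to_json(obj) -> str:
    """Serialize obj, converting datetimes to ISO 8601 strings."""
    def encode(value):
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f"Not JSON serializable: {type(value).__name__}")
    return json.dumps(obj, default=encode, sort_keys=True)


def from_json(text: str) -> dict:
    """Parse text and restore the hypothetical 'created' field as a datetime."""
    data = json.loads(text)
    data["created"] = datetime.fromisoformat(data["created"])
    return data


created = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
record = {"run_id": 7, "created": created}

if __name__ == "__main__":
    text = to_json(record)
    print(text)
    print(from_json(text)["created"] + timedelta(days=1))  # date arithmetic still works
```

Using timezone-aware datetimes keeps the UTC offset in the string, so the round trip restores an equal value.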
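For the re entry, a sketch of two everyday patterns: a precompiled expression with named groups for parsing structured lines, and re.sub for redaction. The log format and the email pattern are simplified assumptions, not a complete email grammar.

```python
import re
from typing import Optional

# Hypothetical log format: "YYYY-MM-DD LEVEL: message"
LOG_LINE = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+): (?P<msg>.*)")


def parse(line: str) -> Optional[dict]:
    """Return the named groups of a log line, or None if it does not match."""
    m = LOG_LINE.fullmatch(line.strip())
    return m.groupdict() if m else None


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address (simplified pattern)."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "<email>", text)


if __name__ == "__main__":
    print(parse("2024-01-02 ERROR: disk full"))
    print(redact_emails("contact me at a.b@example.com now"))
```

Compiling once with re.compile and reusing the pattern object keeps the intent readable; named groups avoid fragile positional indexing.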
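For the tracemalloc entry, a minimal sketch that measures the memory allocated by a hypothetical build_squares function: two snapshots are compared to find the source lines responsible, and get_traced_memory reports current and peak traced bytes.

```python
import tracemalloc


def build_squares(n: int) -> list:
    """Hypothetical workload that allocates a list of n integers."""
    return [i * i for i in range(n)]


tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()
data = build_squares(100_000)
snapshot_after = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()  # bytes, since start()
tracemalloc.stop()

if __name__ == "__main__":
    # Top 3 source lines by memory allocated between the two snapshots
    for stat in snapshot_after.compare_to(snapshot_before, "lineno")[:3]:
        print(stat)
    print(f"current={current / 1024:.0f} KiB, peak={peak / 1024:.0f} KiB")
```

Only allocations made after tracemalloc.start() are traced, so start it as early as possible when hunting a leak.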