This repository contains a curated list of tools and resources for AI/ML development and operations, grouped by category. It covers popular tools, open-source projects, and learning resources. The descriptions are based on information available as of September 2021 and may not include the latest tools and advancements.
- Cloud Providers
- Data Versioning
- Feature Stores
- Model Training Orchestration
- Hyperparameter Tuning
- Experiment Tracking
- Model Deployment & Serving
- Model Monitoring
- Model Governance & Management
- Model Explainability
- Model Testing
- Automation & Streamlining
- Infrastructure
- Collaboration
- Project Management
- Knowledge Management
- Communication
- AI/ML Libraries
- IDEs
- Data Visualization
- Microservices
- Open Source AI/ML Projects
- AI/ML Learning Resources
- HR for Global Teams
- Amazon Web Services (AWS): A comprehensive, evolving cloud computing platform provided by Amazon.
- Google Cloud Platform (GCP): A suite of cloud computing services offered by Google.
- Microsoft Azure: A cloud computing service created by Microsoft.
- IBM Cloud: IBM's open and secure public cloud for business.
- Oracle Cloud: Oracle's cloud platform, offering software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS).
- Floom: A cloud/container AI gateway and marketplace for developers that streamlines the integration of AI features into products.
- DVC (Data Version Control): An open-source tool for data science and machine learning projects that enables version control of large datasets, ML models, and intermediate files.
- Pachyderm: A version-controlled data lineage system.
- Feast: An open-source feature store for machine learning.
- Tecton: A feature store for operational machine learning.
- Hopsworks: An open-source data-intensive AI platform with a feature store.
- Kubeflow: An open-source project dedicated to making deployments of machine learning workflows on Kubernetes simple, portable, and scalable.
- Polyaxon: An open-source platform for machine learning lifecycle management.
- MLflow: An open-source platform to manage the ML lifecycle.
- Katib: A Kubernetes-native project for automated machine learning (AutoML).
- Hyperopt: A Python library for optimizing over awkward search spaces.
- Optuna: An open-source hyperparameter optimization framework in Python.
- MLflow: An open-source platform to manage the ML lifecycle.
- Weights & Biases: A tool that helps track experiments in deep learning projects.
- Comet.ml: A machine learning platform that enables engineers to automatically track their datasets, code changes, and experimentation history.
- TensorFlow Serving: A flexible, high-performance serving system for machine learning models.
- Seldon: An open-source platform for deploying, scaling, and managing machine learning models in Kubernetes.
- BentoML: A flexible, high-performance framework for serving, managing, and deploying machine learning models.
- TorchServe: A flexible and easy-to-use tool for serving PyTorch models in production.
- Model Monitor (Amazon SageMaker): A service that automatically monitors ML models in production.
- Prometheus + Grafana: Prometheus is an open-source systems monitoring and alerting toolkit. Grafana is an open-source analytics and monitoring solution.
- Evidently.AI: An open-source tool for machine learning model validation and monitoring.
- MLflow: An open-source platform to manage the ML lifecycle.
- Neptune: A metadata store for MLOps.
- Alteryx: An analytic process automation (APA) platform.
- SHAP (SHapley Additive exPlanations): A game theoretic approach to explain the output of any machine learning model.
- Lime: A project that explains what machine learning classifiers (or models) are doing.
- Alibi: An open-source Python library aimed at machine learning model inspection and interpretation.
- Great Expectations: A Python-based open-source library for validating, documenting, and profiling your data.
- Deequ: A library built on top of Apache Spark for defining 'unit tests for data'.
- TFDV (TensorFlow Data Validation): A library used to analyze and validate machine learning data.
- Jenkins: An open-source automation server.
- GitLab CI/CD: A tool built into GitLab for continuous integration, delivery, and deployment.
- GitHub Actions: A CI/CD platform that automates all your software workflows.
- Argo CD: A declarative, GitOps continuous delivery tool for Kubernetes.
- Docker: An open-source platform for building, shipping, and running applications in containers.
- Kubernetes: An open-source platform designed to automate deploying, scaling, and operating application containers.
- Terraform: An open-source infrastructure as code software tool.
- Jupyter Notebooks: An open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
- Colab Notebooks: A free cloud notebook service from Google with GPU support.
- GitHub: A global platform that brings together the world's largest community of developers.
- Jira: A proprietary issue tracking product developed by Atlassian.
- Trello: A web-based Kanban-style list-making application.
- Asana: A web and mobile application designed to help teams organize, track, and manage their work.
- Monday.com: A cloud-based work management platform that lets teams build their own workflows and applications without coding.
- Plane (makeplane/plane): An open-source project management tool.
- Notion: An all-in-one workspace where you can write, plan, collaborate and get organized.
- Confluence: A collaboration wiki tool used to help teams to collaborate and share knowledge efficiently.
- AFFiNE (toeverything/AFFiNE): A knowledge management tool that integrates AI technology to enhance knowledge discovery and sharing.
- Quivr - Get a Second Brain with Generative AI: An AI tool that organizes, connects, and generates insights from your notes.
- Airtable: A cloud-based collaboration service.
- Slack: A proprietary business communication platform.
- Microsoft Teams: A unified communication and collaboration platform.
- Zoom: A proprietary video teleconferencing software program.
- TensorFlow: An end-to-end open-source platform for machine learning.
- PyTorch: An open-source machine learning library.
- Scikit-learn: A free software machine learning library for Python.
- Keras: A user-friendly neural network library written in Python.
- NLTK (Natural Language Toolkit): A leading platform for building Python programs to work with human language data.
- XGBoost: A scalable, portable, and distributed gradient boosting (GBDT, GBRT, or GBM) library.
- Jupyter Notebook: An open-source web application that allows the creation and sharing of documents with live code, equations, visualizations, and narrative text.
- Google Colab: A free Jupyter notebook environment that runs entirely in the cloud.
- VS Code: A freeware source-code editor made by Microsoft.
- PyCharm: An integrated development environment (IDE) used in computer programming, specifically for the Python language.
- Matplotlib: A plotting library for Python.
- Seaborn: A Python data visualization library based on Matplotlib.
- Plotly: An open-source data visualization library used to create interactive and high-quality graphs in R and Python.
- Tableau: A data visualization tool used in the Business Intelligence industry.
- Flask: A micro web framework written in Python.
- Django: A high-level Python web framework that enables rapid development of secure and maintainable websites.
- Express.js: A web application framework for Node.js, released as free and open-source software.
- Chadxz.dev - How Platform Engineering Works
- Smol Developer - With 100k context windows on the way, it's now feasible for every dev to have their own smol developer
- Rift - An AI-native language server for your personal AI software engineer.
- Cursor - An editor made for programming with AI 🤖
- Modal - End-to-end cloud compute for model inference, batch jobs, task queues, web apps, and more, all without your own infrastructure.
- Chainlit - Build Python LLM apps in minutes ⚡️
- Pipedream - Connect APIs, remarkably fast. Stop writing boilerplate code, struggling with authentication, and managing infrastructure. Start connecting APIs with code-level control when you need it — and no code when you don't.
- Make.com - A visual platform for designing workflows. From tasks and workflows to apps and systems, build and automate anything in one place.
- Render - Render is a unified cloud to build and run all your apps and websites with free TLS certificates, a global CDN, DDoS protection, private networks, and auto-deploy from GitHub.
- LeanDojo - An open-source playground (http://leandojo.org) consisting of toolkits, benchmarks, and models for LLMs to prove formal theorems in the Lean proof assistant.
- Ai-Shell - A CLI that converts natural language to shell commands.
- Quivr Mobile - The Quivr React Native Client is a mobile application built using React Native that provides users with the ability to upload files and engage in chat conversations using the Quivr backend API.
- AutoScrum - A Python script for automating the Scrum project planning process using language models.
- Whisper.cpp - A port of OpenAI's Whisper model in C/C++, with support for speaker segmentation (local diarization) of mono audio via tinydiarize (#1058).
- Ai-Engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
- Ecoute - Ecoute is a live transcription tool that provides real-time transcripts for both the user's microphone input (You) and the user's speakers output (Speaker) in a textbox. It also generates a suggested response using OpenAI's GPT-3.5 for the user to say based on the live transcription of the conversation.
- Airtable - A cloud-based collaboration service and app-building platform.
- Quivr - An AI tool that organizes, connects, and generates insights from your notes.
- GitHub Assistant - Generative AI GitHub Assistant for Your Repository
- Yohei Nakajima's Twitter - Woo hoo 🎉 Just set up "Deals McDealFace" - an internal email address for tracking deals!
- FinGPT - A data-centric, open-source LLM project for open finance. The authors state that the trained model will be released soon.
- Salesforce AI Research - Toward Actionable Generative AI LAMs: From Large Language Models to Large Action Models
- Otter - 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
- ToolBench - An open platform for training, serving, and evaluating large language models for tool learning.
- Guardrails - Adding guardrails to large language models.
- LLM-ToolMaker - Large Language Models as Tool Makers.
- SocraticAI - A research project exploring the application of the Socratic method as a tool for self-discovery within large language models.
- SequenceMatch - A paper introducing a method for imitation learning in autoregressive sequence modelling with backtracking.
- ZipIt! - A research article presenting a technique for combining models from different tasks without additional training.
- Understanding Social Reasoning in Language Models with Language Models - This study delves into how social reasoning is encapsulated within language models.
- Supervised Pretraining Can Learn In-Context Reinforcement Learning - An analysis demonstrating how supervised pretraining can be adapted for in-context reinforcement learning scenarios.
- Extending Context Window of Large Language Models via Positional Interpolation - Research on techniques to extend the context window of large language models via positional interpolation.
- Inferring the Goals of Communicating Agents from Actions and Instructions - A study on the methodologies for inferring the goals of agents from their actions and instructions.
- Toward Actionable Generative AI LAMs: From Large Language Models to Large Action Models - A blog post discussing the transition from large language models to large action models in generative AI.
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers - A research paper outlining how to predict million-byte sequences using multiscale transformers.
- RepoFusion: Training Code Models to Understand Your Repository - This study presents RepoFusion, a method for training code models to better understand a specific code repository.
- Personality Traits in Large Language Models - A study investigating how personality traits manifest within large language models.
- Improving Language Plasticity via Pretraining with Active Forgetting - A research paper discussing strategies to improve language plasticity through pretraining with active forgetting.
- RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model - A research paper on learning to prompt for instance segmentation in remote sensing, based on a visual foundation model.
- The Concise TypeScript Book - A comprehensive and concise guidebook for learning the TypeScript programming language.
- VLLM - A high-throughput, memory-efficient inference and serving engine for large language models.
- GPT Migrate - A GitHub repository for a tool that uses LLMs to migrate a codebase from one framework or language to another.
- LLM As Chatbot - An exploration of the application of large language models in chatbot development.
- MetaGPT - A multi-agent framework in which LLM agents take on different roles and collaborate on a software task.
- System Design - A GitHub repository offering resources and projects related to system design in AI.
- Kaguya - A GitHub repository for the Kaguya project, an AI system for intelligent music generation.
- Talk - A research paper discussing 'Talk', a novel methodology for conversational AI.
- Emergent - A research paper investigating the concept of emergence in machine learning and artificial intelligence contexts.
- LLM Survey - A survey paper providing a comprehensive overview of the field of Large Language Models (LLM).
- Gorilla - A research paper discussing Gorilla, a large language model trained to generate API calls.
- Dialoqbase - An open-source application for building custom chatbots from your own knowledge base.
- Financial Document Analysis with LlamaIndex - A Jupyter Notebook showcasing how to use the LlamaIndex tool to perform financial document analysis.
- GPT-Index - Documentation for GPT-Index (now LlamaIndex), a data framework for indexing and querying your own data with large language models.
- MindMapper - This research presents the MindMapper technique, a new methodology for mapping and interpreting AI models.
- Linen.dev - An open-source, search-engine-friendly community chat platform, positioned as an alternative to Slack and Discord.
- MetaGPT - A GitHub repository for the MetaGPT project, a multi-agent framework built on large language models.
- APITable - A GitHub repository for the APITable project, an API-oriented low-code platform for building collaborative apps.
- GPT4All - A research paper discussing the GPT4All approach, which aims to democratize the use of large language models.
- PySpark AI - An English SDK for Apache Spark that uses LLMs to turn English instructions into PySpark operations.
- Monster API
- Platform Engineering Works
- Ray Serve
- Ray Aviary
- smol developer
- Rift
- Cursor
- Modal
- Chainlit
- Pipedream
- Make
- Render
- Otter
- Quivr Mobile
- AutoScrum
- Autoscrum™: Automating Project Planning Using Language Model Programs
- Generative AI GitHub Assistant
- Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- Toolformer (Toolformer-pytorch)
- Toolformer: Language Models Can Teach Themselves to Use Tools
- ToolBench
- Guardrails
- LLM-ToolMaker
- ai-shell
- FinGPT
- The Socratic Method for Self-Discovery in Large Language Models
- Deals McDealFace
- LeanDojo
- ZipIt!
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
- Custom Retriever combining KG Index and VectorStore Index
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
- Adding Guardrails to Large Language Models
- ecoute
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance
- Transformers Meet Directed Graphs
- OpenPlayground
- Inferring the Goals of Communicating Agents from Actions and Instructions
- Understanding Social Reasoning in Language Models with Language Models
- WebGLM
- TART
- Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding
- Image Captioners Are Scalable Vision Learners Too
- Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization
- FLARE Query Engine
- Mind2Web: Towards a Generalist Agent for the Web
- Improve ChatGPT with Knowledge Graphs
- Meta-training with Demonstration Retrieval for Efficient Few-shot Learning
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
- Improving Language Plasticity via Pretraining with Active Forgetting
- RSPrompter
- RSPrompter (pytorch implementation)
- AutoScrum evaluation
- Whisper.cpp
- LEDITS
- The Quivr React Native Client
- Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind
- Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Theory of Mind
- Automata: Bottom-up self-coding agents