# ai-definitions

Definitions and abbreviations relevant to AI and ML


This repository was created to aggregate the growing list of technical terms used in the AI and ML fields.

## Terms

| Term | Definition | Source |
| --- | --- | --- |
| Activation Function | An activation function is applied to a node's weighted input to determine that node's output, introducing the non-linearity that lets a neural network model non-linear relationships (see the activation-function sketch after this table). | Activation Function |
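
To make the Activation Function entry concrete, here is a minimal NumPy sketch; ReLU and sigmoid are just two common choices, and the toy pre-activation values are made up for illustration:

```python
import numpy as np

def relu(z):
    # ReLU keeps positive pre-activations and zeroes out negative ones.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid squashes any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Toy pre-activations (weighted sums) for three neurons.
z = np.array([-1.5, 0.0, 2.0])
print(relu(z))     # [0.  0.  2.]
print(sigmoid(z))  # [0.182... 0.5 0.880...]
```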

## Abbreviations

| Term | Definition | Source |
| --- | --- | --- |
| BERT | BERT (Bidirectional Encoder Representations from Transformers) is a transformer encoder pre-trained with masked language modeling so that each token's representation is conditioned on both its left and right context; it is used for language-understanding tasks rather than text generation. | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| BLAS | BLAS (Basic Linear Algebra Subprograms) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication (see the BLAS sketch after this table). | Basic Linear Algebra Subprograms |
| CLIP | CLIP (Contrastive Language-Image Pre-training) is a model trained on image–text pairs with a contrastive objective, so that matching images and captions land close together in a shared embedding space; this enables zero-shot image classification from text prompts. | CLIP: Connecting text and images |
| DPO | Direct Preference Optimization (DPO) is a technique for fine-tuning a language model directly on human preference data (pairs of chosen and rejected responses) without training a separate reward model or running reinforcement learning (see the DPO sketch after this table). | Direct Preference Optimization |
| DSPy | DSPy (Demonstrate-Search-Predict) is a framework for programming, rather than prompting, foundation models: pipelines are written as declarative modules, and the framework optimizes their prompts and weights automatically. | DSPy: Programming—not prompting—Foundation Models |
| GGUF | GPT-Generated Unified Format (GGUF) is a binary file format for storing model weights and metadata for GGML-based inference tools such as llama.cpp; it supersedes the GGML file format. | GGML vs GGUF |
| GGML | Georgi Gerganov’s Machine Learning (GGML) was an early attempt to create a file format for storing GPT models. | GGML vs GGUF |
| GLU | GLU (Gated Linear Unit) is a gating mechanism in which the output of one linear projection is multiplied element-wise by the sigmoid of a second linear projection (see the GLU/SwiGLU sketch after this table). | Language Modeling with Gated Convolutional Networks |
| FSDP | Fully Sharded Data Parallelism (FSDP) is a data-parallel training technique that shards a model's parameters, gradients, and optimizer states across GPUs, with each GPU processing its own slice of the data. | Fully Sharded Data Parallelism in PyTorch |
| GPT | GPT (Generative Pre-trained Transformer) is a decoder-only transformer language model that is pre-trained on next-token prediction and generates text autoregressively. | Improving Language Understanding by Generative Pre-Training |
| MLP | MLP (Multi-Layer Perceptron) is a feedforward neural network that consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer (see the MLP sketch after this table). | A Brief Introduction to Neural Networks |
| NLP | NLP (Natural Language Processing) is a field of computer science that focuses on the interaction between computers and humans using natural language. | Natural Language Processing |
| NN | NN (Neural Network) is a computing system that is inspired by the biological neural networks that constitute animal brains. | Artificial Neural Network |
| RAG | RAG (Retrieval-Augmented Generation) pairs a retriever, which fetches relevant passages from a knowledge base, with a generator that conditions on those passages to produce an answer (see the RAG sketch after this table). | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks |
| RoPE | RoPE (Rotary Positional Encoding) encodes a token's absolute position by rotating pairs of embedding dimensions through position-dependent angles, which makes attention scores depend on relative positions (see the RoPE sketch after this table). | Rotary Positional Embedding |
| SFT | Shrink and Fine-Tune (SFT) is a type of distillation that avoids explicit distillation by copying a subset of the teacher's parameters to a student model and then fine-tuning the student. | Pre-trained Summarization Distillation |
| SFT | Supervised Fine-tuning (SFT) involves adapting a pre-trained large language model (LLM) to a specific downstream task using labeled data. | Supervised Fine-tuning: customizing LLMs |
| SwiGLU | SwiGLU is an activation function that is a variant of GLU, using the Swish (SiLU) function in place of the sigmoid gate (see the GLU/SwiGLU sketch after this table). | GLU Variants Improve Transformer |
| TRL | Technology Readiness Levels (TRL) are a type of measurement system used to assess the maturity level of a particular technology. | Technology Readiness Level |
| ZeRO | ZeRO (Zero Redundancy Optimizer) is a memory optimization technique that partitions optimizer states, gradients, and parameters across data-parallel workers, allowing users to train deep learning models that do not fit in a single GPU's memory. | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models |
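
As a small illustration of the BLAS entry, standard NumPy builds delegate dense linear algebra such as matrix products and dot products to whatever BLAS library they were compiled against; the toy matrices below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(256, 256))
B = rng.normal(size=(256, 256))

# NumPy typically dispatches these to BLAS routines (GEMM for the matrix
# product, DOT for the vector dot product) in its linked BLAS library.
C = A @ B
d = np.dot(A[0], B[:, 0])

print(C.shape, float(d))
np.show_config()  # reports which BLAS/LAPACK build this NumPy is linked against
```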
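
The DPO entry reduces to a single loss over paired (chosen, rejected) responses. Below is a minimal sketch that assumes the per-response log-probabilities under the policy and the frozen reference model have already been computed; the function name and the numbers are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Implicit rewards are the log-probability ratios between policy and reference.
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    # Minimizing this pushes the policy to prefer the chosen response over the rejected one.
    return -np.log(sigmoid(chosen_reward - rejected_reward))

# Made-up sequence log-probabilities (log pi(y|x)) for one preference pair.
print(dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
               ref_chosen=-13.0, ref_rejected=-14.0))
```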
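
The GLU and SwiGLU entries can be written in a few lines of NumPy. This is a sketch, not any particular library's implementation; the projection sizes and random weights are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 6, 4
W, V = rng.normal(size=(d_in, d_out)), rng.normal(size=(d_in, d_out))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(x):
    # GLU: one linear projection, gated element-wise by the sigmoid of another.
    return (x @ W) * sigmoid(x @ V)

def swiglu(x):
    # SwiGLU: same gating structure, but the gate uses Swish/SiLU, z * sigmoid(z).
    gate = x @ V
    return (x @ W) * (gate * sigmoid(gate))

x = rng.normal(size=(1, d_in))
print(glu(x).shape, swiglu(x).shape)  # (1, 4) (1, 4)
```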
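
A minimal NumPy sketch of the MLP entry: one input layer, one ReLU hidden layer, and one output layer. The layer sizes and random weights are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)

# One hidden layer: input (4) -> hidden (8) -> output (2).
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def mlp_forward(x):
    h = np.maximum(0.0, x @ W1 + b1)  # hidden layer with ReLU activation
    return h @ W2 + b2                # output layer (logits)

x = rng.normal(size=(1, 4))           # a single 4-dimensional input
print(mlp_forward(x).shape)           # (1, 2)
```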
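
A toy sketch of the RAG entry: retrieve the most relevant passage, then hand it to a generator. The bag-of-words retriever, the tiny knowledge base, and the stubbed generate() function are all illustrative assumptions, not a specific library's API:

```python
from collections import Counter

KNOWLEDGE_BASE = [
    "BLAS is a specification for low-level linear algebra routines.",
    "RoPE rotates embedding dimensions to encode token positions.",
    "ZeRO partitions optimizer state across data-parallel workers.",
]

def score(query, passage):
    # Crude bag-of-words overlap as a stand-in for a real dense retriever.
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    return sum((q & p).values())

def generate(prompt):
    # Stand-in for a real generator (e.g. a seq2seq or decoder-only LLM).
    return f"[model answer conditioned on]\n{prompt}"

def rag_answer(query):
    best = max(KNOWLEDGE_BASE, key=lambda passage: score(query, passage))
    prompt = f"Context: {best}\nQuestion: {query}\nAnswer:"
    return generate(prompt)

print(rag_answer("What does RoPE do?"))
```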
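
A sketch of the RoPE entry: each consecutive pair of embedding dimensions is rotated by an angle proportional to the token's position. The head dimension, the base of 10000, and the toy query vector are assumptions for illustration:

```python
import numpy as np

def rope(x, position, base=10000.0):
    """Rotate consecutive dimension pairs of x by angles that depend on `position`."""
    d = x.shape[-1]                                 # head dimension, assumed even
    inv_freq = base ** (-np.arange(0, d, 2) / d)    # one frequency per dimension pair
    angles = position * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]             # split into (even, odd) dimension pairs
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin            # standard 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = np.ones(8)
# The same vector at different positions gets different rotations,
# which is what makes attention scores depend on relative position.
print(np.allclose(rope(q, 0), q))   # True: position 0 means zero rotation
print(rope(q, 3)[:4])
```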