BERT
BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based encoder model pre-trained with masked language modeling and next-sentence prediction; it learns bidirectional representations of text that are then fine-tuned for understanding tasks such as classification and question answering.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
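A minimal sketch of BERT's masked-token prediction, assuming the Hugging Face `transformers` package and the public `bert-base-uncased` checkpoint:

```python
# BERT fills in a masked token using both left and right context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```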
BLAS
BLAS (Basic Linear Algebra Subprograms) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.
Basic Linear Algebra Subprograms
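A minimal sketch of calling a level-3 BLAS routine (DGEMM, general matrix multiply) through SciPy's low-level wrappers, assuming NumPy and SciPy are installed:

```python
import numpy as np
from scipy.linalg.blas import dgemm

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# DGEMM computes alpha * a @ b (+ beta * c); here simply 1.0 * a @ b.
c = dgemm(alpha=1.0, a=a, b=b)
print(c)  # same result as a @ b
```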
CLIP
CLIP (Contrastive Language-Image Pre-training) jointly trains an image encoder and a text encoder with a contrastive objective so that matching image-text pairs have similar embeddings; the shared embedding space enables zero-shot image classification from natural-language labels.
CLIP: Connecting text and images
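A minimal sketch of zero-shot classification with CLIP, assuming the Hugging Face `transformers` package, Pillow, and a local image file `photo.jpg` (a hypothetical path):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=labels, images=Image.open("photo.jpg"),
                   return_tensors="pt", padding=True)

# Image-text similarity scores act as zero-shot classification logits.
probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```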
DPO
Direct Preference Optimization (DPO) fine-tunes a language model directly on human preference pairs (a chosen and a rejected response per prompt), using a simple classification-style loss against a frozen reference model; it achieves RLHF-style alignment without training a separate reward model.
Direct Preference Optimization
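A minimal sketch of the DPO loss computed from summed token log-probabilities, assuming PyTorch; the tensors are illustrative placeholders:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Log-ratio of the policy vs. the frozen reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Widen the margin between chosen and rejected log-ratios.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.9]),
                torch.tensor([-13.0]), torch.tensor([-14.8]))
print(loss)
```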
DSPy
DSPy (Demonstrate-Search-Predict) is a framework for programming language model pipelines declaratively: programs are built from composable modules with typed signatures, and optimizers tune the prompts and weights rather than relying on hand-written prompt strings.
DSPy: Programming—not prompting—Foundation Models
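A minimal sketch of declaring a DSPy module instead of writing a prompt, assuming the `dspy` package and a configured API key; the model name is illustrative:

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A signature declares inputs and outputs; DSPy compiles the prompting.
qa = dspy.ChainOfThought("question -> answer")
print(qa(question="What does BLAS stand for?").answer)
```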
GGUF
GPT-Generated Unified Format (GGUF) is a binary file format that packages model weights and metadata in a single file for inference with GGML-based runtimes such as llama.cpp; it supersedes the earlier GGML format.
GGML vs GGUF
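A minimal sketch of loading a GGUF model, assuming the `llama-cpp-python` package and a local GGUF file (the path is hypothetical):

```python
from llama_cpp import Llama

llm = Llama(model_path="models/llama-3-8b-instruct.Q4_K_M.gguf")

out = llm("Q: What is GGUF? A:", max_tokens=64)
print(out["choices"][0]["text"])
```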
GGML
GGML (Georgi Gerganov's Machine Learning) is a C tensor library for on-device inference whose name also referred to an early file format for storing model weights; that format has since been superseded by GGUF.
GGML vs GGUF
GLU
GLU (Gated Linear Unit) is a gating mechanism that splits its input into two halves and multiplies one half element-wise by the sigmoid of the other, GLU(a, b) = a ⊗ σ(b), letting the network learn which components to pass through.
Language Modeling with Gated Convolutional Networks
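A minimal sketch in PyTorch showing that gating by hand matches the built-in `F.glu`:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8)            # batch of 4, feature dim 8
a, b = x.chunk(2, dim=-1)        # split into the two halves
manual = a * torch.sigmoid(b)    # GLU(a, b) = a * sigmoid(b)

assert torch.allclose(manual, F.glu(x, dim=-1))
```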
FSDP
Fully Sharded Data Parallelism (FSDP) is a PyTorch data-parallel training technique that shards model parameters, gradients, and optimizer states across workers, gathering the full parameters of a layer only while it is being computed.
Fully Sharded Data Parallelism in PyTorch
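A minimal sketch of wrapping a model in PyTorch's FSDP, assuming a distributed launch (e.g. via `torchrun`) with one process per CUDA device:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")          # env provided by torchrun
torch.cuda.set_device(dist.get_rank())

model = torch.nn.Linear(1024, 1024).cuda()
model = FSDP(model)                      # parameters now sharded across ranks

loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()                          # gradients reduced and re-sharded
```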
GPT
GPT (Generative Pre-trained Transformer) is a decoder-only transformer pre-trained on large text corpora to predict the next token; the model generates text autoregressively, one token at a time, and is then adapted to downstream tasks.
Improving Language Understanding by Generative Pre-Training
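A minimal sketch of autoregressive generation, assuming the Hugging Face `transformers` package and the public `gpt2` checkpoint:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts the next token given everything so far.
out = generator("The transformer architecture", max_new_tokens=20)
print(out[0]["generated_text"])
```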
MLP
MLP (Multi-Layer Perceptron) is a feedforward neural network that consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer.
A Brief Introduction to Neural Networks
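A minimal sketch of an MLP in PyTorch, input layer to hidden layer to output layer; the dimensions are illustrative:

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> hidden layer
    nn.ReLU(),            # nonlinearity between layers
    nn.Linear(128, 10),   # hidden layer -> output layer
)

logits = mlp(torch.randn(32, 784))  # batch of 32 inputs
print(logits.shape)                 # torch.Size([32, 10])
```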
NLP
NLP (Natural Language Processing) is a field of computer science that focuses on the interaction between computers and humans using natural language.
Natural Language Processing
NN
NN (Neural Network) is a computing system inspired by the biological neural networks that constitute animal brains.
Artificial Neural Network
RAG
RAG (Retrieval-Augmented Generation) is a technique that pairs a retriever, which fetches relevant passages from a knowledge base, with a generator that conditions on the retrieved passages to produce an answer grounded in that evidence.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
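A minimal sketch of the retrieve-then-generate pattern; the embedding function, document store, and prompt here are hypothetical stand-ins rather than any specific library's API:

```python
import numpy as np

docs = ["GGUF is a file format for GGML-based inference.",
        "BLAS specifies low-level linear algebra routines."]

def embed(text: str) -> np.ndarray:
    # Placeholder embedding; a real system would use a trained encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def retrieve(query: str, k: int = 1) -> list:
    q = embed(query)
    scores = [q @ embed(d) for d in docs]        # similarity search
    return [docs[i] for i in np.argsort(scores)[-k:]]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    # A real system would send this prompt to a generator LLM.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is GGUF?"))
```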
RoPE
RoPE (Rotary Positional Encoding) encodes a token's absolute position by rotating pairs of query and key dimensions through position-dependent angles; because the rotations compose, the attention scores between tokens end up depending on their relative positions.
Rotary Positional Embedding
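A minimal sketch of rotary encoding applied to a single vector, using the split-halves convention common in open-source implementations; the dimensions and base frequency are illustrative:

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)  # one frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    # A 2-D rotation applied independently to each (x1[i], x2[i]) pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

q = np.ones(8)
print(rope(q, pos=0))  # position 0: no rotation
print(rope(q, pos=5))
```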
SFT
Shrink and Fine-Tune (SFT) is a distillation method that avoids an explicit distillation loss: parameters are copied from the teacher into a smaller student model, which is then fine-tuned on the original task.
Pre-trained Summarization Distillation
SFT
Supervised Fine-Tuning (SFT) adapts a pre-trained large language model (LLM) to a specific downstream task by continuing training on labeled examples.
Supervised Fine-tuning: customizing LLMs
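A minimal sketch of supervised fine-tuning with TRL's SFTTrainer, assuming the `trl` and `datasets` packages; the model and dataset names are illustrative:

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # base model to adapt
    train_dataset=dataset,      # labeled prompt/response data
)
trainer.train()
```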
SwiGLU
SwiGLU is a GLU variant that replaces the sigmoid gate with the Swish (SiLU) activation, SwiGLU(x) = Swish(xW) ⊗ xV; it is used in the feed-forward layers of models such as PaLM and LLaMA.
GLU Variants Improve Transformer
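A minimal sketch of a SwiGLU feed-forward layer in PyTorch; the dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.v = nn.Linear(dim, hidden, bias=False)  # value projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.silu(self.w(x)) * self.v(x)         # Swish(xW) * (xV)

print(SwiGLU(16, 32)(torch.randn(4, 16)).shape)      # torch.Size([4, 32])
```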
TRL
TRL (Transformer Reinforcement Learning) is a library for post-training transformer language models with methods such as supervised fine-tuning (SFT), reward modeling, PPO, and DPO.
TRL - Transformer Reinforcement Learning
ZeRO
ZeRO (Zero Redundancy Optimizer) is a memory optimization technique that partitions optimizer states, gradients, and parameters across data-parallel processes, removing redundant copies so that models whose training state exceeds a single GPU's memory can still be trained.
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
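A minimal sketch of enabling ZeRO stage 3 through DeepSpeed, assuming the `deepspeed` package and a distributed launch (e.g. the `deepspeed` CLI); the model and batch size are illustrative:

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},  # partition params, grads, optimizer
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```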