Notes and code for exploring ML.
- Research Notes collected from various sources.
- Linear Algebra Notes
- Idea: Neural Network in MIT Scratch
- Ideas prompted by developments in machine learning for psychology, philosophy, politics, etc.
Several papers recommended by the alphanerds at GPU Mode, from their ML Systems Onboarding Reading List:
- Attention
- Performance
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems: Wonderful survey, start here
  - Efficiently Scaling Transformer Inference: Introduced many ideas, most notably KV caches
- Making Deep Learning go Brrr from First Principles: One of the best intros to fusions and overhead
- Quantisation
- A White Paper on Neural Network Quantization
  - LLM.int8(): All of Dettmers' papers are great, but this is a natural intro
  - FP8 Formats for Deep Learning: A first-hand look at how new number formats come about
  - SmoothQuant: Balancing rounding errors between weights and activations
- Long Context Length
- RoFormer: Enhanced Transformer with Rotary Position Embedding: The paper that introduced rotary positional embeddings
- YaRN: Efficient Context Window Extension of Large Language Models: Extend base model context lengths with finetuning
- Ring Attention with Blockwise Transformers for Near-Infinite Context: Scale to infinite context lengths as long as you can stack more GPUs
- Sparsity
  - VENOM: Vectorized N:M format for sparse tensor cores
  - MegaBlocks: Efficient sparse training with mixture-of-experts
  - ReLU Strikes Back: Activation sparsity in LLMs
- Sparse Llama
- Simple pruning for LLMs
Colour name classifier without ML library dependencies: Colour Prototype
- Only 2-3 input nodes (RGB, HSL or maybe HL)
- Output nodes on the order of 10
- Prototype implementation in TypeScript
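The spec above (2-3 inputs, ~10 outputs, no ML library) can be sketched as a tiny one-hidden-layer network in plain TypeScript. Everything here is a placeholder: the colour names, the hidden-layer size of 8, and the random weights, which merely stand in for weights a real prototype would learn.

```typescript
// Minimal colour-name classifier sketch: 3 inputs (RGB), one hidden layer,
// ~10 output classes. No ML library dependency, just arrays and Math.
const COLOUR_NAMES = [
  "red", "orange", "yellow", "green", "cyan",
  "blue", "purple", "pink", "brown", "grey",
];

type Matrix = number[][];

// Random placeholder weights in [-1, 1]; a trained model would replace these.
function randMatrix(rows: number, cols: number): Matrix {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => Math.random() * 2 - 1));
}

function matVec(m: Matrix, v: number[]): number[] {
  return m.map(row => row.reduce((sum, w, i) => sum + w * v[i], 0));
}

function relu(v: number[]): number[] {
  return v.map(x => Math.max(0, x));
}

// Softmax with max-subtraction for numerical stability.
function softmax(v: number[]): number[] {
  const mx = Math.max(...v);
  const exps = v.map(x => Math.exp(x - mx));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

const HIDDEN = 8; // arbitrary; keeps the prototype tiny
const W1 = randMatrix(HIDDEN, 3);
const W2 = randMatrix(COLOUR_NAMES.length, HIDDEN);

function classify(rgb: [number, number, number]): { name: string; probs: number[] } {
  const x = rgb.map(c => c / 255); // normalise channels to [0, 1]
  const probs = softmax(matVec(W2, relu(matVec(W1, x))));
  const best = probs.indexOf(Math.max(...probs));
  return { name: COLOUR_NAMES[best], probs };
}
```

With random weights the predicted name is meaningless, but the shapes match the spec: `classify([255, 0, 0])` returns one of the ten names plus a probability vector summing to 1. Swapping RGB inputs for HSL would only change the normalisation step.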
Music classifier idea: Muzaklassifier (pre-larval stage)