Pinned Repositories
att-norm
Does norming things enable better attention?
attention-experiments
I'm playing around with Attention mechanisms
blog
My blog
dspy
DSPy: The framework for programming—not prompting—foundation models
dspy-redteam-tests
Red-Teaming Language Models with DSPy
grokfast
Trying out the grokfast algorithm on LLMs
hlb-gpt-cli
CLI-controllable version of hlb-gpt by tysam-code
kan
Ablate KAN and Fourier KAN vs. normal Linear Layers in LLMs
rebasin
Apply the methods described in the "Git Re-basin" paper [1] to arbitrary models. [1] Ainsworth et al. (https://arxiv.org/abs/2209.04836)
rebasin-results
Results for snimu/rebasin
snimu's Repositories
snimu/rebasin
Apply the methods described in the "Git Re-basin" paper [1] to arbitrary models. [1] Ainsworth et al. (https://arxiv.org/abs/2209.04836)
snimu/rebasin-results
Results for snimu/rebasin
snimu/grokfast
Trying out the grokfast algorithm on LLMs
snimu/kan
Ablate KAN and Fourier KAN vs. normal Linear Layers in LLMs
snimu/dspy-redteam-tests
Red-Teaming Language Models with DSPy
snimu/att-norm
Does norming things enable better attention?
snimu/blog
My blog
snimu/hlb-gpt-cli
CLI-controllable version of hlb-gpt by tysam-code
snimu/attention-experiments
I'm playing around with Attention mechanisms
snimu/dspy
DSPy: The framework for programming—not prompting—foundation models
snimu/etbl-vision
Embracing the bitter lesson (vision)
snimu/gradient-rounding
Round gradients during LLM training to varying degrees; compare the effect of rounding to different numbers of significant digits against parameter scaling
snimu/hlb-CIFAR10
Train to 94% on CIFAR-10 in ~6.84 seconds on a single A100, the current world speed record. Or ~95.78% in ~114 seconds (or less!)
snimu/hlb-gpt
Minimalistic, fast, and experimentation-friendly researcher's toolbench for GPT-like models in <365 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in ~138 seconds.
snimu/llm-parameter-stats
How do parameter statistics change over training in LLMs?
snimu/neuralsort
Sort lists with the help of an ANN to allow maximal parallelism in execution.
snimu/parameter-checks
Extend Python type hints to include dynamic checks that might otherwise be handled by assertions.
snimu/torch-benchmarks
Performance benchmark for PyTorch models
snimu/hlb-gpt-value-activation
Measure how much of a difference applying an activation to the value makes vs. keeping it linear, as in standard attention
snimu/llm-small-to-large
1. Train a small LLM; 2. Use its outputs on the training data as labels for training a large LLM, wherever their argmax agrees with the training data.
snimu/lm-evaluation-harness
A framework for few-shot evaluation of language models.
snimu/mask
Some experiments with Attention masks
snimu/mixture-of-tokenizers
Mixture of Tokenizers
snimu/plan-act
A better way for LLMs to plan before acting.
snimu/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable.
snimu/token-merge
Should we merge tokens during autoregressive generation?
snimu/torch-nested
Easily manipulate torch.Tensors inside highly nested data-structures.
snimu/torchinfo
View model summaries in PyTorch!
snimu/typing-exe
Executable typehints for Python: make assertions about and/or modify parameters & return values
snimu/ul2
How much information can we extract from one token?