Pinned Repositories
att-norm
Does norming things enable better attention?
attention-experiments
I'm playing around with Attention mechanisms
blog
My blog
dspy
DSPy: The framework for programming—not prompting—foundation models
dspy-redteam-tests
Red-Teaming Language Models with DSPy
grokfast
Trying out the grokfast algorithm on LLMs
hlb-gpt-cli
CLI-controllable version of hlb-gpt by tysam-code
kan
Ablate KAN and Fourier KAN vs. normal Linear Layers in LLMs
rebasin
Apply the methods described in the "Git Re-basin" paper [1] to arbitrary models. [1] Ainsworth et al. (https://arxiv.org/abs/2209.04836)
rebasin-results
Results for snimu/rebasin
snimu's Repositories
snimu/rebasin
Apply the methods described in the "Git Re-basin" paper [1] to arbitrary models. [1] Ainsworth et al. (https://arxiv.org/abs/2209.04836)
snimu/rebasin-results
Results for snimu/rebasin
snimu/grokfast
Trying out the grokfast algorithm on LLMs
snimu/kan
Ablate KAN and Fourier KAN vs. normal Linear Layers in LLMs
snimu/dspy-redteam-tests
Red-Teaming Language Models with DSPy
snimu/att-norm
Does norming things enable better attention?
snimu/blog
My blog
snimu/hlb-gpt-cli
CLI-controllable version of hlb-gpt by tysam-code
snimu/attention-experiments
I'm playing around with Attention mechanisms
snimu/dspy
DSPy: The framework for programming—not prompting—foundation models
snimu/etbl-vision
Embracing the bitter lesson (vision)
snimu/gradient-rounding
Round gradients during LLM training to varying degrees; compare the effect of rounding to different numbers of significant digits against parameter scaling
snimu/hlb-CIFAR10
Train to 94% on CIFAR-10 in ~6.84 seconds on a single A100, the current world speed record. Or ~95.78% in ~114 seconds (or less!)
snimu/hlb-gpt
Minimalistic, fast, and experimentation-friendly researcher's toolbench for GPT-like models in <365 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in ~138 seconds.
snimu/llm-parameter-stats
How do parameter statistics change over training in LLMs?
snimu/neuralsort
Sort lists with the help of an ANN to allow maximal parallelism in execution.
snimu/parameter-checks
Extend Python type hints to include dynamic checks that might otherwise be handled by assertions.
snimu/torch-benchmarks
Performance benchmark for PyTorch models
snimu/hlb-gpt-value-activation
Measure how much of a difference applying an activation to the value makes vs. keeping it linear, as in standard attention
snimu/llm-small-to-large
1. Train a small LLM; 2. Use its outputs on the training data as labels for training a large LLM, wherever their argmax agrees with the training data.
snimu/lm-evaluation-harness
A framework for few-shot evaluation of language models.
snimu/mask
Some experiments with Attention masks
snimu/mixture-of-tokenizers
Mixture of Tokenizers
snimu/plan-act
A better way for LLMs to plan before acting.
snimu/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable.
snimu/token-merge
Should we merge tokens during autoregressive generation?
snimu/torch-nested
Easily manipulate torch.Tensors inside highly nested data-structures.
snimu/torchinfo
View model summaries in PyTorch!
snimu/typing-exe
Executable typehints for Python: make assertions about and/or modify parameters & return values
snimu/ul2
How much information can we extract from one token?