zhixuan-lin's Stars
Doraemonzzz/hgru2-pytorch
glassroom/heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
google-deepmind/recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
karpathy/llm.c
LLM training in simple, raw C/CUDA
OpenNLPLab/HGRN2
HGRN2: Gated Linear RNNs with State Expansion
BlinkDL/LinearAttentionArena
Here we will test various linear attention designs.
dtunai/Griffin-Jax
JAX implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
kyegomez/Griffin
Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
dangxingyu/rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
proger/hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
tobiaskatsch/GatedLinearRNN
MichaelTMatthews/Craftax
(Crafter + NetHack) in JAX. ICML 2024 Spotlight.
srush/mamba-scans
Blog post
EleutherAI/rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
ShacharHarshuv/open-ear
Sea-Snell/JAXSeq
Train very large language models in JAX.
proger/accelerated-scan
Accelerated First Order Parallel Associative Scan
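The first-order parallel associative scan named above is the trick behind many of the linear-RNN repos in this list: a recurrence h[t] = a[t]·h[t-1] + b[t] can be evaluated with an associative combine over coefficient pairs, which is what makes it parallelizable. A minimal pure-Python sketch (function names and the toy inputs are illustrative, not from accelerated-scan; a real implementation runs the combine as a log-depth tree on the GPU, here `accumulate` applies it sequentially just to show correctness):

```python
from itertools import accumulate

def combine(left, right):
    # Composing h -> a1*h + b1 then h -> a2*h + b2 gives
    # h -> (a1*a2)*h + (a2*b1 + b2); this combine is associative,
    # which is what allows an O(log T)-depth parallel scan.
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def linear_scan(a, b, h0=0.0):
    # Scan over (a[t], b[t]) pairs, then apply each prefix map to h0.
    pairs = accumulate(zip(a, b), combine)
    return [ai * h0 + bi for ai, bi in pairs]

def linear_loop(a, b, h0=0.0):
    # Plain sequential recurrence, used as a reference.
    h, out = h0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out

a = [0.5, 0.9, 0.1, 0.7]
b = [1.0, -2.0, 0.3, 0.0]
assert all(abs(x - y) < 1e-12
           for x, y in zip(linear_scan(a, b), linear_loop(a, b)))
```

The same combine works elementwise on vectors or diagonal state matrices, which is how gated linear RNNs (HGRN2, Griffin/Hawk, Mamba-style SSMs) turn their recurrences into parallel scans.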
HazyResearch/safari
Convolutions for Sequence Modeling
alxndrTL/mamba.py
A simple and efficient Mamba implementation in pure PyTorch and MLX.
google/paxml
Pax is a JAX-based machine learning framework for training large-scale models. It allows for advanced, fully configurable experimentation and parallelization, and has demonstrated industry-leading model FLOPs utilization rates.
johnryan465/pscan
sustcsonglin/mamba-triton
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
AvivBick/awesome-ssm-ml
Reading list for research topics in state-space models
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
EleutherAI/pythia
The hub for EleutherAI's work on interpretability and learning dynamics
berlino/gated_linear_attention
buttercutter/Mamba_SSM
A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752)
apple/ml-sigma-reparam
radarFudan/Awesome-state-space-models
Collection of papers on state-space models