zhixuan-lin's Stars
Doraemonzzz/hgru2-pytorch
glassroom/heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
google-deepmind/recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
karpathy/llm.c
LLM training in simple, raw C/CUDA
OpenNLPLab/HGRN2
HGRN2: Gated Linear RNNs with State Expansion
BlinkDL/LinearAttentionArena
Here we will test various linear attention designs.
dtunai/Griffin-Jax
JAX implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
kyegomez/Griffin
Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
dangxingyu/rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
proger/hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
tobiaskatsch/GatedLinearRNN
MichaelTMatthews/Craftax
(Crafter + NetHack) in JAX. ICML 2024 Spotlight.
srush/mamba-scans
Blog post
EleutherAI/rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
ShacharHarshuv/open-ear
Sea-Snell/JAXSeq
Train very large language models in JAX.
proger/accelerated-scan
Accelerated First Order Parallel Associative Scan
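The first-order parallel associative scan named above is the trick behind many of the linear-RNN repos in this list: a recurrence h[t] = a[t]·h[t-1] + b[t] can be evaluated with an associative combine over coefficient pairs, which is what makes it parallelizable. A minimal pure-Python sketch (function names and the toy inputs are illustrative, not from accelerated-scan; a real implementation runs the combine as a log-depth tree on the GPU, here `accumulate` applies it sequentially just to show correctness):

```python
from itertools import accumulate

def combine(left, right):
    # Composing h -> a1*h + b1 then h -> a2*h + b2 gives
    # h -> (a1*a2)*h + (a2*b1 + b2); this combine is associative,
    # which is what allows an O(log T)-depth parallel scan.
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)

def linear_scan(a, b, h0=0.0):
    # Scan over (a[t], b[t]) pairs, then apply each prefix map to h0.
    pairs = accumulate(zip(a, b), combine)
    return [ai * h0 + bi for ai, bi in pairs]

def linear_loop(a, b, h0=0.0):
    # Plain sequential recurrence, used as a reference.
    h, out = h0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out

a = [0.5, 0.9, 0.1, 0.7]
b = [1.0, -2.0, 0.3, 0.0]
assert all(abs(x - y) < 1e-12
           for x, y in zip(linear_scan(a, b), linear_loop(a, b)))
```

The same combine works elementwise on vectors or diagonal state matrices, which is how gated linear RNNs (HGRN2, Griffin/Hawk, Mamba-style SSMs) turn their recurrences into parallel scans.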
HazyResearch/safari
Convolutions for Sequence Modeling
alxndrTL/mamba.py
A simple and efficient Mamba implementation in pure PyTorch and MLX.
google/paxml
Pax is a JAX-based machine learning framework for training large-scale models. It allows for advanced, fully configurable experimentation and parallelization, and has demonstrated industry-leading model FLOPs utilization rates.
johnryan465/pscan
sustcsonglin/mamba-triton
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
AvivBick/awesome-ssm-ml
Reading list for research topics in state-space models
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
EleutherAI/pythia
The hub for EleutherAI's work on interpretability and learning dynamics
berlino/gated_linear_attention
buttercutter/Mamba_SSM
A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752)
apple/ml-sigma-reparam
radarFudan/Awesome-state-space-models
Collection of papers on state-space models