Pinned Repositories
DenseFormer
landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
llm-baselines
nanoGPT-like codebase for LLM training
OptML_course
EPFL Course - Optimization for Machine Learning - CS-439
flashinfer
FlashInfer: Kernel Library for LLM Serving
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
getting-started
RC-2020
Editorial venue for papers accepted to the ML Reproducibility Challenge 2020
xla
QuaRot
Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference of large language models.
mkrima's Repositories
mkrima/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
mkrima/getting-started
mkrima/RC-2020
Editorial venue for papers accepted to the ML Reproducibility Challenge 2020
mkrima/xla