Pinned Repositories
DLFrameworkTest
My tests and experiments with some popular deep learning frameworks.
EfficientAttention-Notes
FractalTensor
LearnHaskell
So I decided to learn a functional programming language.
LearningNotes
My learning notes.
models
Model configurations
paddle_confs_v1
Paddle configuration files written with the old API.
TeXNotes
TileFusion
VPTQ
VPTQ, a flexible and extreme low-bit quantization algorithm
lcy-seso's Repositories
lcy-seso/DLFrameworkTest
My tests and experiments with some popular deep learning frameworks.
lcy-seso/LearningNotes
My learning notes.
lcy-seso/EfficientAttention-Notes
lcy-seso/TeXNotes
lcy-seso/FractalTensor
lcy-seso/lcy-seso.github.io
Ying's learning notes.
lcy-seso/Tiled-EfficientAttention
lcy-seso/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
lcy-seso/TileFusion
lcy-seso/VPTQ
VPTQ, a flexible and extreme low-bit quantization algorithm
lcy-seso/accelerated-scan
Accelerated First Order Parallel Associative Scan
lcy-seso/Awesome-LLM
Awesome-LLM: a curated list of Large Language Models
lcy-seso/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
lcy-seso/Carrot
lcy-seso/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
lcy-seso/cutlass
CUDA Templates for Linear Algebra Subroutines
lcy-seso/flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
lcy-seso/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
lcy-seso/gated_linear_attention
lcy-seso/ggml
Tensor library for machine learning
lcy-seso/llama
Inference code for LLaMA models
lcy-seso/llama.cpp
Port of Facebook's LLaMA model in C/C++
lcy-seso/llm-foundry
LLM training code for MosaicML foundation models
lcy-seso/loopy
A code generator for array-based code on CPUs and GPUs
lcy-seso/mamba
lcy-seso/RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings.
lcy-seso/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
lcy-seso/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
lcy-seso/whisper.cpp
Port of OpenAI's Whisper model in C/C++
lcy-seso/wmma_extension
An extension library of WMMA API (Tensor Core API)