Pinned Repositories
DLFrameworkTest
My tests and experiments with some popular deep learning frameworks.
EfficientAttention-Notes
FractalTensor
LearnHaskell
So I decided to learn a functional programming language.
LearningNotes
My learning notes.
models
Model configurations
paddle_confs_v1
Paddle configuration files written with the old API.
TeXNotes
TileFusion
VPTQ
VPTQ, a flexible and extreme low-bit quantization algorithm
lcy-seso's Repositories
lcy-seso/DLFrameworkTest
My tests and experiments with some popular deep learning frameworks.
lcy-seso/LearningNotes
My learning notes.
lcy-seso/EfficientAttention-Notes
lcy-seso/TeXNotes
lcy-seso/FractalTensor
lcy-seso/lcy-seso.github.io
Ying's learning notes.
lcy-seso/Tiled-EfficientAttention
lcy-seso/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
lcy-seso/TileFusion
lcy-seso/VPTQ
VPTQ, a flexible and extreme low-bit quantization algorithm
lcy-seso/accelerated-scan
Accelerated First Order Parallel Associative Scan
lcy-seso/Awesome-LLM
Awesome-LLM: a curated list of Large Language Models
lcy-seso/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
lcy-seso/Carrot
lcy-seso/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
lcy-seso/cutlass
CUDA Templates for Linear Algebra Subroutines
lcy-seso/flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
lcy-seso/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
lcy-seso/gated_linear_attention
lcy-seso/ggml
Tensor library for machine learning
lcy-seso/llama
Inference code for LLaMA models
lcy-seso/llama.cpp
Port of Facebook's LLaMA model in C/C++
lcy-seso/llm-foundry
LLM training code for MosaicML foundation models
lcy-seso/loopy
A code generator for array-based code on CPUs and GPUs
lcy-seso/mamba
lcy-seso/RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings.
lcy-seso/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
lcy-seso/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
lcy-seso/whisper.cpp
Port of OpenAI's Whisper model in C/C++
lcy-seso/wmma_extension
An extension library of WMMA API (Tensor Core API)