guoguo1314

Pinned Repositories

causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
Language: Cuda 377 5 2770
edge-slm
This project is a native implementation of a RAG pipeline for Small Language Models, tested on Android devices. The main goal was to fit the whole RAG pipeline onto a resource-constrained device, i.e., a smartphone. By design, the provided RAG library should be deployable on various platforms.
Language: C++ 71 1 34
flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Language: Python 1.6k 29 8686
llama.cpp
LLM inference in C/C++
Language: C++ 70.7k 560 4.3k10.2k
llama3_learn.c
Inference deployment of Llama 3
Language: C 11 2 00
My520
520 love confession tool
Language: HTML 00
STF
PyTorch implementation of the paper "The Devil Is in the Details: Window-based Attention for Image Compression".
Language: Python 0 0 00
T2T-ViT
ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Language: Jupyter Notebook 0 0 00
rwkv.cpp
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Language: C++ 1.4k 23 82100
qllm-eval
Code Repository of Evaluating Quantized Large Language Models
Language: Python 112 5 55

guoguo1314's Repositories

guoguo1314/llama3_learn.c
Inference deployment of Llama 3
Language: C 11 2 00
guoguo1314/My520
520 love confession tool
Language: HTML 00
guoguo1314/STF
PyTorch implementation of the paper "The Devil Is in the Details: Window-based Attention for Image Compression".
Language: Python 0 0 00
guoguo1314/T2T-ViT
ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Language: Jupyter Notebook 0 0 00