Pinned Repositories
causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
edge-slm
This project is a native implementation of a RAG pipeline for Small Language Models, tested on Android devices. The main goal was to fit the whole RAG pipeline onto a resource-constrained device, i.e. a smartphone. By design, the provided RAG library should be deployable on various platforms.
flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
llama.cpp
LLM inference in C/C++
llama3_learn.c
Inference deployment of Llama 3
My520
520 love-confession tool
STF
PyTorch implementation of the paper "The Devil Is in the Details: Window-based Attention for Image Compression".
T2T-ViT
ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
rwkv.cpp
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
qllm-eval
Code repository for "Evaluating Quantized Large Language Models"
guoguo1314's Repositories
guoguo1314/llama3_learn.c
Inference deployment of Llama 3
guoguo1314/My520
520 love-confession tool
guoguo1314/STF
PyTorch implementation of the paper "The Devil Is in the Details: Window-based Attention for Image Compression".
guoguo1314/T2T-ViT
ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet