Pinned Repositories
augmented-traffic-control
Augmented Traffic Control: A tool to simulate network conditions
Awesome-Mixture-of-Experts-Papers
A curated reading list of research in Mixture-of-Experts (MoE).
aws-cv-task2vec
Official code for the paper "Task2Vec: Task Embedding for Meta-Learning" (https://arxiv.org/abs/1902.03545, ICCV 2019)
Bone-point
Cloud-Gaming-Video-Dataset
CUDATutorial
A CUDA tutorial for learning CUDA programming from scratch
cutlass
CUDA Templates for Linear Algebra Subroutines
cutlass-kernels
Library of CUTLASS kernels targeting Large Language Models (LLMs).
distributed-llama
Run LLMs on weak devices, or make powerful devices even more powerful, by distributing the workload and splitting RAM usage across machines.
Vehicle-recognition
Vehicle recognition
JiangShanCode's Repositories
JiangShanCode/Vehicle-recognition
Vehicle recognition
JiangShanCode/augmented-traffic-control
Augmented Traffic Control: A tool to simulate network conditions
JiangShanCode/Awesome-Mixture-of-Experts-Papers
A curated reading list of research in Mixture-of-Experts (MoE).
JiangShanCode/aws-cv-task2vec
Official code for the paper "Task2Vec: Task Embedding for Meta-Learning" (https://arxiv.org/abs/1902.03545, ICCV 2019)
JiangShanCode/Bone-point
JiangShanCode/Cloud-Gaming-Video-Dataset
JiangShanCode/CUDATutorial
A CUDA tutorial for learning CUDA programming from scratch
JiangShanCode/cutlass
CUDA Templates for Linear Algebra Subroutines
JiangShanCode/cutlass-kernels
Library of CUTLASS kernels targeting Large Language Models (LLMs).
JiangShanCode/distributed-llama
Run LLMs on weak devices, or make powerful devices even more powerful, by distributing the workload and splitting RAM usage across machines.
JiangShanCode/JiangShanCode
MyBlog
JiangShanCode/MyBlogImages
MyBlogImages
JiangShanCode/nanoPyC
JiangShanCode/SimSiam
PyTorch implementation of the paper "Exploring Simple Siamese Representation Learning".
JiangShanCode/fp6_llm
Efficient GPU support for LLM inference with 6-bit quantization (FP6).
JiangShanCode/Grace
GRACE: Loss-Resilient Real-Time Video through Neural Codecs (https://www.usenix.org/system/files/nsdi24-cheng.pdf)
JiangShanCode/How_to_optimize_in_GPU
A series of GPU optimization topics introducing in detail how to optimize CUDA kernels, covering several basic kernel optimizations: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit.
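For context on this entry, the kernel patterns it names (elementwise, reduce, sgemv) have short CPU reference forms. Below is a minimal C sketch of those references — function names here are illustrative, not taken from the repository — the kind of ground truth one would compare an optimized CUDA kernel against:

```c
#include <stddef.h>

/* elementwise: y[i] = a*x[i] + y[i] (saxpy) */
static void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* reduce: sum of all elements of x */
static float reduce_sum(size_t n, const float *x) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* sgemv: y = A*x for an m x n row-major matrix A */
static void sgemv(size_t m, size_t n, const float *A,
                  const float *x, float *y) {
    for (size_t i = 0; i < m; ++i) {
        float s = 0.0f;
        for (size_t j = 0; j < n; ++j)
            s += A[i * n + j] * x[j];
        y[i] = s;
    }
}
```

The GPU versions the repository discusses map each pattern onto thread blocks (e.g. one thread per element for elementwise, tree-shaped partial sums for reduce), but should produce the same results as these loops.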
JiangShanCode/llama.cpp
LLM inference in C/C++
JiangShanCode/llm.c
LLM training in simple, raw C/CUDA
JiangShanCode/ncnn-examples
C programming examples for NCNN
JiangShanCode/NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
JiangShanCode/nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
JiangShanCode/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
JiangShanCode/triton
Development repository for the Triton language and compiler
JiangShanCode/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
JiangShanCode/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow