Pinned Repositories
augmented-traffic-control
Augmented Traffic Control: A tool to simulate network conditions
Awesome-Mixture-of-Experts-Papers
A curated reading list of research in Mixture-of-Experts (MoE).
aws-cv-task2vec
Official code for the paper "Task2Vec: Task Embedding for Meta-Learning" (https://arxiv.org/abs/1902.03545, ICCV 2019)
Bone-point
Cloud-Gaming-Video-Dataset
CUDATutorial
A CUDA tutorial for learning CUDA programming from scratch
cutlass
CUDA Templates for Linear Algebra Subroutines
cutlass-kernels
Library of CUTLASS kernels targeting Large Language Models (LLMs).
distributed-llama
Run LLMs on weak devices, or make powerful devices even more powerful, by distributing the workload and splitting RAM usage across machines.
Vehicle-recognition
Vehicle recognition
JiangShanCode's Repositories
JiangShanCode/Vehicle-recognition
Vehicle recognition
JiangShanCode/augmented-traffic-control
Augmented Traffic Control: A tool to simulate network conditions
JiangShanCode/Awesome-Mixture-of-Experts-Papers
A curated reading list of research in Mixture-of-Experts (MoE).
JiangShanCode/aws-cv-task2vec
Official code for the paper "Task2Vec: Task Embedding for Meta-Learning" (https://arxiv.org/abs/1902.03545, ICCV 2019)
JiangShanCode/Bone-point
JiangShanCode/Cloud-Gaming-Video-Dataset
JiangShanCode/CUDATutorial
A CUDA tutorial for learning CUDA programming from scratch
JiangShanCode/cutlass
CUDA Templates for Linear Algebra Subroutines
JiangShanCode/cutlass-kernels
Library of CUTLASS kernels targeting Large Language Models (LLMs).
JiangShanCode/distributed-llama
Run LLMs on weak devices, or make powerful devices even more powerful, by distributing the workload and splitting RAM usage across machines.
JiangShanCode/JiangShanCode
MyBlog
JiangShanCode/MyBlogImages
MyBlogImages
JiangShanCode/nanoPyC
JiangShanCode/SimSiam
PyTorch implementation of the paper "Exploring Simple Siamese Representation Learning".
JiangShanCode/fp6_llm
Efficient GPU support for LLM inference with 6-bit quantization (FP6).
JiangShanCode/Grace
GRACE: Loss-Resilient Real-Time Video through Neural Codecs (https://www.usenix.org/system/files/nsdi24-cheng.pdf)
JiangShanCode/How_to_optimize_in_GPU
A series of GPU optimization topics introducing in detail how to optimize CUDA kernels, covering several basic kernel optimizations: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit.
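For context on this entry, the kernel patterns it names (elementwise, reduce, sgemv) have short CPU reference forms. Below is a minimal C sketch of those references — function names here are illustrative, not taken from the repository — the kind of ground truth one would compare an optimized CUDA kernel against:

```c
#include <stddef.h>

/* elementwise: y[i] = a*x[i] + y[i] (saxpy) */
static void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* reduce: sum of all elements of x */
static float reduce_sum(size_t n, const float *x) {
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* sgemv: y = A*x for an m x n row-major matrix A */
static void sgemv(size_t m, size_t n, const float *A,
                  const float *x, float *y) {
    for (size_t i = 0; i < m; ++i) {
        float s = 0.0f;
        for (size_t j = 0; j < n; ++j)
            s += A[i * n + j] * x[j];
        y[i] = s;
    }
}
```

The GPU versions the repository discusses map each pattern onto thread blocks (e.g. one thread per element for elementwise, tree-shaped partial sums for reduce), but should produce the same results as these loops.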
JiangShanCode/llama.cpp
LLM inference in C/C++
JiangShanCode/llm.c
LLM training in simple, raw C/CUDA
JiangShanCode/ncnn-examples
C programming examples for NCNN
JiangShanCode/NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
JiangShanCode/nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
JiangShanCode/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
JiangShanCode/triton
Development repository for the Triton language and compiler
JiangShanCode/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
JiangShanCode/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow