Pinned Repositories
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
caffe
Caffe: a fast open framework for deep learning.
caffe-int8-convert-tools
Generate a quantization parameter file for ncnn framework int8 inference
learnGitBranching
An interactive git visualization to challenge and educate!
netron
Visualizer for deep learning and machine learning models
openai-gemm
Open single and half precision gemm implementations
ppl.nn-openppl
A primitive library for neural networks
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Triton_OpenPPL_Backend
litianjian's Repositories
litianjian/Triton_OpenPPL_Backend
litianjian/learnGitBranching
An interactive git visualization to challenge and educate!
litianjian/netron
Visualizer for deep learning and machine learning models
litianjian/ppl.nn-openppl
A primitive library for neural networks
litianjian/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
litianjian/Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
litianjian/blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
litianjian/caffe-int8-convert-tools
Generate a quantization parameter file for ncnn framework int8 inference
litianjian/convnet-burden
Memory consumption and FLOP count estimates for convnets
litianjian/cuda-samples
Samples for CUDA developers that demonstrate features of the CUDA Toolkit
litianjian/cutlass
CUDA Templates for Linear Algebra Subroutines
litianjian/deepcore_source_code
Partial source code of deepcore v0.7
litianjian/DeepLearningExamples
Deep Learning Examples
litianjian/DistServe
Disaggregated serving system for Large Language Models (LLMs).
litianjian/Dive-into-DL-PyTorch
This project reimplements the MXNet code from the original book "Dive into Deep Learning" (《动手学深度学习》) in PyTorch.
litianjian/dl_note
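Notes on deep learning systems, covering the mathematical foundations of deep learning, detailed explanations of basic neural network components, deep learning training and tuning strategies, detailed explanations of model compression algorithms, and a hands-on guide to implementing a deep learning inference framework.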
litianjian/gpgpu-sim_simulations
A repository that complements gpgpu-sim, providing automated regression scripts, simulation launching utilities, and the code and arguments for simulations that complete in a reasonable amount of time on GPGPU-Sim.
litianjian/isaac
Automatically-Tuned Input-Aware implementations of HPC/DNN primitives
litianjian/jetson_benchmarks
Jetson Benchmark
litianjian/LeetCodeAnimation
Demonstrates the approach to solving LeetCode problems in the form of animations.
litianjian/LLaVA-NeXT
litianjian/lunana
litianjian/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
litianjian/MVision
Robot vision, mobile robots, VS-SLAM, ORB-SLAM2, deep learning object detection, yolov3, behavior detection, opencv, PCL, machine learning, autonomous driving
litianjian/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
litianjian/Play-Leetcode
My solutions to LeetCode problems. All solutions are available in C++, and some are also available in Java and Python. Most problems come with multiple solutions. Enjoy :)
litianjian/STL
MSVC's implementation of the C++ Standard Library.
litianjian/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
litianjian/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
litianjian/wgtcc
A small C11 compiler in C++11