litianjian

Pinned Repositories

Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
Language:Cuda
caffe
Caffe: a fast open framework for deep learning.
Language:C++
caffe-int8-convert-tools
Generate a quantization parameter file for ncnn framework int8 inference
Language:Python
learnGitBranching
An interactive git visualization to challenge and educate!
Language:JavaScript
netron
Visualizer for deep learning and machine learning models
Language:JavaScript
openai-gemm
Open single and half precision gemm implementations
Language:C
ppl.nn-openppl
A primitive library for neural networks
Language:C++
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Language:C++
Triton_OpenPPL_Backend
Language:C++

litianjian's Repositories

litianjian/Triton_OpenPPL_Backend
Language:C++
litianjian/learnGitBranching
An interactive git visualization to challenge and educate!
Language:JavaScript
litianjian/netron
Visualizer for deep learning and machine learning models
Language:JavaScript
litianjian/ppl.nn-openppl
A primitive library for neural networks
Language:C++
litianjian/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Language:C++
litianjian/Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
litianjian/blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
Language:Cuda
litianjian/caffe-int8-convert-tools
Generate a quantization parameter file for ncnn framework int8 inference
Language:Python
litianjian/convnet-burden
Memory consumption and FLOP count estimates for convnets
Language:MATLAB
litianjian/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Language:C++
litianjian/cutlass
CUDA Templates for Linear Algebra Subroutines
litianjian/deepcore_source_code
Subpart source code of deepcore v0.7
Language:C
litianjian/DeepLearningExamples
Deep Learning Examples
Language:Python
litianjian/DistServe
Disaggregated serving system for Large Language Models (LLMs).
litianjian/Dive-into-DL-PyTorch
This project converts the MXNet code in the original book "Dive into Deep Learning" (《动手学深度学习》) into PyTorch implementations.
litianjian/dl_note
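Deep learning systems notes, covering the mathematical foundations of deep learning, detailed explanations of basic neural network components, deep learning training and tuning strategies, detailed explanations of model compression algorithms, and a hands-on guide to implementing a deep learning inference framework.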
litianjian/gpgpu-sim_simulations
A repository that complements gpgpu-sim, providing automated regression scripts, simulation launching utilities, and the code and arguments for simulations that complete in a reasonable amount of time on GPGPU-Sim.
Language:Cuda
litianjian/isaac
Automatically-Tuned Input-Aware implementations of HPC/DNN primitives
Language:C++
litianjian/jetson_benchmarks
Jetson Benchmark
Language:Python
litianjian/LeetCodeAnimation
Demonstrate all the questions on LeetCode in the form of animation. (Presents the approach to solving LeetCode problems as animations.)
Language:C++
litianjian/LLaVA-NeXT
Language:Python
litianjian/lunana
Language:JavaScript
litianjian/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
litianjian/MVision
Robot vision, mobile robots, VS-SLAM, ORB-SLAM2, deep learning object detection, yolov3, behavior detection, opencv, PCL, machine learning, autonomous driving
Language:C++
litianjian/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Language:C++
litianjian/Play-Leetcode
My solutions to LeetCode problems. All solutions are available in C++; some are also available in Java and Python. Most problems have multiple solutions. Enjoy :)
Language:C++
litianjian/STL
MSVC's implementation of the C++ Standard Library.
Language:C++
litianjian/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
litianjian/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language:Python
litianjian/wgtcc
A small C11 compiler in C++11
Language:C++