Pinned Repositories
- ai_models
- ArchMeasureBench
- BladeDISC: an end-to-end Dynamic Shape Compiler (DISC) project for machine learning workloads.
- CLBlast: tuned OpenCL BLAS.
- flexible-gemm: flexible-gemm conv of deepcore.
- hpc_dev_docs
- miCore
- vim-setup
- WorkTips
xingjinglu's Repositories
- xingjinglu/WorkTips
- xingjinglu/MNN: a lightweight deep neural network inference engine.
- xingjinglu/vim-setup
- xingjinglu/Awesome-GPU: awesome resources for GPUs.
- xingjinglu/Awesome-LLM-Strawberry: a collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
- xingjinglu/baichuan-7B: a large-scale 7B pretrained language model developed by BaiChuan-Inc.
- xingjinglu/ComfyUI: a powerful, modular Stable Diffusion GUI, API, and backend with a graph/nodes interface.
- xingjinglu/FlexGen: running large language models on a single GPU for throughput-oriented scenarios.
- xingjinglu/inference: reference implementations of MLPerf™ inference benchmarks.
- xingjinglu/ios-cmake: a CMake toolchain file for iOS, macOS, watchOS & tvOS C/C++/Obj-C++ development.
- xingjinglu/KsanaLLM
- xingjinglu/lightllm: a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
- xingjinglu/llama: inference code for LLaMA models.
- xingjinglu/LLM-FineTuning-Large-Language-Models: fine-tuning large language models (LLMs).
- xingjinglu/mnn-llm: an LLM deployment project based on MNN.
- xingjinglu/mupdf: MuPDF mirror.
- xingjinglu/netron: a visualizer for neural network, deep learning, and machine learning models.
- xingjinglu/Open-Sora: democratizing efficient video production for all.
- xingjinglu/opencv-mobile: minimal OpenCV for Android, iOS, ARM Linux, Windows, Linux, macOS, and WebAssembly.
- xingjinglu/prompt-cache: modular and structured prompt caching for low-latency LLM inference.
- xingjinglu/PTX-Samples: reproducers for various PTX-related issues.
- xingjinglu/rtp-llm: Alibaba's high-performance LLM inference engine for diverse applications.
- xingjinglu/sgl-learning-materials: materials for learning SGLang.
- xingjinglu/sglang: a fast serving framework for large language models and vision language models.
- xingjinglu/Stable-Diffusion-WebUI-TensorRT: TensorRT extension for the Stable Diffusion Web UI.
- xingjinglu/streaming-llm: efficient streaming language models with attention sinks.
- xingjinglu/TensorRT-LLM: an easy-to-use Python API for defining Large Language Models (LLMs) and building TensorRT engines with state-of-the-art optimizations for efficient inference on NVIDIA GPUs, plus components for creating Python and C++ runtimes that execute those engines.
- xingjinglu/triton: development repository for the Triton language and compiler.
- xingjinglu/tvm: an open deep learning compiler stack for CPUs, GPUs, and specialized accelerators.
- xingjinglu/vllm: a high-throughput and memory-efficient inference and serving engine for LLMs.