Pinned Repositories
.vim
My vim configuration
action-automatic-releases
READONLY: Auto-generated mirror for https://github.com/marvinpinto/actions/tree/master/packages/automatic-releases
agora
A universal log collection system
hey
HTTP load generator, ApacheBench (ab) replacement, formerly known as rakyll/boom
Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the 『飞桨』/PaddlePaddle core framework for high-performance single-machine and distributed training and cross-platform deployment in deep learning & machine learning)
Paddle-Lite
Multi-platform, high-performance deep learning inference engine for PaddlePaddle (『飞桨』)
Serving
A flexible, high-performance carrier for machine learning models (the PaddlePaddle 『飞桨』 model-serving deployment framework)
task-schedule
A thread pool and task queue implementation for running tasks on multi-core CPUs
tf_serving_client_brpc
A TensorFlow Serving client using bRPC
zhangjun.github.io
https://zhangjun.github.io
zhangjun's Repositories
zhangjun/zhangjun.github.io
https://zhangjun.github.io
zhangjun/ai-chatbot
A full-featured, hackable Next.js AI chatbot built by Vercel
zhangjun/stable_diffusion_compile
Compile Stable Diffusion to run faster
zhangjun/WeChatMsg
Extract WeChat chat history, export it to HTML, Word, or CSV documents for permanent storage, and analyze the chat history to generate an annual chat report
zhangjun/Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the 『飞桨』/PaddlePaddle core framework for high-performance single-machine and distributed training and cross-platform deployment in deep learning & machine learning)
zhangjun/llm-inference-benchmark
LLM Inference benchmark
zhangjun/llm-quant
zhangjun/llm-tools
zhangjun/llm_chat
zhangjun/llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
zhangjun/my_notes
zhangjun/oneflow-diffusers
OneFlow backend for 🤗 Diffusers and ComfyUI
zhangjun/openai-node
The official Node.js / TypeScript library for the OpenAI API
zhangjun/paper-reading
zhangjun/puck
zhangjun/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
zhangjun/sglang
SGLang is a fast serving framework for large language models and vision language models.
zhangjun/stable-diffusion-webui-docker
Stable Diffusion WebUI in Docker
zhangjun/stable-fast
An ultra lightweight inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
zhangjun/StableTriton
The first open-source Triton inference engine for Stable Diffusion, specifically for SDXL
zhangjun/Taipy-Chatbot-Demo
A template for creating LLM inference web apps using Python only
zhangjun/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
zhangjun/tmp
zhangjun/tmp2
zhangjun/torch-play
zhangjun/torch2trt
An easy to use PyTorch to TensorRT converter
zhangjun/torchtune-example
torchtune, llm
zhangjun/transformer_framework
Framework for plug-and-play use of various transformers (vision and NLP) with FSDP
zhangjun/triton
Development repository for the Triton language and compiler
zhangjun/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs