Pinned Repositories
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
GenAIComps
GenAI components at micro-service level; GenAI service composer to create mega-service
llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
parlooper
PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolutions and Fused Deep Learning Primitives
xFasterTransformer
miaojinc's Repositories
miaojinc/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
miaojinc/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
miaojinc/GenAIComps
GenAI components at micro-service level; GenAI service composer to create mega-service
miaojinc/llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
miaojinc/mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
miaojinc/parlooper
PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolutions and Fused Deep Learning Primitives
miaojinc/xFasterTransformer