miaojinc

Pinned Repositories

AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Language: Python 4.4k 30 454467
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
Language: Python 25.4k 198 4.1k5.3k
mosec
A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
Language: Python 747 12 9851

miaojinc's Repositories

miaojinc/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Language: Python 11
miaojinc/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
Language: Python 00
miaojinc/GenAIComps
GenAI components at micro-service level; GenAI service composer to create mega-service
Language: Python 00
miaojinc/llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Language: Python 01
miaojinc/mosec
A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
Language: Python 00
miaojinc/parlooper
PArallelLOOPgEneratoR: Threaded Loops Code Generation Infrastructure targeting Tensor Contraction Applications such as GEMMs, Convolutions and Fused Deep Learning Primitives
Language: C++ 00
miaojinc/xFasterTransformer
Language: C++ 0 0 00