Pinned Repositories
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
xetla
openvino.genai
Run Generative AI models with simple C++/Python API and using OpenVINO Runtime
parvizmp's Repositories
parvizmp/neural-speed
An innovative library for efficient LLM inference via low-bit quantization and sparsity
parvizmp/xetla
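The neural-speed entries above describe efficient LLM inference via low-bit quantization. As a rough illustration of that idea (a generic sketch, not neural-speed's actual API), the snippet below shows symmetric quantization: each float weight is mapped to a small signed integer plus one shared scale, which is what shrinks memory and bandwidth at inference time.

```python
# Generic sketch of symmetric low-bit quantization; function names and the
# per-tensor scaling scheme here are illustrative assumptions, not the
# neural-speed implementation.

def quantize(weights, bits=8):
    """Map floats to signed integers in [-(2^(bits-1)), 2^(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for int8, 7 for int4
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [qi * scale for qi in q]

weights = [0.52, -1.30, 0.07, 0.91]
q4, s4 = quantize(weights, bits=4)          # 4-bit: integers in [-8, 7]
approx = dequantize(q4, s4)
# Rounding error per weight is bounded by half a quantization step (s4 / 2).
```

A 4-bit representation stores 8x less data per weight than float32; the trade-off is the bounded reconstruction error shown above.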