kevinintel

Pinned Repositories

intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Language: Python 2.1k 28 166211
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python 2.3k 33 209258
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
Language: C++ 349 8 4738
asm_test
Language: Jupyter Notebook 00
changelog
naive_changelog_tool
Language: Python 0 1 00
neural-compressor
Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) aims to provide unified APIs for network compression techniques, such as low-precision quantization, sparsity, pruning, and knowledge distillation, across different deep learning frameworks in pursuit of optimal inference performance.
Language: Python 0 0 00
docs
This repo contains the documentation for the OPEA project
Language: Python 28 16 5757
GenAIComps
GenAI components at micro-service level; GenAI service composer to create mega-service
Language: Python 84 19 232144
GenAIExamples
Generative AI Examples is a collection of GenAI examples, such as ChatQnA and Copilot, that illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.
Language: Shell 300 24 329198
GenAIInfra
Containerization and cloud native suite for OPEA
Language: Go 33 18 21161

kevinintel's Repositories

kevinintel/asm_test
Language: Jupyter Notebook 00
kevinintel/changelog
naive_changelog_tool
Language: Python 0 1 00
kevinintel/neural-compressor
Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) aims to provide unified APIs for network compression techniques, such as low-precision quantization, sparsity, pruning, and knowledge distillation, across different deep learning frameworks in pursuit of optimal inference performance.
Language: Python 0 0 00