Pinned Repositories
intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
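Several of the pinned repositories above (neural-compressor, neural-speed) center on low-bit quantization for efficient LLM inference. As a rough illustration of the core idea only — this is a minimal sketch, not the actual API of either library — symmetric per-tensor INT8 weight quantization maps floats to 8-bit integers via a single scale factor:

```python
# Minimal sketch of symmetric per-tensor INT8 quantization, the basic idea
# behind low-bit LLM weight compression. Illustrative only; it does not
# reflect neural-compressor's or neural-speed's real interfaces.

def quantize_int8(weights):
    """Map float weights to int8 values with one symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.031, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# Round-to-nearest bounds the per-element error by scale / 2.
err = max(abs(a - b) for a, b in zip(weights, recovered))
```

Lower-bit formats (INT4/FP4/NF4) follow the same quantize/dequantize pattern with fewer levels, typically adding per-group scales to keep the error manageable.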
zhentaoyu's Repositories
zhentaoyu/config-files
Config files kept in the home directory
zhentaoyu/inference_results_v1.0
MLPerf™ v1.0 results
zhentaoyu/intel-extension-for-transformers
Extends Hugging Face transformers APIs for Transformer-based models and improves the productivity of inference deployment. With highly compressed models, the toolkit can greatly improve inference efficiency on Intel platforms.
zhentaoyu/onnx
Open standard for machine learning interoperability
zhentaoyu/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration