Pinned Repositories
ollama: Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.
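To illustrate, here is a minimal sketch that builds a request body for ollama's local HTTP generate endpoint and only prints it; the model tag and prompt are placeholder assumptions (any model pulled with `ollama pull` works), and actually sending it requires a running ollama server.

```python
import json

# Sketch of a non-streaming text-generation request for ollama's local
# HTTP API (the server listens on http://localhost:11434 by default).
payload = {
    "model": "llama3.3",               # assumption: any pulled model tag works
    "prompt": "Why is the sky blue?",  # placeholder prompt
    "stream": False,                   # one JSON object instead of a token stream
}
body = json.dumps(payload)

# To actually send it (requires a running ollama server):
#   import urllib.request
#   req = urllib.request.Request("http://localhost:11434/api/generate",
#                                data=body.encode(), method="POST")
#   print(json.load(urllib.request.urlopen(req))["response"])
print(body)
```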
auto-round: SOTA weight-only quantization algorithm for LLMs; the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
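The signed-gradient-descent rounding idea named in that paper title can be sketched in a few lines of NumPy. This is a toy illustration, not the repository's implementation: the tensor shapes, 4-bit grid, learning rate, and straight-through gradient below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))        # weights to quantize
X = rng.normal(size=(8, 32))       # calibration activations
scale = np.abs(W).max() / 7.0      # symmetric 4-bit grid, integer levels -8..7

def dequant(V):
    # Quantize with a learnable rounding offset V, then dequantize.
    q = np.clip(np.round(W / scale + V), -8, 7)
    return q * scale

def loss(V):
    # Mean squared error of the quantized layer's outputs.
    d = dequant(V) @ X - W @ X
    return float((d * d).mean())

V = np.zeros_like(W)               # rounding offsets, kept in [-0.5, 0.5]
best_V, best = V.copy(), loss(V)   # V = 0 is plain round-to-nearest
lr = 0.01
for _ in range(300):
    d = dequant(V) @ X - W @ X
    # Straight-through estimator: pretend round() is the identity.
    grad = (d @ X.T) * scale
    V = np.clip(V - lr * np.sign(grad), -0.5, 0.5)   # SignSGD step
    if loss(V) < best:
        best_V, best = V.copy(), loss(V)
# By construction, `best` is never worse than round-to-nearest (V = 0).
```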
bitsandbytes: 8-bit CUDA functions for PyTorch.
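As a rough illustration of what 8-bit tensor quantization looks like, here is a blockwise absmax int8 round trip in NumPy. This is a simplified sketch, not the library's code (which implements such kernels in CUDA); the block size of 64 is an assumption for the example.

```python
import numpy as np

def quantize_blockwise(x, block=64):
    """Absmax int8 quantization per block: each block of `block` values is
    scaled by its own max magnitude, then rounded onto the int8 grid."""
    x = x.reshape(-1, block)
    absmax = np.abs(x).max(axis=1, keepdims=True)
    absmax[absmax == 0] = 1.0                      # avoid division by zero
    q = np.round(x / absmax * 127).astype(np.int8)
    return q, absmax

def dequantize_blockwise(q, absmax):
    # Invert the scaling; the result differs from the input by at most
    # half a quantization step per element.
    return (q.astype(np.float32) / 127.0) * absmax

x = np.random.default_rng(0).normal(size=256).astype(np.float32)
q, absmax = quantize_blockwise(x)
x_hat = dequantize_blockwise(q, absmax).reshape(-1)
```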
data-parallel-CPP: Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian (Apress, 2020).
intel-extension-for-pytorch: A Python package that extends the official PyTorch to easily obtain extra performance on Intel platforms.
neural-speed
oneDNN: oneAPI Deep Neural Network Library (oneDNN).
zhewang1-intc's Repositories
zhewang1-intc/ollama
zhewang1-intc/auto-round
zhewang1-intc/bitsandbytes
zhewang1-intc/data-parallel-CPP
zhewang1-intc/intel-extension-for-pytorch
zhewang1-intc/neural-speed
zhewang1-intc/oneDNN