Pinned Repositories
lm-evaluation-harness
A framework for few-shot evaluation of language models.
intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
neural-speed
An innovative library for efficient LLM inference via low-bit quantization
llama.cpp
Port of Facebook's LLaMA model in C/C++
neural-compressor
Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool) provides unified APIs for network compression techniques, such as low-precision quantization, sparsity, pruning, and knowledge distillation, across different deep learning frameworks to achieve optimal inference performance.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
intellinjun's Repositories
intellinjun/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
intellinjun/llama.cpp
Port of Facebook's LLaMA model in C/C++
intellinjun/neural-compressor
Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool) provides unified APIs for network compression techniques, such as low-precision quantization, sparsity, pruning, and knowledge distillation, across different deep learning frameworks to achieve optimal inference performance.
intellinjun/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration