Pinned Repositories
nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
jm-hm-ubuntu
Set up the JM (AVC) and HM (HEVC) reference software in Ubuntu
nervana-distiller
Quick start and examples for the Intel Nervana Distiller
nn_pruning
Prune a model while finetuning or training.
nncf
PyTorch*-based Neural Network Compression Framework for enhanced OpenVINO™ inference
openvino-ubuntu
Set up and run OpenVINO in a Docker Ubuntu environment on an Intel CPU with integrated graphics
vuiseng9's Repositories
vuiseng9/nncf
PyTorch*-based Neural Network Compression Framework for enhanced OpenVINO™ inference
vuiseng9/bench-softmax
vuiseng9/cats
vuiseng9/data-parallel-CPP
Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben Ashbaugh, James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian (Apress, 2020).
vuiseng9/dejavu-lm
vuiseng9/EAGLE
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
vuiseng9/ipex
A Python package that extends the official PyTorch to easily obtain performance gains on Intel platforms
vuiseng9/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, ModelScope, etc.
vuiseng9/llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
vuiseng9/lm-evaluation-harness
A framework for few-shot evaluation of language models.
vuiseng9/mlperf-inference
Reference implementations of MLPerf™ inference benchmarks
vuiseng9/mlperf-v3.0-intel
This repository contains the results and code for the MLPerf™ Inference v3.0 benchmark.
vuiseng9/mlperf-v3.1-intel
This repository contains the results and code for the MLPerf™ Inference v3.1 benchmark.
vuiseng9/mm_amx
Matrix multiplication (matmul) using AMX instructions
vuiseng9/oneAPI-samples
Samples for Intel® oneAPI Toolkits
vuiseng9/openvino.genai
vuiseng9/optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
vuiseng9/optimum-intel
Accelerate inference of 🤗 Transformers with Intel optimization tools
vuiseng9/ov-llm-tld
vuiseng9/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
vuiseng9/sd-perf
Quick script to profile Stable Diffusion performance
vuiseng9/SparseFinetuning
Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry
vuiseng9/Spec-Bench
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
vuiseng9/speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2 (a minimal sketch of the core accept/reject step appears after this list)
vuiseng9/SqueezeLLM
SqueezeLLM: Dense-and-Sparse Quantization
vuiseng9/torch-custom-linear
Custom implementation of a linear (fully connected) layer
vuiseng9/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
vuiseng9/trl
Train transformer language models with reinforcement learning.
vuiseng9/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
vuiseng9/wanda
A simple and effective LLM pruning approach.
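
Several repositories above (vuiseng9/speculative-sampling, vuiseng9/Spec-Bench, vuiseng9/EAGLE) revolve around speculative decoding. For orientation, here is a minimal NumPy sketch of one speculative-sampling step in the spirit of vuiseng9/speculative-sampling; the toy draft_probs and target_probs callables are hypothetical stand-ins for real GPT-2 forward passes, not code from that repository.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary size

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical stand-ins for the draft (small) and target (large) models;
# a real setup would return next-token distributions from GPT-2 checkpoints.
def draft_probs(prefix):
    return softmax(rng.normal(size=VOCAB))

def target_probs(prefix):
    return softmax(rng.normal(size=VOCAB))

def speculative_step(prefix, k=4):
    # 1) The cheap draft model proposes k tokens autoregressively.
    ctx, proposed, q_dists = list(prefix), [], []
    for _ in range(k):
        q = draft_probs(ctx)
        tok = int(rng.choice(VOCAB, p=q))
        proposed.append(tok)
        q_dists.append(q)
        ctx.append(tok)
    # 2) The target model verifies each proposal: a token x drawn from q
    #    is accepted with probability min(1, p(x) / q(x)).
    out = list(prefix)
    for tok, q in zip(proposed, q_dists):
        p = target_probs(out)
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)  # accepted draft token
        else:
            # Rejected: resample from the residual max(p - q, 0), renormalized;
            # in the real algorithm this keeps the output distribution exactly p.
            resid = np.maximum(p - q, 0.0)
            out.append(int(rng.choice(VOCAB, p=resid / resid.sum())))
            return out  # stop at the first rejection
    # 3) All k drafts accepted: take one extra token from the target model.
    out.append(int(rng.choice(VOCAB, p=target_probs(out))))
    return out

print(speculative_step([1, 2, 3]))
```

In the real algorithm the target model scores all k draft positions in a single batched forward pass, which is where the speedup comes from; the per-position calls here are only for readability.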