int8

There are 33 repositories under the int8 topic.

intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Language: Python 2.3k 33 211259
intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
Language: C++ 352 8 4738
clancylian/retinaface
Reimplementation of RetinaFace using C++ and TensorRT
Language: C++ 297 8 3288
Wulingtian/yolov5_tensorrt_int8_tools
TensorRT INT8 quantization of a YOLOv5 ONNX model
Language: Python 180 2 1442
Wulingtian/yolov5_tensorrt_int8
TensorRT INT8 quantized deployment of the YOLOv5s model, measured at 3.3 ms per frame!
Language: C++ 167 3 1126
Wulingtian/RepVGG_TensorRT_int8
RepVGG TensorRT INT8 quantization, with measured inference under 1 ms per frame!
Language: Python 61 2 815
xuanandsix/Tensorrt-int8-quantization-pipline
A simple pipeline for INT8 quantization based on TensorRT.
Language: Python 58 1 23
the0807/YOLOv8-ONNX-TensorRT
👀 Apply YOLOv8 exported to ONNX or TensorRT (FP16, INT8) to a real-time camera feed
Language: Python 43 1 23
Wulingtian/nanodet_tensorrt_int8
NanoDet INT8 quantization, with measured inference at 2 ms per frame!
Language: C++ 37 2 67
ppogg/ncnn-yolov4-int8
NCNN + INT8 + YOLOv4 quantized modeling and real-time inference
Language: C++ 24 1 45
whitelok/tensorrt-int8-python-sample
TensorRT INT8 sample, Python version.
Language: Python 14 3 21
aahouzi/llama2-chatbot-cpu
A LLaMA2-7b chatbot with memory running on CPU, and optimized using smooth quantization, 4-bit quantization or Intel® Extension For PyTorch with bfloat16.
Language: Python 13 2 00
Egorundel/int8_calibrator_cpp
INT8 calibrator for ONNX model with dynamic batch_size at the input and NMS module at the output. C++ Implementation.
Language: C++ 11 1 00
cbalint13/rvv-kernels
RISCV Vector Kernel C/LLVM-IR generator
Language: C 7 2 01
dasdristanta13/LLM-Lora-PEFT_accumulate
LLM-Lora-PEFT_accumulate explores optimizations for Large Language Models (LLMs) using PEFT, LORA, and QLORA. Contribute experiments and implementations to enhance LLM efficiency. Join discussions and push the boundaries of LLM optimization. Let's make LLMs more efficient together!
Language: Jupyter Notebook 6 2 11
egbertYeah/mt-yolov6_tensorrt
MT-Yolov6 TensorRT Inference with Python.
Language: Python 6 1 30
yester31/Quantization_EX
Quantization example for PTQ & QAT
Language: Python 6 1 02
JohnClaw/chatllm.vb
VB.NET API wrapper for the LLM inference library chatllm.cpp
Language: Visual Basic .NET 4 1 00
JohnClaw/chatllm.cs
C# API wrapper for the LLM inference library chatllm.cpp
Language: C# 3 1 00
stdlib-js/array-int8
Int8Array.
Language: JavaScript 2 3 0
stdlib-js/constants-int8
8-bit signed integer mathematical constants.
Language: JavaScript 2 3 0
stdlib-js/constants-int8-min
Minimum signed 8-bit integer.
Language: JavaScript 2 3 0
lbin/gie_int8_sample
Language: C++ 1 5 00
RyannnG/gie_int8_sample
Language: C++ 1 2 01
stdlib-js/assert-is-int8array
Test if a value is an Int8Array.
Language: JavaScript 1 3 0
stdlib-js/constants-int8-max
Maximum signed 8-bit integer.
Language: JavaScript 1 3 0
stdlib-js/constants-int8-num-bytes
Size (in bytes) of an 8-bit signed integer.
Language: JavaScript 1 3 0
stdlib-js/napi-argv-int8array
Convert a Node-API value to a signed 8-bit integer array.
Language: C 1 3 0
stdlib-js/napi-argv-strided-int8array
Convert a Node-API value representing a strided array to a signed 8-bit integer array.
Language: C 1 3 0
stdlib-js/napi-argv-strided-int8array2d
Convert a Node-API value representing a two-dimensional strided array to a signed 8-bit integer array.
Language: C 1 2 0
douzsh/mxnet-quantized
MXNet GluonCV quantization: binary and ternary models
Language: Python 0 2 00
MrFMach/Practice-C-types
Practicing C data types using the sizeof operator
Language: C 1 0
yester31/Quantization_Framework
A quantization framework under development
Language: Python 1 1
