ncnn-vulkan

This fork adds Vulkan support to the ncnn Python bindings, which the official package lacks. To use Vulkan, set net.opt.use_vulkan_compute = True on the net before loading the model.

Also in this fork, a failed VRAM allocation raises a catchable runtime error, so you can release the allocated VRAM when it occurs by calling ncnn.destroy_gpu_instance().
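
The catch-and-release pattern described above can be sketched without any model or GPU. This is a minimal sketch, not part of the fork: `run_with_gpu_cleanup`, `infer` and `cleanup` are hypothetical names, and it assumes the allocation failure reaches Python as a `RuntimeError`.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_gpu_cleanup(
    infer: Callable[[], T], cleanup: Callable[[], None]
) -> Optional[T]:
    """Run `infer`; on RuntimeError call `cleanup` and return None.

    Hypothetical helper: `infer` stands for any callable that runs an
    extractor, `cleanup` for a callable that releases GPU resources.
    """
    try:
        return infer()
    except RuntimeError as err:
        # Assumption: a failed VRAM allocation surfaces as RuntimeError.
        print(f"inference failed: {err}")
        cleanup()
        return None
```

With this fork, `cleanup` would presumably be `ncnn.destroy_gpu_instance`, and `infer` a small function wrapping the `ex.input(...)` / `ex.extract(...)` calls.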

Minimal example:

import cv2
from ncnn_vulkan import ncnn
import numpy as np

net = ncnn.Net()

# Use vulkan compute
net.opt.use_vulkan_compute = True

# Load model param and bin
net.load_param("./x4.param")
net.load_model("./x4.bin")

ex = net.create_extractor()

# Load image using opencv
img = cv2.imread("./example.jpg")

# Convert image to ncnn Mat
mat_in = ncnn.Mat.from_pixels(
    img,
    ncnn.Mat.PixelType.PIXEL_BGR,
    img.shape[1],
    img.shape[0]
)

# Normalize image (required)
# Note that passing in a normalized numpy array will not work.
mean_vals = []
norm_vals = [1 / 255.0, 1 / 255.0, 1 / 255.0]
mat_in.substract_mean_normalize(mean_vals, norm_vals)

# Catch the runtime error raised on a failed VRAM allocation
try:
    # Make sure the input and output names match the param file
    ex.input("data", mat_in)
    ret, mat_out = ex.extract("output")
    out = np.array(mat_out)

    # Transpose the output from `c, h, w` to `h, w, c`, scale it back to the
    # 0-255 range, and convert to 8-bit before handing it to OpenCV
    output = np.clip(out.transpose(1, 2, 0) * 255, 0, 255).astype(np.uint8)

    # Save image using opencv
    cv2.imwrite("./out.png", output)
except RuntimeError:
    # Release the allocated VRAM
    ncnn.destroy_gpu_instance()
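
The arithmetic around the model call can be checked in plain NumPy. The sketch below assumes that `substract_mean_normalize` computes `(x - mean) * norm` per channel and skips a step when its list is empty, which is our reading of the upstream behaviour. The function names `normalize_like_ncnn` and `chw_float_to_hwc_uint8` are hypothetical and not part of ncnn.

```python
import numpy as np


def normalize_like_ncnn(chw, mean_vals, norm_vals):
    """Per-channel (x - mean) * norm on a CHW array.

    An empty list skips the corresponding step (assumed to mirror ncnn).
    """
    out = chw.astype(np.float32)
    if len(mean_vals):
        out = out - np.asarray(mean_vals, dtype=np.float32).reshape(-1, 1, 1)
    if len(norm_vals):
        out = out * np.asarray(norm_vals, dtype=np.float32).reshape(-1, 1, 1)
    return out


def chw_float_to_hwc_uint8(chw):
    """Transpose CHW to HWC, scale [0, 1] to [0, 255], clip and cast to uint8."""
    hwc = chw.transpose(1, 2, 0) * 255.0
    return np.clip(np.rint(hwc), 0, 255).astype(np.uint8)
```

Clipping before the cast matters because a model can produce values slightly outside the 0 to 1 range, and a plain cast to `uint8` would wrap those values around instead of saturating them.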

Original README

ncnn is a high-performance neural network inference framework optimized for mobile platforms. It has been designed from the start with deployment and use on mobile phones in mind. ncnn has no third-party dependencies, is cross-platform, and runs faster than all known open-source frameworks on mobile phone CPUs. With ncnn's efficient implementation, developers can easily deploy deep learning models to mobile platforms, build intelligent apps, and bring artificial intelligence to your fingertips. ncnn is currently used in many Tencent applications, such as QQ, Qzone, WeChat, and Pitu.

Technical discussion QQ group: 637093648 (lots of experts). Answer to join: 卷卷卷卷卷 (group is full)

Pocky QQ group (MLIR YES!): 677104663 (lots of experts). Answer to join: multi-level intermediate representation

Telegram Group https://t.me/ncnnyes

Discord Channel https://discord.gg/YRsxgmF

Current building status matrix

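System	CPU (32bit)	CPU (64bit)	GPU (32bit)	GPU (64bit)
Linux (GCC)	[badge]	[badge]	—	[badge]
Linux (Clang)	[badge]	[badge]	—	[badge]
Linux (ARM)	[badge]	[badge]	—	—
Linux (MIPS)	[badge]	[badge]	—	—
Linux (RISC-V)	—	[badge]	—	—
Linux (LoongArch)	—	[badge]	—	—
Windows	[badge]	[badge]	—	[badge]
Windows (ARM)	[badge]	[badge]	—	—
macOS	—	[badge]	—	[badge]
macOS (ARM)	—	[badge]	—	[badge]
Android	[badge]	[badge]	[badge]	[badge]
Android-x86	[badge]	[badge]	[badge]	[badge]
iOS	[badge]	[badge]	—	[badge]
iOS Simulator	[badge]	[badge]	—	—
WebAssembly	[badge]	—	—	—
RISC-V GCC/Newlib	[badge]	[badge]	—	—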

Supports most commonly used CNN networks

Classical CNN: VGG AlexNet GoogleNet Inception ...
Practical CNN: ResNet DenseNet SENet FPN ...
Light-weight CNN: SqueezeNet MobileNetV1/V2/V3 ShuffleNetV1/V2 MNasNet ...
Face Detection: MTCNN RetinaFace scrfd ...
Detection: VGG-SSD MobileNet-SSD SqueezeNet-SSD MobileNetV2-SSDLite MobileNetV3-SSDLite ...
Detection: Faster-RCNN R-FCN ...
Detection: YOLOv2 YOLOv3 MobileNet-YOLOv3 YOLOv4 YOLOv5 YOLOv7 YOLOX ...
Detection: NanoDet
Segmentation: FCN PSPNet UNet YOLACT ...
Pose Estimation: SimplePose ...

HowTo

how to build ncnn library on Linux / Windows / macOS / Raspberry Pi3 / Android / NVIDIA Jetson / iOS / WebAssembly / AllWinner D1 / Loongson 2K1000

download prebuilt binary packages for android and ios

use ncnn with alexnet with detailed steps, recommended for beginners :)

Chinese-language guide to using ncnn with alexnet, with detailed steps, strongly recommended for newcomers :)

use netron for ncnn model visualization

out-of-the-box web model conversion

ncnn low-level operation api

ncnn param and model file spec

ncnn operation param weight table

how to implement custom layer step by step

FAQ

ncnn throw error

ncnn produce wrong result

ncnn vulkan

Features

Supports convolutional neural networks, including multi-input and multi-branch structures, and can compute only part of the branches
No third-party library dependencies; does not rely on BLAS / NNPACK or any other computing framework
Pure C++ implementation, cross-platform, supports Android, iOS and so on
Carefully optimized at the ARM NEON assembly level for very high computation speed
Sophisticated memory management and data structure design, very low memory footprint
Supports multi-core parallel computing acceleration and ARM big.LITTLE CPU scheduling optimization
Supports GPU acceleration via the next-generation low-overhead Vulkan API
Extensible model design, supports 8-bit quantization and half-precision floating point storage, can import caffe/pytorch/mxnet/onnx/darknet/keras/tensorflow(mlir) models
Supports loading network models by direct zero-copy memory reference
Custom layer implementations can be registered to extend it
Well, it is strong, not afraid of being stuffed with 卷 QvQ
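
The first feature in the list, computing only part of a multi-branch network, can be illustrated with a toy model. This is a sketch of on-demand graph evaluation in plain Python, not ncnn's implementation. `ToyExtractor` and the blob names are hypothetical; only the `input` / `extract` method names are borrowed from the extractor used in the minimal example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy graph: blob name -> (input blob names, function of those inputs)
Graph = Dict[str, Tuple[List[str], Callable[..., float]]]


class ToyExtractor:
    """Evaluates only the layers that the requested blob depends on."""

    def __init__(self, graph: Graph) -> None:
        self.graph = graph
        self.blobs: Dict[str, float] = {}
        self.computed: List[str] = []  # order in which layers actually ran

    def input(self, name: str, value: float) -> None:
        self.blobs[name] = value

    def extract(self, name: str) -> float:
        if name not in self.blobs:
            deps, fn = self.graph[name]
            args = [self.extract(dep) for dep in deps]  # recurse into dependencies
            self.blobs[name] = fn(*args)
            self.computed.append(name)
        return self.blobs[name]


graph: Graph = {
    "stem": (["data"], lambda x: x + 1.0),
    "branch_a": (["stem"], lambda x: x * 2.0),
    "branch_b": (["stem"], lambda x: x * 10.0),
    "merged": (["branch_a", "branch_b"], lambda a, b: a + b),
}

ex = ToyExtractor(graph)
ex.input("data", 1.0)
result = ex.extract("branch_a")  # runs "stem" and "branch_a" only
```

Requesting `branch_a` leaves `branch_b` and `merged` untouched, which is the sense in which an intermediate output can be obtained without paying for the rest of the network.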

supported platform matrix

✅ = known to work and runs fast with good optimization
✔️ = known to work, but speed may not be fast enough
❔ = should work, not confirmed
/ = not applicable

	Windows	Linux	Android	macOS	iOS
intel-cpu	✔️	✔️	❔	✔️	/
intel-gpu	✔️	✔️	❔	❔	/
amd-cpu	✔️	✔️	❔	✔️	/
amd-gpu	✔️	✔️	❔	❔	/
nvidia-gpu	✔️	✔️	❔	❔	/
qcom-cpu	❔	✔️	✅	/	/
qcom-gpu	❔	✔️	✔️	/	/
arm-cpu	❔	❔	✅	/	/
arm-gpu	❔	❔	✔️	/	/
apple-cpu	/	/	/	✔️	✅
apple-gpu	/	/	/	✔️	✔️