rknn-llm

Description

The RKLLM software stack helps users quickly deploy AI models to Rockchip chips. The overall workflow is as follows:

To use the RKNPU, users first run the RKLLM-Toolkit on a PC to convert a trained model into an RKLLM-format model, and then run inference on the development board through the RKLLM C API (a conversion sketch follows the component list below).

  • RKLLM-Toolkit is a software development kit that lets users perform model conversion and quantization on a PC.

  • RKLLM Runtime provides C/C++ programming interfaces for the Rockchip NPU platform, helping users deploy RKLLM models and accelerate the development of LLM applications.

  • The RKNPU kernel driver is responsible for interacting with the NPU hardware. It is open source and can be found in the Rockchip kernel code.
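
The PC-side conversion step looks roughly like the following. This is a minimal sketch based on the RKLLM-Toolkit Python API; the model directory, quantization settings, and output filename are illustrative and should be adapted to your model and target platform:

    from rkllm.api import RKLLM

    llm = RKLLM()

    # Load a trained Hugging Face model from a local directory (illustrative path).
    ret = llm.load_huggingface(model='./Qwen2-1.5B-Instruct')
    assert ret == 0, 'model load failed'

    # Quantize and build for the target NPU platform (settings are illustrative).
    ret = llm.build(do_quantization=True, quantized_dtype='w8a8',
                    target_platform='rk3588')
    assert ret == 0, 'build failed'

    # Export the converted model for deployment through the RKLLM Runtime.
    ret = llm.export_rkllm('./qwen2_w8a8.rkllm')
    assert ret == 0, 'export failed'

The exported .rkllm file is then copied to the board and loaded on-device via the RKLLM C API.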

Supported Platforms

  • RK3588 Series
  • RK3576 Series
  • RK3562 Series
  • RV1126B Series

Supported Models

Model Performance

  1. Benchmark results of common LLMs.

Performance Testing Methods

  1. Run the frequency-setting script from the scripts directory on the target platform.
  2. Execute export RKLLM_LOG_LEVEL=1 on the device to log model inference performance and memory usage.
  3. Use the eval_perf_watch_cpu.sh script to measure CPU utilization.
  4. Use the eval_perf_watch_npu.sh script to measure NPU utilization.
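
Collected as a shell session, the steps above look like this. The frequency-fixing script name assumes an RK3588 board; use the script matching your platform from the scripts directory:

    # 1. Fix frequencies for stable measurements (RK3588 script name assumed).
    ./scripts/fix_freq_rk3588.sh

    # 2. Log model inference performance and memory usage during the run.
    export RKLLM_LOG_LEVEL=1

    # 3/4. In separate shells, watch CPU and NPU utilization while the model runs.
    ./scripts/eval_perf_watch_cpu.sh
    ./scripts/eval_perf_watch_npu.sh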

Download

  1. You can download the latest SDK package from RKLLM_SDK (fetch code: rkllm).
  2. You can download converted RKLLM models from rkllm_model_zoo (fetch code: rkllm).

Examples

  1. Multimodal model deployment demo: multimodal_model_demo
  2. API usage demo: rkllm_api_demo
  3. API server demo: rkllm_server_demo

Note

  • The supported Python versions are:

    • Python 3.8
    • Python 3.9
    • Python 3.10
    • Python 3.11
    • Python 3.12

Note: Before installing the package in a Python 3.12 environment, please run the following command first (see the install sketch at the end of this section):

export BUILD_CUDA_EXT=0
  • On some platforms, you may encounter an error indicating that libomp.so cannot be found. To resolve this, locate the library in the corresponding cross-compilation toolchain and place it in the board's lib directory, at the same level as librkllmrt.so.
  • RWKV model conversion only supports Python 3.12. Please use requirements_rwkv7.txt to set up the pip environment.
  • Latest version: v1.2.2
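
For example, installing the toolkit in a Python 3.12 environment might look like this; the wheel filename is illustrative and should match the file shipped in the SDK package:

    # Skip building the CUDA extension (required for Python 3.12).
    export BUILD_CUDA_EXT=0
    # Hypothetical wheel name; use the actual file from the downloaded SDK.
    pip install rkllm_toolkit-1.2.2-cp312-cp312-linux_x86_64.whl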

RKNN Toolkit2

If you want to deploy additional AI models, we provide an SDK called RKNN-Toolkit2. For details, please refer to:

https://github.com/airockchip/rknn-toolkit2

CHANGELOG

v1.2.2

  • Added support for Gemma3n and InternVL3 models
  • Added support for multi-instance inference
  • Added support for LongRoPE
  • Fixed issues with asynchronous inference interfaces
  • Fixed chat template parsing issues
  • Optimized inference performance
  • Optimized multimodal vision model demo

For older versions, please refer to the CHANGELOG.