[mpi_dec_test] RGB output performance
Hi, I am trying to use mpi_dec_test.c to get RGB output (NV12 -> RGB).
I am using an Orange Pi 4 single-board computer with a Rockchip RK3399 SoC, running Orange's Ubuntu Bionic image v1.3 with Linux kernel 4.4.179.
My test video stream is H.264, 3072x2048, yuvj420p.
The decoding itself is really fast, but the post-processing kills all the benefit.
Copying the decoded frame from the decoder buffer to regular RAM alone takes about 60 ms:
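    #include <string.h>
    #include "rk_mpi.h"

    // Copies one decoded NV12 frame into `target` as a packed width x height
    // frame (Y plane followed by the interleaved UV plane), dropping the stride
    // padding. `target` must hold at least width * height * 3 / 2 bytes.
    void my_dump_mpp_frame_to_file(MppFrame frame, unsigned char *target)
    {
        if (NULL == target || NULL == frame)
            return;

        RK_U32 width    = mpp_frame_get_width(frame);
        RK_U32 height   = mpp_frame_get_height(frame);
        RK_U32 h_stride = mpp_frame_get_hor_stride(frame);
        RK_U32 v_stride = mpp_frame_get_ver_stride(frame);

        MppBuffer buffer = mpp_frame_get_buffer(frame);
        if (NULL == buffer)
            return;

        RK_U8 *base   = (RK_U8 *)mpp_buffer_get_ptr(buffer);
        RK_U8 *base_y = base;                        // Y plane
        RK_U8 *base_c = base + h_stride * v_stride;  // UV plane starts after the padded Y plane

        // Copy row by row: one memcpy of width*height*3/2 bytes is only correct
        // when h_stride == width and v_stride == height.
        // This copy is what takes ~60 ms.
        for (RK_U32 i = 0; i < height; i++, base_y += h_stride, target += width)
            memcpy(target, base_y, width);
        for (RK_U32 i = 0; i < height / 2; i++, base_c += h_stride, target += width)
            memcpy(target, base_c, width);
    }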
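For scale, a packed 3072x2048 NV12 frame is 3072 x 2048 x 3/2 = 9,437,184 bytes, so 60 ms per copy works out to roughly 157 MB/s. That is far below what memcpy achieves on ordinary cached RAM, which fits the explanation that the CPU is reading uncached buffer memory. The two helpers below (hypothetical names) simply redo that arithmetic for other resolutions and timings:

```cpp
#include <cstdint>

// Bytes in one tightly packed NV12 frame: a full-resolution Y plane plus an
// interleaved UV plane at half horizontal and half vertical resolution.
constexpr std::uint64_t nv12_frame_bytes(std::uint32_t width, std::uint32_t height) {
    return std::uint64_t(width) * height * 3 / 2;
}

// Effective copy throughput in bytes per second for a measured copy time.
constexpr double copy_throughput(std::uint64_t bytes, double seconds) {
    return double(bytes) / seconds;
}
```

nv12_frame_bytes(3072, 2048) gives 9437184, exactly the size of the global yv12DataBuffer, and copy_throughput() of that over 0.060 s gives about 1.57e8 bytes/s.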
...
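unsigned char yv12DataBuffer[3072*2048*3/2]; // global; holds one packed NV12 frame despite the "yv12" name
...
// NV12 -> RGB conversion
auto nWidth = cmd->width;
auto nHeight = cmd->height;
// Wrap the packed NV12 data (Y rows followed by UV rows) without copying it
cv::Mat picNV12 = cv::Mat(nHeight * 3/2, nWidth, CV_8UC1, yv12DataBuffer);
cv::Mat picRGB;
cv::cvtColor(picNV12, picRGB, cv::COLOR_YUV2RGB_NV12); // decoder output is NV12 (U before V), not NV21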
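The cvtColor call runs a per-pixel conversion like the one below on the CPU for every frame, which is part of why offloading it to the RGA pays off. This is a plain scalar sketch for checking results, not a fast path, and the function name is hypothetical. It assumes full-range BT.601 (JFIF) coefficients because the stream is yuvj420p, whereas OpenCV's NV12 conversions use the limited-range (16-235) BT.601 formulas, so the two outputs will differ slightly. It also reads both planes through a row stride, so it can work on the padded decoder buffer directly instead of on a packed copy:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Clamp to [0, 255] and round to the nearest integer.
static inline std::uint8_t to_u8(float v) {
    return static_cast<std::uint8_t>(std::lround(std::min(255.0f, std::max(0.0f, v))));
}

// Scalar NV12 -> packed RGB24 using full-range BT.601 (JFIF) coefficients.
// Plane rows are `stride` bytes apart, so padded buffers work as-is.
// `rgb` must hold width * height * 3 bytes.
void nv12_to_rgb24_fullrange(const std::uint8_t* y_plane, const std::uint8_t* uv_plane,
                             std::size_t stride, int width, int height,
                             std::uint8_t* rgb) {
    for (int row = 0; row < height; ++row) {
        const std::uint8_t* y  = y_plane + std::size_t(row) * stride;
        const std::uint8_t* uv = uv_plane + std::size_t(row / 2) * stride;  // one UV row per two Y rows
        for (int col = 0; col < width; ++col) {
            float Y = y[col];
            float U = uv[(col / 2) * 2] - 128.0f;      // Cb, shared by two pixels
            float V = uv[(col / 2) * 2 + 1] - 128.0f;  // Cr, shared by two pixels
            std::uint8_t* px = rgb + (std::size_t(row) * width + col) * 3;
            px[0] = to_u8(Y + 1.402f * V);                      // R
            px[1] = to_u8(Y - 0.344136f * U - 0.714136f * V);   // G
            px[2] = to_u8(Y + 1.772f * U);                      // B
        }
    }
}
```

For an MppFrame, pass y_plane = base, uv_plane = base + h_stride * v_stride and stride = h_stride.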
What is the best way to get an RGB image after decoding?
Should I use a different buffer mode (external) to improve performance?
- Try using cacheable hardware memory; it will be faster.
- Use the RGA to do the YUV to RGB conversion; it will be even faster.
Do not use the CPU to access the pixel data.
Could you please explain more about cacheable hardware memory? How is it different from regular memory? I would greatly appreciate any examples.
The memory used by the encoder/decoder hardware is not normal malloc memory. It is actually a dma-buf, provided by the kernel through ion (on Android) or drm (on Linux). The allocator (ion or drm) can make the memory cacheable or non-cacheable for the CPU.
https://www.kernel.org/doc/html/v4.14/driver-api/dma-buf.html
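When a dma-buf is mapped cacheable, CPU access has to be bracketed so the kernel can keep the CPU caches coherent with what the hardware wrote. On kernels that provide the dma-buf sync ioctl (DMA_BUF_IOCTL_SYNC from <linux/dma-buf.h>, in mainline since v4.6; whether the 4.4.179 vendor kernel has it is an assumption to verify), a minimal RAII sketch could look like this. The class name is hypothetical:

```cpp
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

// Brackets CPU access to a dma-buf: DMA_BUF_SYNC_START on construction,
// DMA_BUF_SYNC_END on destruction. `dir` is DMA_BUF_SYNC_READ,
// DMA_BUF_SYNC_WRITE or DMA_BUF_SYNC_RW.
class DmaBufCpuAccess {
public:
    DmaBufCpuAccess(int fd, std::uint64_t dir) : fd_(fd), dir_(dir) {
        ok_ = do_sync(DMA_BUF_SYNC_START);
    }
    ~DmaBufCpuAccess() {
        if (ok_)
            do_sync(DMA_BUF_SYNC_END);
    }
    DmaBufCpuAccess(const DmaBufCpuAccess&) = delete;
    DmaBufCpuAccess& operator=(const DmaBufCpuAccess&) = delete;

    bool ok() const { return ok_; }     // true if the START sync succeeded
    int error() const { return err_; }  // errno of the last failed ioctl, else 0

private:
    bool do_sync(std::uint64_t phase) {
        dma_buf_sync s;
        std::memset(&s, 0, sizeof s);
        s.flags = phase | dir_;
        int r;
        do {  // the kernel docs ask callers to restart on EINTR/EAGAIN
            r = ioctl(fd_, DMA_BUF_IOCTL_SYNC, &s);
        } while (r == -1 && (errno == EINTR || errno == EAGAIN));
        if (r == -1) {
            err_ = errno;
            return false;
        }
        return true;
    }

    int fd_;
    std::uint64_t dir_;
    bool ok_ = false;
    int err_ = 0;
};
```

Typical use: create the guard with the buffer's file descriptor (for an MppBuffer, mpp_buffer_get_fd(buffer)), check ok(), read the pixels through the pointer from mpp_buffer_get_ptr(), and let the guard go out of scope. The ioctl only keeps the caches coherent; whether the mapping is cacheable at all is decided when the buffer is allocated.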
I tried to run rga_test.cpp, but I got this error:
sudo ./rga_test -i input.raw -o out.raw -w 1920 -h 1080 -f 0 -dst_w 1920 -dst_h 1080 -dst_fmt 0
...
mpp[7242]: mpp_log: rga ioctl failed errno:25 Inappropriate ioctl for device
In rga.cpp:
#define DEFAULT_RGA_DEV "/dev/video0"
sudo v4l2-ctl --list-devices
rockchip,rk3399-vpu-enc (platform: hantro-vpu):
/dev/video1
/dev/video2
rockchip-rga (platform:rga):
/dev/video0
rkvdec (platform:rkvdec):
/dev/video3
I also tried to use this code: https://github.com/McAronDev/RK3188_colorspace_convert
But the result is the same: RGA_BLIT_SYNC Failed (at the line ioctl(fd, RGA_BLIT_SYNC, &Rga_Request)).
What could be the problem?
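errno 25 is ENOTTY. According to the v4l2-ctl listing, /dev/video0 on this kernel is the rockchip-rga V4L2 mem2mem device, and a V4L2 node rejects any request it does not recognize, such as the vendor RGA_BLIT_SYNC ioctl, with ENOTTY. That vendor interface presumably expects Rockchip's BSP RGA driver rather than the V4L2 one bound here. A small sketch for checking what a node really is with VIDIOC_QUERYCAP (struct and function names are hypothetical):

```cpp
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <string>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

struct V4l2Info {
    std::string driver;   // kernel driver name
    std::string card;     // name shown by v4l2-ctl --list-devices
    bool is_m2m = false;  // memory-to-memory device (single- or multi-planar)
};

// Asks an open /dev/videoN node what it is. Returns false with errno set
// if the ioctl fails (e.g. ENOTTY for a non-V4L2 fd, EBADF for a bad fd).
bool query_v4l2(int fd, V4l2Info& info) {
    v4l2_capability cap;
    std::memset(&cap, 0, sizeof cap);
    if (ioctl(fd, VIDIOC_QUERYCAP, &cap) == -1)
        return false;
    // driver and card are NUL-terminated __u8 arrays.
    info.driver = reinterpret_cast<const char*>(cap.driver);
    info.card   = reinterpret_cast<const char*>(cap.card);
    // device_caps describes this particular node when V4L2_CAP_DEVICE_CAPS is set.
    std::uint32_t caps = (cap.capabilities & V4L2_CAP_DEVICE_CAPS) ? cap.device_caps
                                                                  : cap.capabilities;
    info.is_m2m = (caps & (V4L2_CAP_VIDEO_M2M | V4L2_CAP_VIDEO_M2M_MPLANE)) != 0;
    return true;
}
```

Open each /dev/videoN, call query_v4l2(), and pick the node whose card is "rockchip-rga" and whose is_m2m is true. From the command line, v4l2-ctl --info -d /dev/video0 prints the same information.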
It is also worth mentioning that the original Orange Pi 4 firmware has the RGA node disabled in the device tree, and we couldn't find an easy way to enable it. That's why we took Armbian's U-Boot and Linux 5.8.6 and combined them with Orange's Ubuntu 18.04 userland, on which we have developed our software so far.
After some research, I discovered that in the mainline kernel (5.8 in our case) the hardware video codecs are handled by the Video4Linux (V4L2) subsystem, as is the RGA (/dev/video0 in the listing above), so we have to use the standard V4L2 APIs to work with them. For now, kernel 5.8.6 works properly for us with ffmpeg built with the "hwaccel drm and v4l2-request" patches.
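With the RGA exposed as a V4L2 mem2mem device, an NV12 -> RGB conversion is set up on two queues: OUTPUT receives the decoded NV12 frames and CAPTURE returns the converted ones. A minimal sketch of just the format step, assuming the single-planar API (check is_m2m and V4L2_CAP_VIDEO_M2M_MPLANE via VIDIOC_QUERYCAP first); the helper name is hypothetical and buffer handling is left out:

```cpp
#include <cstdint>
#include <cstring>
#include <linux/videodev2.h>

// Fills a single-planar v4l2_format for one queue of a mem2mem device.
v4l2_format make_m2m_format(v4l2_buf_type type, std::uint32_t width,
                            std::uint32_t height, std::uint32_t fourcc) {
    v4l2_format fmt;
    std::memset(&fmt, 0, sizeof fmt);
    fmt.type = type;
    fmt.fmt.pix.width = width;
    fmt.fmt.pix.height = height;
    fmt.fmt.pix.pixelformat = fourcc;
    fmt.fmt.pix.field = V4L2_FIELD_NONE;
    // bytesperline = 0 asks the driver for its default stride; the driver
    // also fills in sizeimage when VIDIOC_S_FMT returns.
    return fmt;
}
```

Usage would be ioctl(fd, VIDIOC_S_FMT, &out) with make_m2m_format(V4L2_BUF_TYPE_VIDEO_OUTPUT, 3072, 2048, V4L2_PIX_FMT_NV12), then the same for V4L2_BUF_TYPE_VIDEO_CAPTURE with an RGB fourcc, followed by the usual VIDIOC_REQBUFS / VIDIOC_QBUF / VIDIOC_STREAMON sequence. Which RGB formats the driver accepts should be read from VIDIOC_ENUM_FMT (or v4l2-ctl --list-formats-out and --list-formats) rather than assumed; if the driver supports V4L2_MEMORY_DMABUF, importing the decoder's dma-buf avoids the CPU copy altogether.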