laugh12321/TensorRT-YOLO

[Feature]: Support streaming video?

Opened this issue · 12 comments

The project currently supports video stream inference, but a concrete implementation hasn't been provided yet. I'll address this in the future. Meanwhile, you can refer to the example provided at #15 (comment).

It is better to use PySide6; OpenCV can drop 50% of the FPS.
OpenCV is not well suited for video display:
if cv2.waitKey(1) & 0xFF == ord("q") alone drops 50% of the FPS, and even more on Windows.
https://forum.opencv.org/t/fps-drop-due-to-the-waitkey-function/3717
https://stackoverflow.com/questions/60543088/capturing-fps-drops-with-lower-resolution-opencv
https://forum.opencv.org/t/confusing-waitkey-buffering-behavior/10058
https://answers.opencv.org/question/52774/waitkey1-timing-issues-causing-frame-rate-slow-down-fix/
https://dsp.stackexchange.com/questions/44129/does-the-cv2-waitkey-argument-determine-the-frame-rate-while-capturing-a-video

Also, waitKey depends on the OS's internal scheduling. For example, if you hold down a key on the keyboard, inference runs faster, because waitKey returns as soon as a key event is pending.
https://pypi.org/project/PyQt6/ or https://github.com/fastplotlib/fastplotlib could be a great solution. Why? fastplotlib uses pygfx (Vulkan, Metal, or DirectX 12) for rendering, whereas using vulkan-python directly is more complicated.
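As an illustration of that suggestion, here is a minimal sketch of a Qt-based viewer that drives the display from a QTimer instead of cv2.waitKey. It only assumes PySide6 and OpenCV are installed; the video path and run_inference call are placeholders, not part of this project:

```python
import sys
import cv2
from PySide6.QtCore import QTimer
from PySide6.QtGui import QImage, QPixmap
from PySide6.QtWidgets import QApplication, QLabel

app = QApplication(sys.argv)
label = QLabel()
label.show()

cap = cv2.VideoCapture("video.mp4")  # placeholder: file, camera index, or RTSP URL

def update_frame():
    ok, frame = cap.read()
    if not ok:
        timer.stop()
        cap.release()
        return
    # frame = run_inference(frame)  # hypothetical detector call, draw boxes here
    h, w, _ = frame.shape
    qimg = QImage(frame.data, w, h, frame.strides[0], QImage.Format.Format_BGR888)
    label.setPixmap(QPixmap.fromImage(qimg))

# The Qt event loop drives the display; no cv2.waitKey in the hot path.
timer = QTimer()
timer.timeout.connect(update_frame)
timer.start(0)

sys.exit(app.exec())
```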
Also, for video and camera streaming, you can read frames on a separate thread and support more devices with vidgear (see the sketch after these links):
https://github.com/abhiTronix/vidgear
https://github.com/abhiTronix/deffcode
https://www.pythonguis.com/faq/pyqt6-vs-pyside6/
Or use DeepStream:
https://github.com/NVIDIA-AI-IOT/deepstream_python_apps
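For the vidgear route, a minimal capture loop could look like the sketch below (assuming vidgear is installed; CamGear reads frames on a background thread into an internal queue, so slow display or inference does not stall the decoder; the source URL and run_inference are placeholders):

```python
import cv2
from vidgear.gears import CamGear

# CamGear decodes frames on its own thread and buffers them in a queue.
stream = CamGear(source="rtsp://example.com/stream").start()  # also accepts files / camera indices

while True:
    frame = stream.read()  # returns None when the stream ends
    if frame is None:
        break
    # frame = run_inference(frame)  # hypothetical detector call
    cv2.imshow("output", frame)    # display kept simple here; a Qt viewer avoids the waitKey cost
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cv2.destroyAllWindows()
stream.stop()
```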

Thank you for your suggestions and the provided resource links. I will consider adopting PySide6 in future implementations and look into optimizing the performance of video stream inference. The resources you mentioned, such as vidgear and deepstream, seem very helpful, and I'll explore their functionalities and usage. Thanks again for your input!

VidGear uses the CPU for I/O because it is built on OpenCV, which is not ideal, but it does implement multithreading and queues.
DeffCode uses an FFmpeg backend, so if you compile FFmpeg with hardware encoders/decoders you should use it (a minimal decode loop is sketched below).
GPUs: https://developer.nvidia.com/video-codec-sdk (this does not work with edge devices).
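A minimal DeffCode decode loop, for reference, could look like this (a sketch only: it assumes deffcode and a suitable FFmpeg build are installed; hardware decoders would be enabled through extra FFmpeg parameters, which are omitted here, and run_inference is a placeholder):

```python
from deffcode import FFdecoder

# FFdecoder wraps FFmpeg and yields decoded frames as NumPy arrays.
decoder = FFdecoder("input.mp4", frame_format="bgr24").formulate()

for frame in decoder.generateFrame():
    if frame is None:  # end of stream
        break
    # frame = run_inference(frame)  # hypothetical detector call

decoder.terminate()
```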

I have some implementations, but right now I'm investigating how to use NVENC and NVDEC on the Jetson AGX Orin. Once the inference pipeline is fast, the bottleneck becomes reading the input (video file or stream) and encoding/decoding it, so hardware accelerators are necessary.
For GPUs you can use:
https://github.com/NVIDIA/DALI
https://github.com/NVIDIA/Deep-Learning-Accelerator-SW
https://github.com/marcoslucianops/DeepStream-Yolo (DeepStream SDK)
Important for the AGX: https://forums.developer.nvidia.com/t/is-there-any-example-how-to-use-nvenc-nvdec-in-python/291182/6

People usually use GStreamer because it has C++ and Python bindings with encoder/decoder elements for hardware accelerators; a sketch of such a pipeline follows.
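For illustration, a hardware-decoded RTSP capture can be opened through OpenCV's GStreamer backend roughly like this. This is only a sketch: it assumes OpenCV was built with GStreamer support and that an NVDEC-capable decoder element is available (nvv4l2decoder on Jetson is shown; on desktop RTX cards the element is typically nvh264dec), and element names vary by platform and GStreamer build:

```python
import cv2

# Hardware-decoded RTSP pipeline (Jetson flavour; swap nvv4l2decoder/nvvidconv
# for nvh264dec/videoconvert on a desktop GPU build of GStreamer).
pipeline = (
    "rtspsrc location=rtsp://example.com/stream latency=100 ! "
    "rtph264depay ! h264parse ! nvv4l2decoder ! "
    "nvvidconv ! video/x-raw,format=BGRx ! "
    "videoconvert ! video/x-raw,format=BGR ! appsink drop=true"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # frame = run_inference(frame)  # hypothetical detector call
cap.release()
```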

Also, for efficiency, you can use the DLA hardware to run YOLO: https://medium.com/@DeeperAndCheaper/jetson-running-yolov8-classfication-model-in-dla-1-why-dla-6a2d2860ebdd

Thank you for sharing the resource links. My initial plan was to utilize GStreamer for streaming video inference.

Hi, I'd like to ask: what is the current best practice for deploying a YOLO video-stream application on a PC with an RTX-series GPU?

I previously built such a YOLOv8 application on an RK3588 embedded device:

Pull the RTSP stream with zlmediakit / pull the UVC stream with libuvc -> hardware video decode with the RK3588's MPP -> accelerated image preprocessing with the RK3588's RGA -> inference on the RK3588's NPU with RKNN -> postprocessing -> hardware video encode with MPP -> push the RTSP stream (with boxes drawn) with zlmediakit.

Now I want to implement a similar application on a PC with an RTX-series GPU, but so far I only have experience with the ultralytics framework.
I understand that TensorRT can be used for acceleration on RTX-series GPUs; that much is confirmed.
My question is about handling the video streams. Inspired by this issue, I looked into the DeepStream documentation.

DeepStream
According to NVIDIA's official description, DeepStream is built on GStreamer, supports video-stream input/output and video encoding/decoding, and offers zero-copy memory. That makes DeepStream look very attractive, but I can only find application examples on Jetson edge devices; nobody seems to use DeepStream on a desktop PC. Do you know why that is? Is it because DeepStream is far less easy to use than advertised?

Yes, DeepStream is one way, but for me it is a bit tricky to use. Instead, you can use GStreamer with the hardware encoders/decoders provided on RTX cards (https://developer.nvidia.com/video-codec-sdk); see the encode sketch below.
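As a rough sketch of the encode side (not a tested recipe: it assumes OpenCV built with GStreamer and a GStreamer installation that provides the NVENC element nvh264enc; the URLs, file name, and run_inference call are placeholders), annotated frames could be pushed through the hardware encoder like this:

```python
import cv2

width, height, fps = 1280, 720, 30

# VideoWriter fed into a GStreamer pipeline that encodes H.264 on the GPU (NVENC).
writer = cv2.VideoWriter(
    "appsrc ! videoconvert ! nvh264enc ! h264parse ! mp4mux ! filesink location=result.mp4",
    cv2.CAP_GSTREAMER,
    0,            # fourcc is unused when a GStreamer pipeline is supplied
    float(fps),
    (width, height),
)

cap = cv2.VideoCapture("rtsp://example.com/stream")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (width, height))
    # frame = run_inference(frame)  # hypothetical detector call, draw boxes here
    writer.write(frame)

cap.release()
writer.release()
```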

The best practice is of course still GStreamer. However, although GStreamer can be used on Windows, plugin development is not Windows-friendly. I am currently working on a cross-platform video inference example based on GStreamer.

DeepStream only supports Linux, not Windows. Features such as zero-copy require specific hardware support, for example Jetson.

Understood. Many thanks to you both for your replies @johnnynunez @laugh12321

Exactly, here is an explanation of zero-copy, and why using ultralytics natively is not good for embedded systems.
One way is to implement your own YOLOv8 wrapper directly on top of ultralytics' AutoBackend model, or to write a class of your own that supports the .pt, .engine, or .onnx formats (a sketch follows below).
Source: https://www.fastcompression.com/blog/jetson-zero-copy.htm
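As a rough sketch of that AutoBackend route (assuming a recent ultralytics release where AutoBackend lives in ultralytics.nn.autobackend; the weight file is a placeholder and pre/post-processing is omitted):

```python
import torch
from ultralytics.nn.autobackend import AutoBackend

device = torch.device("cuda:0")

# AutoBackend loads .pt, .engine, or .onnx weights behind one interface.
model = AutoBackend("yolov8n.engine", device=device, fp16=True)
model.warmup(imgsz=(1, 3, 640, 640))

# Placeholder input: a real frame would be resized, normalized, and laid out as BCHW.
im = torch.zeros(1, 3, 640, 640, device=device, dtype=torch.float16)
preds = model(im)  # raw predictions; NMS / postprocessing still has to be applied
```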
Fortunately, with Jetson Thor this will no longer be necessary: you will have real unified memory, as on Apple Silicon Macs or Grace Hopper, sharing even the page tables.
There are libraries like the one below that use zero-copy: you work with a cudaImage format shared between CPU and GPU. But if you don't know how to handle it correctly, it can make things even worse.
https://github.com/dusty-nv/jetson-utils
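For example, a minimal jetson-utils loop might look like the sketch below (assuming jetson-utils is installed on a Jetson; Capture() returns a cudaImage in mapped memory visible to both CPU and GPU, and the stream URI is a placeholder):

```python
from jetson_utils import videoSource, videoOutput

# videoSource/videoOutput use hardware decode/encode on Jetson, and frames stay
# in cudaImage buffers shared between CPU and GPU (zero-copy / mapped memory).
camera = videoSource("rtsp://example.com/stream")  # also "csi://0", "/dev/video0", or a file
display = videoOutput("display://0")               # local window; other sinks are supported

while True:
    img = camera.Capture()  # cudaImage, or None on timeout
    if img is None:
        continue
    # run inference on img here (e.g. via jetson_utils.cudaToNumpy for CPU-side access)
    display.Render(img)
    if not camera.IsStreaming() or not display.IsStreaming():
        break
```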

@laugh12321 did you see that?

NVIDIA/TensorRT#3859

@johnnynunez yes, I see. That's awesome!