pytorch/audio

StreamReader failing when reading RTSP stream with CPU

pedromoraesh opened this issue · 7 comments

๐Ÿ› Describe the bug

[screenshot of the error output]

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)

import os
import time

import matplotlib.pyplot as plt
from torchaudio.io import StreamReader
from torchaudio.utils import ffmpeg_utils
import torchvision

#set environment variable for ffmpeg
os.environ["TORIO_USE_FFMPEG_VERSION"] = "4"


print("FFmpeg Library versions:")
for k, ver in ffmpeg_utils.get_versions().items():
    print(f"  {k}:\t{'.'.join(str(v) for v in ver)}")

    print("Available NVDEC Decoders:")
for k in ffmpeg_utils.get_video_decoders().keys():
    if "cuvid" in k:
        print(f" - {k}")

print("Avaialbe GPU:")
print(torch.cuda.get_device_properties(0))


src = "<SOME_RTSP_URL>"
s = StreamReader(src)
s.add_video_stream(5, decoder="hevc")
s.fill_buffer()
(video,) = s.pop_chunks()

print(video.shape, video.dtype, video.device)

Before hitting this issue, I had been using FFmpeg 6 and was getting an error about the threads and gpu parameters in decoder_option.

[screenshot of the error about the threads and gpu decoder options]

import torch
import torchaudio

print(torch.__version__)
print(torchaudio.__version__)

import os
import time

import matplotlib.pyplot as plt
from torchaudio.io import StreamReader
from torchaudio.utils import ffmpeg_utils
import torchvision

#set environment variable for ffmpeg
os.environ["TORIO_USE_FFMPEG_VERSION"] = "6"


print("FFmpeg Library versions:")
for k, ver in ffmpeg_utils.get_versions().items():
    print(f"  {k}:\t{'.'.join(str(v) for v in ver)}")

    print("Available NVDEC Decoders:")
for k in ffmpeg_utils.get_video_decoders().keys():
    if "cuvid" in k:
        print(f" - {k}")

print("Avaialbe GPU:")
print(torch.cuda.get_device_properties(0))


src = "<SOME_RTSP_URL>"
s = StreamReader(src)
s.add_video_stream(5, decoder="hevc_cuvid", hw_accel="cuda:0", decoder_option={"gpu": "0"})
s.fill_buffer()
(video,) = s.pop_chunks()

print(video.shape, video.dtype, video.device)

Versions

Collecting environment information...
PyTorch version: 2.3.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4070 Ti
Nvidia driver version: 552.22
cuDNN version: Probably one of the following:
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.7
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i5-13600K
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 1
Stepping: 1
BogoMIPS: 6988.79
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 480 KiB (10 instances)
L1i cache: 320 KiB (10 instances)
L2 cache: 20 MiB (10 instances)
L3 cache: 24 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] optree==0.11.0
[pip3] torch==2.3.0+cu118
[pip3] torchaudio==2.3.0+cu118
[pip3] torchreid==0.2.5
[pip3] torchvision==0.18.0+cu118
[pip3] triton==2.3.0
[conda] Could not collect

I have the same problem, have you solved it?

I tested with FFmpeg 4 and got the first error, and with FFmpeg 6 I got the second one about threads and gpu. So far I haven't been able to solve it. I tried processing with an FFmpeg build from source, but it's not optimized; I tried PyAV, but it doesn't support GPU decoding; and I tried tensor-stream, but I get a segmentation fault. No options for now: in my tests torchaudio only supports RTMP, and torchvision uses PyAV, which can't even process RTMP.

I tried setting up a streaming service using mediamtx and pushing the stream with ffmpeg, and it was successfully decoded with StreamReader. However, when I switched to using a real camera device's RTSP stream today, I encountered the same issue as you. 😂😂

I deployed my own RTMP server and it works fine for H264, but for RTSP I've tried every single thing and it seems torch won't fix it soon... I'm trying to change codecs, parameters and that kind of stuff, but no results at the moment.

Looking through the source code, it does not seem to support the yuvj420p pixel format, and the source may need to be modified to enable support: https://github.com/pytorch/audio/blob/main/src/libtorio/ffmpeg/stream_reader/conversion.cpp

Another method is to use ffmpeg to fetch the stream from the remote device, decode it, and re-encode it into a format supported by StreamReader. The locally re-streamed data can then be provided to StreamReader. However, this method requires an extra copy, which will consume more CPU resources.

ffmpeg -i "rtsp://example/" -c:v h264_nvenc -pix_fmt yuv420p -f rtsp rtsp://0.0.0.0:8554/stream
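If you go this route, the consumer side barely changes. Here is a minimal, untested sketch assuming the relay above publishes H264/yuv420p to a local RTSP endpoint; the URL and the h264_cuvid/cuda choices are placeholders for your own setup:

from torchaudio.io import StreamReader

# Point StreamReader at the local relay instead of the camera.
# "rtsp://127.0.0.1:8554/stream" stands in for wherever the ffmpeg
# command above publishes the re-encoded feed.
s = StreamReader("rtsp://127.0.0.1:8554/stream")
s.add_video_stream(5, decoder="h264_cuvid", hw_accel="cuda:0")
s.fill_buffer()
(video,) = s.pop_chunks()
print(video.shape, video.dtype, video.device)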

Looking through the source code, it does not seem to support the yuvj420p pixel format, and the source may need to be modified to enable support: https://github.com/pytorch/audio/blob/main/src/libtorio/ffmpeg/stream_reader/conversion.cpp

This can help me find a solution. Yesterday I tried the dev release, but still no luck.
I'm thinking about something: StreamReader accepts some filter options that are passed on to FFmpeg. Maybe one of those filters can convert the image without re-streaming, as in the sketch below.
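Something like this untested sketch is what I have in mind; it uses the filter_desc argument of add_video_stream to have FFmpeg convert the frames to yuv420p before they are turned into tensors (the URL and codec are placeholders, and I haven't confirmed this works around the yuvj420p issue):

from torchaudio.io import StreamReader

src = "<SOME_RTSP_URL>"
s = StreamReader(src)
# Ask the FFmpeg filter graph to convert whatever the decoder outputs
# (e.g. yuvj420p) into plain yuv420p before tensor conversion.
s.add_video_stream(5, decoder="hevc", filter_desc="format=yuv420p")
s.fill_buffer()
(video,) = s.pop_chunks()
print(video.shape, video.dtype)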

Another method is to use ffmpeg to fetch the stream from the remote device, decode it, and re-encode it into a format supported by StreamReader. The locally re-streamed data can then be provided to StreamReader. However, this method requires an extra copy, which will consume more CPU resources.

ffmpeg -i "rtsp://example/" -c:v h264_nvenc -pix_fmt yuv420p -f rtsp rtsp://0.0.0.0:8554/stream

Agreed, this could be CPU intensive and would kill the optimization from StreamReader.

I managed to make it work using the rtsp_transport option on StreamReader, @tunmx, after 2 months of work :)

When consuming RTSP streams you should pass "rtsp_transport": "tcp" in the option argument, alongside the specified codec, as in the sketch below.
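A minimal sketch of what worked for me, with the URL and codec as placeholders for your own stream (I haven't verified it across FFmpeg versions):

from torchaudio.io import StreamReader

src = "<SOME_RTSP_URL>"
# Force RTSP over TCP via FFmpeg's rtsp_transport option, and keep the
# explicit codec (and optional NVDEC acceleration) on the video stream.
s = StreamReader(src, option={"rtsp_transport": "tcp"})
s.add_video_stream(5, decoder="hevc_cuvid", hw_accel="cuda:0")
s.fill_buffer()
(video,) = s.pop_chunks()
print(video.shape, video.dtype, video.device)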