Server to client media playback with frame-based processing
jamjambles opened this issue · 3 comments
Many of the examples in this repo show client-to-server media sinks (mic / video capture) with frame-based callback processing. I am looking to do server-to-client media playback with frame-based callback processing. This would be useful for real-time audio playback with real-time processing.
After searching through this discussion https://discuss.streamlit.io/t/new-component-streamlit-webrtc-a-new-way-to-deal-with-real-time-media-streams/8669, and the example pages in streamlit-webrtc, I have not been able to find an example of this.
To be specific, I am looking to do the following:
- Load an audio file (server)
- Start playback (from server to client), frame by frame
- Process each frame (before it is sent to the client) via a callback (processing should occur on the server, for example ML inference)
- Playback processed audio frame to client
- Continue in real-time
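To make the steps above concrete, here is a rough, library-agnostic sketch using only the Python standard library. It only illustrates the load / split into frames / per-frame callback part; it does not send anything to a client and does not pace frames in real time. The names `iter_frames`, `processed_frames` and `halve_volume` are made up for illustration, and the 960-sample frame size is an assumption (20 ms of mono audio at 48 kHz).

```python
import array
import io
import sys
import wave
from typing import BinaryIO, Callable, Iterator

Frame = array.array  # one chunk of signed 16-bit PCM samples


def iter_frames(wav_file: BinaryIO, samples_per_frame: int = 960) -> Iterator[Frame]:
    """Yield successive chunks of samples from a 16-bit PCM WAV file."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        while True:
            # readframes() returns b"" once the file is exhausted.
            raw = wav.readframes(samples_per_frame)
            if not raw:
                break
            frame = array.array("h", raw)
            if sys.byteorder == "big":
                frame.byteswap()  # WAV sample data is little-endian
            yield frame


def processed_frames(
    wav_file: BinaryIO,
    process: Callable[[Frame], Frame],
    samples_per_frame: int = 960,
) -> Iterator[Frame]:
    """Apply the per-frame callback before each frame is handed on."""
    for frame in iter_frames(wav_file, samples_per_frame):
        yield process(frame)


def halve_volume(frame: Frame) -> Frame:
    """Placeholder for the real processing step (e.g. ML inference)."""
    return array.array("h", (sample // 2 for sample in frame))


if __name__ == "__main__":
    # Build a tiny mono WAV in memory so the sketch runs without a real file.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(48000)
        out.writeframes(bytes(2 * 2400))  # 2400 samples of silence
    buf.seek(0)
    for frame in processed_frames(buf, halve_volume):
        print(len(frame))
```

In a real setup the sender would also have to deliver each frame at its playback time; that pacing and the actual transport to the client are what I am hoping the library can handle.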
This example uses the MediaPlayer class from aiortc. However, it does not seem to provide any sort of callback on the stream (at the audio frame level). Digging deeper, the MediaPlayer class has a MediaStreamTrack instance (https://aiortc.readthedocs.io/en/latest/api.html#aiortc.MediaStreamTrack) which has a recv callback method for each frame.
Would the correct approach be to create a new subclass of MediaStreamTrack and write a custom recv for the required processing? I found this related thread: aiortc/aiortc#571
Is this functionality supported currently? I would appreciate any guidance here.
Thanks heaps!
Interested in pointers to potential solutions. My use case is similar, generate some text-to-speech and play it back.
The example you mentioned (https://github.com/whitphx/streamlit-webrtc/blob/main/pages/8_media_files_streaming.py) uses a callback to process the video frames (video_frame_callback). Does using audio_frame_callback there instead work for you?
The audio filter example may also be a useful reference for how to use the audio callback, although it is a client-to-server example.
@wenshutang I'm also interested in this issue for my text-to-speech streaming problem. I wonder if you've found a solution yet. If so, could you please share some references or suggestions? Thank you.