bananaml/serverless-template

Slow inference - Run Whisper API without extra encoding/downloading file, and use bytes directly.

Opened this issue · 0 comments

Hi,

GPU performance is similar to my CPU's for small-to-medium videos because of the extra I/O and encoding overhead.

For my application, I use FastAPI for the majority of my core functionality. However, I require a GPU to transcribe video/audio files into transcriptions, and I decided to use BananaML since a serverless GPU seems like it would be much cheaper than Kubernetes.

How can I pass this UploadFile = File(...) object from FastAPI (a spooled temporary file) to my BananaML API, instead of sending an encoded byte string read from an mp3/mp4 file saved locally?

Old way (faster on my CPU than BananaML)

Upload video on web page -> write to a temporary file -> pass to Whisper

New way (GPU with BananaML)

Upload video on web page -> save file locally -> read bytes from the local file -> encode into a JSON-safe format -> pass to Whisper.

I understand there has to be an extra I/O operation to send the video data to the GPU, but the way recommended in the template is highly inefficient. I wish I could pass the file-like object directly, as FastAPI allows.
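For reference, here is roughly what I'm after: skipping the save-to-disk step by reading the upload's bytes straight from the spooled temporary file and base64-encoding them into the JSON payload. This is only a sketch using the standard library; the `SpooledTemporaryFile` stands in for FastAPI's `UploadFile.file`, and the `audio_b64` payload key is a hypothetical name, not something the template defines.

```python
import base64
import json
from tempfile import SpooledTemporaryFile

def payload_from_upload(spooled) -> str:
    """Build a JSON payload from an in-memory upload without writing to disk.

    `spooled` is a file-like object such as FastAPI's `UploadFile.file`
    (a SpooledTemporaryFile); small uploads never touch the filesystem.
    """
    spooled.seek(0)                                   # rewind in case it was already read
    raw = spooled.read()                              # bytes straight from the spool
    encoded = base64.b64encode(raw).decode("ascii")   # JSON-safe string
    return json.dumps({"audio_b64": encoded})         # hypothetical payload key

# Simulate an uploaded mp3 held in a spooled temp file:
upload = SpooledTemporaryFile()
upload.write(b"\xff\xfb\x90fake-mp3-bytes")
payload = payload_from_upload(upload)
```

The base64 encoding step is still unavoidable for a JSON API, but at least this version never writes the upload to local disk before sending it.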

Thanks,

Viraj