Is it really not possible to get CPU access to Image data and Video Recording in the same pipeline?
Hey all!
I have a Camera library that can do preview, photo capture, video capture, and frame processing at the same time. On iOS, this works perfectly. But on Android, it actually seems to be impossible to do this with Camera2/android.media APIs.
This is my structure:
```mermaid
graph TD;
  Camera-->Preview
  Camera-->Photo
  Camera-->VideoPipeline
  VideoPipeline-->MLKit[MLKit Image Processing]
  VideoPipeline-->REC[MediaRecorder/MediaCodec]
```
Important detail: the `VideoPipeline` would do Frame Processing/ImageAnalysis and Video Recording in one step, i.e. synchronously.
- MLKit Image Processing requires either YUV_420_888, PRIVATE or RGBA_8888 buffers, so this pipeline should work for all 3 pixel formats (see the sketch after this list).
- MediaRecorder/MediaCodec records the Frame to an H.264/H.265 video file (.mp4).
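For reference, this is roughly how the frame-processing half consumes buffers today. Just a minimal sketch, assuming a YUV_420_888 `ImageReader` attached to the capture session, an existing `backgroundHandler`, and ML Kit face detection standing in for whatever model actually runs on the Frame:

```kotlin
import android.graphics.ImageFormat
import android.media.ImageReader
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.face.FaceDetection

// Size and maxImages are placeholder values; the reader's surface is added as a session output.
val analysisReader = ImageReader.newInstance(1920, 1080, ImageFormat.YUV_420_888, 3)
val detector = FaceDetection.getClient()

analysisReader.setOnImageAvailableListener({ reader ->
    val image = reader.acquireLatestImage() ?: return@setOnImageAvailableListener
    // ML Kit accepts the YUV_420_888 media.Image directly (rotation assumed to be 0 here).
    val input = InputImage.fromMediaImage(image, 0)
    detector.process(input)
        .addOnCompleteListener { image.close() } // hand the buffer back to the reader
}, backgroundHandler)
```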
It seems like this is not possible with Camera2 at all, right?
A few potential solutions/ideas I had:
1. Separate outputs
Use separate Camera outputs: `MediaRecorder` and `ImageReader`.
- Does not work, because the Camera only allows 3 outputs and we already have 3 (preview, photo, video) - see the sketch below.
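For illustration, this is what the separate-outputs variant would have to configure. A sketch only, assuming `previewSurface`, `photoReader`, `analysisReader`, a prepared `recorder`, plus `executor` and `stateCallback` already exist; a fourth concurrent stream like this is outside the guaranteed stream combinations on most devices, so session creation or capture tends to fail:

```kotlin
import android.hardware.camera2.params.OutputConfiguration
import android.hardware.camera2.params.SessionConfiguration

val outputs = listOf(
    OutputConfiguration(previewSurface),         // 1: preview
    OutputConfiguration(photoReader.surface),    // 2: photo capture
    OutputConfiguration(analysisReader.surface), // 3: frame processing (MLKit)
    OutputConfiguration(recorder.surface)        // 4: video recording - one output too many
)
cameraDevice.createCaptureSession(
    SessionConfiguration(SessionConfiguration.SESSION_REGULAR, outputs, executor, stateCallback)
)
```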
2. Use `ImageReader`/`ImageWriter` to pass it through
```mermaid
graph TD;
  Camera-->ImageReader["VideoPipeline (ImageReader)"]
  ImageReader-->MLKit[MLKit Image Processing]
  ImageReader-->ImageWriter
  ImageWriter-->REC[MediaRecorder/MediaCodec]
```
This is the closest solution I have so far, and it seems like `ImageReader`/`ImageWriter` are really efficient, as they just move buffers around. But there are multiple problems with this approach:
- It does not work on every device. It is not guaranteed that `MediaRecorder`/`MediaCodec` can be fed with Images from an `ImageWriter`, so sometimes it just silently crashes 🤦‍♂️
- It seems like it requires the GPU usage flags to be set (API 29+ only), but even those don't really work most of the time:

  ```kotlin
  val flags = HardwareBuffer.USAGE_VIDEO_ENCODE or HardwareBuffer.USAGE_GPU_SAMPLED_IMAGE
  val readerFormat = ImageFormat.YUV_420_888 // (or PRIVATE or RGBA_8888)
  imageReader = ImageReader.newInstance(width, height, readerFormat, MAX_IMAGES, flags) // <-- API 29+

  // ...
  val mediaRecorder = ...
  mediaRecorder.prepare()

  val writerFormat = readerFormat // or does this now need to be ImageFormat.PRIVATE???
  imageWriter = ImageWriter.newInstance(mediaRecorder.surface, MAX_IMAGES, writerFormat) // <-- API 29+

  imageReader.setOnImageAvailableListener({ reader ->
    val image = reader.acquireNextImage()
    imageWriter.queueInputImage(image)
  }, handler)
  ```
- As far as I understand, it requires an additional conversion step from my format to whatever format the `MediaRecorder`/`MediaCodec` wants. So I might need an additional `ImageReader` that has the PRIVATE format (see the chained sketch after this list):

  ```mermaid
  graph LR;
    R1["ImageReader (YUV)"]-->W1["ImageWriter (YUV)"]-->R2["ImageReader (PRIVATE)"]-->W2["ImageWriter (PRIVATE)"]-->REC["MediaRecorder/MediaCodec"]
    R1-->MLKit[MLKit Image Processing]
  ```

  ...which is just ridiculous.
- It does not support Camera flipping (back <-> front) while recording, because the `width`/`height` of the Image Buffers might change and there is no scaling/resizing step in this pipeline.
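To make the chained idea above concrete, this is roughly what it implies in code. A sketch only, assuming `width`, `height`, `MAX_IMAGES`, `handler` and a prepared `mediaRecorder` exist, and `runFrameProcessor` is a hypothetical stand-in for the MLKit call; whether the YUV → PRIVATE hop works at all is exactly what seems to be unreliable across devices:

```kotlin
import android.graphics.ImageFormat
import android.media.ImageReader
import android.media.ImageWriter

val yuvReader = ImageReader.newInstance(width, height, ImageFormat.YUV_420_888, MAX_IMAGES)
val privateReader = ImageReader.newInstance(width, height, ImageFormat.PRIVATE, MAX_IMAGES)

// Writer that forwards the camera's YUV frames into the PRIVATE reader's surface...
val yuvWriter = ImageWriter.newInstance(privateReader.surface, MAX_IMAGES)
// ...and a second writer that forwards PRIVATE frames into the encoder surface.
val privateWriter = ImageWriter.newInstance(mediaRecorder.surface, MAX_IMAGES)

yuvReader.setOnImageAvailableListener({ reader ->
    val image = reader.acquireNextImage() ?: return@setOnImageAvailableListener
    runFrameProcessor(image)         // hypothetical: synchronous MLKit processing
    yuvWriter.queueInputImage(image) // queueInputImage takes ownership of the Image
}, handler)

privateReader.setOnImageAvailableListener({ reader ->
    val image = reader.acquireNextImage() ?: return@setOnImageAvailableListener
    privateWriter.queueInputImage(image)
}, handler)
```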
3. Create a custom OpenGL Pipeline
Create a custom OpenGL pipeline that the Camera will render to, then we do a pass-through render pass to render the Frame to all the outputs:
```mermaid
graph TD;
  Camera-->OpenGL["VideoPipeline (OpenGL)"]
  OpenGL-->Pass[Pass-Through Shader]
  Pass-->ImageReader-->MLKit[MLKit Image Processing]
  Pass-->REC[MediaRecorder/MediaCodec]
```
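This is the rough shape of what the linked PR implements. A condensed sketch, where `EglCore`, `WindowSurface` and `FullFrameRect` are assumed helper classes (in the spirit of Google's grafika sample) wrapping the EGL context, EGL window surfaces and a pass-through external-texture shader; they are not framework APIs:

```kotlin
import android.graphics.SurfaceTexture
import android.os.Handler
import android.view.Surface

class GlVideoPipeline(
    recorderSurface: Surface,  // MediaRecorder.surface / MediaCodec input surface
    readerSurface: Surface,    // ImageReader (RGBA_8888) surface for MLKit
    width: Int,
    height: Int,
    glHandler: Handler         // handler of the dedicated GL thread
) : SurfaceTexture.OnFrameAvailableListener {

    private val eglCore = EglCore()                                   // assumed EGL wrapper
    private val recorderWindow = WindowSurface(eglCore, recorderSurface)
    private val readerWindow = WindowSurface(eglCore, readerSurface)
    private val passThrough = FullFrameRect()                         // pass-through shader program
    private val oesTexture = passThrough.createExternalTexture()      // GL_TEXTURE_EXTERNAL_OES
    private val transform = FloatArray(16)

    // The Camera renders into this Surface; it is the only camera output for video.
    val cameraSurfaceTexture = SurfaceTexture(oesTexture).apply {
        setDefaultBufferSize(width, height)
        setOnFrameAvailableListener(this@GlVideoPipeline, glHandler)
    }
    val cameraSurface = Surface(cameraSurfaceTexture)

    override fun onFrameAvailable(st: SurfaceTexture) {
        // Latch the new camera frame into the OES texture (implicit YUV -> RGB conversion).
        recorderWindow.makeCurrent()
        st.updateTexImage()
        st.getTransformMatrix(transform)

        // Pass-through render into the encoder surface...
        passThrough.drawFrame(oesTexture, transform)
        recorderWindow.swapBuffers()

        // ...and a second pass-through render into the ImageReader surface.
        readerWindow.makeCurrent()
        passThrough.drawFrame(oesTexture, transform)
        readerWindow.swapBuffers()
    }
}
```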
But, this has four major drawbacks:
- It's really really complex to build (I already built it, see this PR, so not a real problem tbh)
- It seems like this is not as efficient as an `ImageReader`/`ImageWriter` approach, as we do an implicit RGB conversion and an actual render pass, whereas `ImageReader`/`ImageWriter` just move Image Buffers around (at least as far as I understood this)
- It only works in RGBA_8888, as OpenGL works in RGB. This means our frame processor (MLKit) does not work if it is trained on YUV_420_888 data - this is a hard requirement.
- It is not synchronous: the `ImageReader` gets called at a later point, so we could not really use information from the Frame to decide what gets rendered later (e.g. to apply a face filter).
At this point I'm pretty clueless tbh. Is a synchronous video pipeline simply not possible at all in Android? I'd appreciate any pointers/help here, maybe I'm not aware of some great APIs.
I only need CPU/GPU buffer access to the native Frame, but I need the format to be configurable (YUV, PRIVATE, RGB), and I need it to be synchronous - e.g. block until the Frame has been processed before it is written to the MediaRecorder.
Also happy to pay $150/h for consultancy sessions if anyone knows more.
Create a Surface from an OES texture and use it to configure the session; that way YUV_420_888 is automatically converted to RGBA.
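A rough sketch of this suggestion (essentially the OpenGL route from option 3; `width`/`height` and the EGL context setup are assumed, and the output is RGBA only):

```kotlin
import android.graphics.SurfaceTexture
import android.opengl.GLES11Ext
import android.opengl.GLES20
import android.view.Surface

// Create a GL_TEXTURE_EXTERNAL_OES texture (requires a current EGL context on this thread)...
val ids = IntArray(1)
GLES20.glGenTextures(1, ids, 0)
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, ids[0])

// ...wrap it in a SurfaceTexture/Surface and use that Surface as the camera's video output.
val surfaceTexture = SurfaceTexture(ids[0]).apply { setDefaultBufferSize(width, height) }
val cameraSurface = Surface(surfaceTexture)
// -> pass cameraSurface into createCaptureSession(...) instead of an ImageReader surface.

// On every frame, surfaceTexture.updateTexImage() latches the camera buffer into the OES
// texture; sampling it in a fragment shader via samplerExternalOES yields RGBA texels,
// i.e. the YUV -> RGBA conversion happens implicitly on the GPU.
```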