Is it really not possible to get CPU access to Image data and Video Recording in the same pipeline?
Hey all!
I have a Camera library that can do preview, photo capture, video capture, and frame processing at the same time. On iOS, this works perfectly. But on Android, it actually seems to be impossible to do this with Camera2/android.media APIs.
This is my structure:
```mermaid
graph TD;
  Camera-->Preview
  Camera-->Photo
  Camera-->VideoPipeline
  VideoPipeline-->MLKit[MLKit Image Processing]
  VideoPipeline-->REC[MediaRecorder/MediaCodec]
```
Important detail: the `VideoPipeline` would do Frame Processing/ImageAnalysis and Video Recording in one step, i.e. synchronously.
- MLKit Image Processing requires either YUV_420_888, PRIVATE or RGBA_8888 buffers, so this pipeline should work for all 3 pixel formats (see the sketch after this list).
- MediaRecorder/MediaCodec records the Frame to an H.264/H.265 video file (.mp4).
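For reference, this is roughly how the frame-processing half consumes buffers today. Just a minimal sketch, assuming a YUV_420_888 `ImageReader` attached to the capture session, an existing `backgroundHandler`, and ML Kit face detection standing in for whatever model actually runs on the Frame:

```kotlin
import android.graphics.ImageFormat
import android.media.ImageReader
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.face.FaceDetection

// Size and maxImages are placeholder values; the reader's surface is added as a session output.
val analysisReader = ImageReader.newInstance(1920, 1080, ImageFormat.YUV_420_888, 3)
val detector = FaceDetection.getClient()

analysisReader.setOnImageAvailableListener({ reader ->
    val image = reader.acquireLatestImage() ?: return@setOnImageAvailableListener
    // ML Kit accepts the YUV_420_888 media.Image directly (rotation assumed to be 0 here).
    val input = InputImage.fromMediaImage(image, 0)
    detector.process(input)
        .addOnCompleteListener { image.close() } // hand the buffer back to the reader
}, backgroundHandler)
```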
It seems like this is not possible with Camera2 at all, right?
A few potential solutions/ideas I had:
1. Separate outputs
Use separate Camera outputs: `MediaRecorder` and `ImageReader`.
- Does not work, because the Camera only allows 3 outputs and we already have 3 (preview, photo, video) - see the sketch below.
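For illustration, this is what the separate-outputs variant would have to configure. A sketch only, assuming `previewSurface`, `photoReader`, `analysisReader`, a prepared `recorder`, plus `executor` and `stateCallback` already exist; a fourth concurrent stream like this is outside the guaranteed stream combinations on most devices, so session creation or capture tends to fail:

```kotlin
import android.hardware.camera2.params.OutputConfiguration
import android.hardware.camera2.params.SessionConfiguration

val outputs = listOf(
    OutputConfiguration(previewSurface),         // 1: preview
    OutputConfiguration(photoReader.surface),    // 2: photo capture
    OutputConfiguration(analysisReader.surface), // 3: frame processing (MLKit)
    OutputConfiguration(recorder.surface)        // 4: video recording - one output too many
)
cameraDevice.createCaptureSession(
    SessionConfiguration(SessionConfiguration.SESSION_REGULAR, outputs, executor, stateCallback)
)
```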
2. Use `ImageReader`/`ImageWriter` to pass it through
```mermaid
graph TD;
  Camera-->ImageReader["VideoPipeline (ImageReader)"]
  ImageReader-->MLKit[MLKit Image Processing]
  ImageReader-->ImageWriter
  ImageWriter-->REC[MediaRecorder/MediaCodec]
```
This is the closest solution I have so far, and it seems like `ImageReader`/`ImageWriter` are really efficient, as they just move buffers around. But there are multiple problems with this approach:
- It does not work on every device. It is not guaranteed that `MediaRecorder`/`MediaCodec` can be fed with Images from an `ImageWriter`, so sometimes it just silently crashes 🤦‍♂️
- It seems like it requires the GPU usage flags to be set (API 29+ only), but even those don't really work most of the time:

  ```kotlin
  val flags = HardwareBuffer.USAGE_VIDEO_ENCODE or HardwareBuffer.USAGE_GPU_SAMPLED_IMAGE
  val readerFormat = ImageFormat.YUV_420_888 // (or PRIVATE or RGBA_8888)
  imageReader = ImageReader.newInstance(width, height, readerFormat, MAX_IMAGES, flags) // <-- API 29+

  // ...
  val mediaRecorder = ...
  mediaRecorder.prepare()

  val writerFormat = readerFormat // or does this now need to be ImageFormat.PRIVATE???
  imageWriter = ImageWriter.newInstance(mediaRecorder.surface, MAX_IMAGES, writerFormat) // <-- API 29+

  imageReader.setOnImageAvailableListener({ reader ->
    val image = reader.acquireNextImage()
    imageWriter.queueInputImage(image)
  }, handler)
  ```
- As far as I understand, it requires an additional conversion step from my format to whatever format the `MediaRecorder`/`MediaCodec` wants. So I might need an additional `ImageReader` that has the PRIVATE format (see the chained sketch after this list):

  ```mermaid
  graph LR;
    R1["ImageReader (YUV)"]-->W1["ImageWriter (YUV)"]-->R2["ImageReader (PRIVATE)"]-->W2["ImageWriter (PRIVATE)"]-->REC["MediaRecorder/MediaCodec"]
    R1-->MLKit[MLKit Image Processing]
  ```

  ...which is just ridiculous.
- It does not support Camera flipping (back <-> front) while recording, because the `width`/`height` of the Image Buffers might change and there is no scaling/resizing step in this pipeline.
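To make the chained idea above concrete, this is roughly what it implies in code. A sketch only, assuming `width`, `height`, `MAX_IMAGES`, `handler` and a prepared `mediaRecorder` exist, and `runFrameProcessor` is a hypothetical stand-in for the MLKit call; whether the YUV → PRIVATE hop works at all is exactly what seems to be unreliable across devices:

```kotlin
import android.graphics.ImageFormat
import android.media.ImageReader
import android.media.ImageWriter

val yuvReader = ImageReader.newInstance(width, height, ImageFormat.YUV_420_888, MAX_IMAGES)
val privateReader = ImageReader.newInstance(width, height, ImageFormat.PRIVATE, MAX_IMAGES)

// Writer that forwards the camera's YUV frames into the PRIVATE reader's surface...
val yuvWriter = ImageWriter.newInstance(privateReader.surface, MAX_IMAGES)
// ...and a second writer that forwards PRIVATE frames into the encoder surface.
val privateWriter = ImageWriter.newInstance(mediaRecorder.surface, MAX_IMAGES)

yuvReader.setOnImageAvailableListener({ reader ->
    val image = reader.acquireNextImage() ?: return@setOnImageAvailableListener
    runFrameProcessor(image)         // hypothetical: synchronous MLKit processing
    yuvWriter.queueInputImage(image) // queueInputImage takes ownership of the Image
}, handler)

privateReader.setOnImageAvailableListener({ reader ->
    val image = reader.acquireNextImage() ?: return@setOnImageAvailableListener
    privateWriter.queueInputImage(image)
}, handler)
```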
3. Create a custom OpenGL Pipeline
Create a custom OpenGL pipeline that the Camera will render to, then we do a pass-through render pass to render the Frame to all the outputs:
```mermaid
graph TD;
  Camera-->OpenGL["VideoPipeline (OpenGL)"]
  OpenGL-->Pass[Pass-Through Shader]
  Pass-->ImageReader-->MLKit[MLKit Image Processing]
  Pass-->REC[MediaRecorder/MediaCodec]
```
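This is the rough shape of what the linked PR implements. A condensed sketch, where `EglCore`, `WindowSurface` and `FullFrameRect` are assumed helper classes (in the spirit of Google's grafika sample) wrapping the EGL context, EGL window surfaces and a pass-through external-texture shader; they are not framework APIs:

```kotlin
import android.graphics.SurfaceTexture
import android.os.Handler
import android.view.Surface

class GlVideoPipeline(
    recorderSurface: Surface,  // MediaRecorder.surface / MediaCodec input surface
    readerSurface: Surface,    // ImageReader (RGBA_8888) surface for MLKit
    width: Int,
    height: Int,
    glHandler: Handler         // handler of the dedicated GL thread
) : SurfaceTexture.OnFrameAvailableListener {

    private val eglCore = EglCore()                                   // assumed EGL wrapper
    private val recorderWindow = WindowSurface(eglCore, recorderSurface)
    private val readerWindow = WindowSurface(eglCore, readerSurface)
    private val passThrough = FullFrameRect()                         // pass-through shader program
    private val oesTexture = passThrough.createExternalTexture()      // GL_TEXTURE_EXTERNAL_OES
    private val transform = FloatArray(16)

    // The Camera renders into this Surface; it is the only camera output for video.
    val cameraSurfaceTexture = SurfaceTexture(oesTexture).apply {
        setDefaultBufferSize(width, height)
        setOnFrameAvailableListener(this@GlVideoPipeline, glHandler)
    }
    val cameraSurface = Surface(cameraSurfaceTexture)

    override fun onFrameAvailable(st: SurfaceTexture) {
        // Latch the new camera frame into the OES texture (implicit YUV -> RGB conversion).
        recorderWindow.makeCurrent()
        st.updateTexImage()
        st.getTransformMatrix(transform)

        // Pass-through render into the encoder surface...
        passThrough.drawFrame(oesTexture, transform)
        recorderWindow.swapBuffers()

        // ...and a second pass-through render into the ImageReader surface.
        readerWindow.makeCurrent()
        passThrough.drawFrame(oesTexture, transform)
        readerWindow.swapBuffers()
    }
}
```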
But, this has four major drawbacks:
- It's really really complex to build (I already built it, see this PR, so not a real problem tbh)
- It seems like this is not as efficient as an `ImageReader`/`ImageWriter` approach, as we do an implicit RGB conversion and an actual render pass, whereas `ImageReader`/`ImageWriter` just move Image Buffers around (at least as far as I understood this)
- It only works in RGBA_8888, as OpenGL works in RGB. This means our frame processor (MLKit) does not work if it is trained on YUV_420_888 data - this is a hard requirement.
- It is not synchronous: the `ImageReader` gets called at a later point, so we could not really use information from the Frame to decide what gets rendered later (e.g. to apply a face filter).
At this point I'm pretty clueless tbh. Is a synchronous video pipeline simply not possible at all in Android? I'd appreciate any pointers/help here, maybe I'm not aware of some great APIs.
I only need CPU/GPU buffer access to the native Frame, but I need the format to be configurable (YUV, PRIVATE, RGB), and I need it to be synchronous - e.g. block until the Frame has been processed before it is written to the MediaRecorder.
Also happy to pay $150/h for consultancy sessions if anyone knows more.
Create a Surface from an OES texture and use it to configure the session; that way YUV_420_888 is automatically converted to RGBA.
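A rough sketch of this suggestion (essentially the OpenGL route from option 3; `width`/`height` and the EGL context setup are assumed, and the output is RGBA only):

```kotlin
import android.graphics.SurfaceTexture
import android.opengl.GLES11Ext
import android.opengl.GLES20
import android.view.Surface

// Create a GL_TEXTURE_EXTERNAL_OES texture (requires a current EGL context on this thread)...
val ids = IntArray(1)
GLES20.glGenTextures(1, ids, 0)
GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, ids[0])

// ...wrap it in a SurfaceTexture/Surface and use that Surface as the camera's video output.
val surfaceTexture = SurfaceTexture(ids[0]).apply { setDefaultBufferSize(width, height) }
val cameraSurface = Surface(surfaceTexture)
// -> pass cameraSurface into createCaptureSession(...) instead of an ImageReader surface.

// On every frame, surfaceTexture.updateTexImage() latches the camera buffer into the OES
// texture; sampling it in a fragment shader via samplerExternalOES yields RGBA texels,
// i.e. the YUV -> RGBA conversion happens implicitly on the GPU.
```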