cnuernber/avclj

Expose more frame info

Opened this issue · 6 comments

I wrote some demos that display videos using skia, swing, and cljfx.

The cljfx and swing examples worked great, but for skia I needed access to the raw frame data pointer for both reading and writing. With the pointer, skia can read and write the frame data directly. Additionally, I wanted to use the frame's best_effort_timestamp. Given enough use cases, exposing all of the data in AVFrame might be useful, but for now, exposing :data and :best-effort-timestamp would help when working with other C libraries that operate on pointers.

Currently, my fork just returns the raw frame for decoding. For encoding, I added get-input-frame to the decoder interface that calls avcodec/av_frame_make_writable and returns the input frame. I'm not sure that's the best approach.

You can see my changes in the raw-frame branch of my fork. This is the main commit, phronmophobic@fb8e042.

Do you think exposing more info from AVFrame makes sense? Do you have any thoughts on what that might look like? I could submit a pull request.

It makes total sense. The tricky thing about frames is the layout of the data: some formats are planar, which means several data buffers instead of one, and in the case of YUV420 the buffers have different widths and heights. Exposing that is definitely a caveat-emptor pathway, but honestly it makes complete sense. I like it when libraries let higher-level users get as close to the edge as possible. Initially users can stick to the examples and not use the more raw, knives-out interfaces.
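To make the "different widths/heights" point concrete, here is a minimal sketch in plain Clojure of the nominal plane shapes for a YUV420P frame. The function name `yuv420p-plane-shapes` is hypothetical and not part of avclj; it also ignores any per-row padding (linesize) that a real AVFrame may carry, so treat it as an illustration of the layout rather than a description of what the library returns.

```clojure
;; Hypothetical helper, not part of avclj.
;; YUV420P stores a full-resolution luma (Y) plane plus two chroma
;; planes (U, V) subsampled by 2 in both dimensions.
(defn yuv420p-plane-shapes
  "Returns [height width] for the Y, U and V planes of a YUV420P frame.
  Ignores row padding (linesize), which a real AVFrame may add."
  [height width]
  (let [half   (fn [n] (quot (inc n) 2)) ; round up for odd dimensions
        chroma [(half height) (half width)]]
    [[height width] chroma chroma]))

(yuv420p-plane-shapes 1080 1920)
;; => [[1080 1920] [540 960] [540 960]]
```

A consumer that expects one interleaved buffer would therefore need either a conversion step or explicit support for three separately sized planes.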

Great! Skia supports many different pixfmts directly. I think skia even supports YUV, but I haven't figured out the exact incantation yet.

The thing is, the frame data is an array of pointers. While I get that skia may support YUV, I doubt it will use the same layout. For example, the default video encode/decode format for h264 is YUV420P, which is stored as 3 separate pointers, each pointing to a plane of channel data. That is why I didn't expose those members directly, and also why the frame data is exposed to users as a sequence of tensors, where each tensor may have a different shape, as is the case with YUV420P. Also, you can get a pointer from those tensors as long as they haven't been cloned.

avclj-test> (def decoder (avclj/make-video-decoder "test/data/test-video.mp4"))
Oct 20, 2021 7:12:52 AM clojure.tools.logging$eval8272$fn__8275 invoke
INFO: Reference thread starting
#'avclj-test/decoder
avclj-test> (def frame (avclj/decode-frame! decoder))
#'avclj-test/frame
avclj-test> (type frame)
clojure.lang.PersistentVector
avclj-test> (def tens (first frame))
#'avclj-test/tens
avclj-test> (require '[tech.v3.datatype :as dt])
nil
avclj-test> (def nbuf (dt/as-native-buffer tens))
#'avclj-test/nbuf
avclj-test> (type nbuf)
tech.v3.datatype.native_buffer.NativeBuffer
avclj-test> (.address nbuf)
139933330343424

Oh ok. I did not realize that the pointer was accessible from the tensors. That's a good start. I noticed that encode-frame! also handles the frame's :pts property, which might be useful to set manually in some cases.

I've been playing with avfilter bindings. Filter graphs can translate between different pixfmts, make your gifs look 100x better, and there are many other filters that I haven't tried yet.

Filter graphs are fairly easy to use, and it shouldn't be that hard to create a Clojure-friendly interface.

I still don't have any good suggestions for what an API might look like ¯_(ツ)_/¯

Filter graphs are on the path to making this stuff a lot more concretely useful.

There is a Python library that does a good job of making those filter graphs accessible: https://github.com/kkroening/ffmpeg-python.

That looks great!