cretz/go-scrap

example/video/main.go hangs after 3 frames on macOS only

Opened this issue · 7 comments

efunn commented

Hi there,

I'm interested in using this library to record multiple frames (e.g. video from screen). I first tried running the video example on Ubuntu (18.04) and it works correctly, capturing frames continuously until I press enter. This can easily be seen while it's running, since after each frame there is a call to print("."), and you can see .............. continuously in the console as each frame comes in.

However, on macOS (10.13 on my current machine), it captures 3 frames and then stops (i.e. it prints ... and then stops). It appears to never get a frame after that, and constantly returns a wouldBlock from the capturer. The ffplay example from the rust scrap library does not encounter such a problem on OSX, so it seems that this is an issue in the golang implementation, though I'm not really sure where it's going wrong.

Note: I had to manually add the changes from this pull request to get the library working at all on macOS. I also had a couple friends test out the video example on macOS machines and confirmed the same thing (3 frames captured and then nothing).

I understand this library hasn't been touched in a while, so I'm happy to dig through the code if you point me in the right direction. Thank you!

cretz commented

Yeah, the code here is quite small, so feel free to debug as needed to see where it's bloicking. Maybe something on the Rust side which is where the would block is returned. In many cases you might be able to ignore would block errors, I am unsure (I don't use mac).

I did go ahead and merge that PR to make mac builds work

efunn commented

Okay, I've made some progress on understanding this.

The quartz (macOS) implementation of screen capture is significantly different from the others. It uses a CGDisplayStream to grab frames. A parameter passed to this during initialization gives it a queue length (number of frames to queue). This is configurable from 3 to 8 frames, but defaults to 3 (and in the rust scrap implementation, it is actually set to 3). I confirmed that changing this variable to 8 (forking the repo, and modifying the go-scrap/scrap-sys/Cargo.toml file to have scrap = { git = "url/of/forked/modified/repo" }), results in the video demo hanging after 8 frames instead of 3.

When we start the CGDisplayStream, a CGDisplayStreamFrameAvailableHandler starts running constantly. When a frame becomes available, the frame gets transferred to a surface and then gets passed off, locks the frame, and runs IOSurfaceIncrementUseCount(). After it returns the frame, it drops the frame, unlocks the frame, and runs IOSurfaceDecrementUseCount().

This actually works fine in the background, when we haven't started trying to call capturer_frame from go-scrap. So the queue isn't filling up just by the capturer running. However, when we call frame HERE, which is what go-scrap actually hits, we end up running IOSurfaceIncrementUseCount(), BUT IOSurfaceDecrementUseCount() never gets called. So it looks like we end up running up the buffer by one frame every time we try to grab a frame from go-scrap.

So, perhaps in capturer_frame we are throwing away the frame from memory and it never gets dropped? Changing this to std::mem::drop(frame) breaks things, but I feel like there should be a way to handle it here.

cretz commented

Good catch. There is most definitely a drop implementation in the underlying frame data at https://github.com/quadrupleslap/scrap/blob/de3cd4cd8610f1cac7f80e29f59dbdb93ffefc6d/src/quartz/frame.rs#L37. What should probably happen is I should copy over the u8 slice (e.g. copy_from_slice) in Rust and return that and then let the RAII of the frame call drop when it leaves scope like it normally would instead of mem::forget'ing it. This technically reduces performance, but is safer. There's probably a way to just mem forget the slice itself or something instead of copying but still let frame's drop get called. I don't really work on this library much anymore, but yeah that'd be the first step is to see if you can get drop to be called while returning the data as cheaply as possible. I'd gladly merge a PR to this effect.

efunn commented

I got it to work! No guarantees on performance though. Ended up copying the frame over as you suggested. I agree it should be possible to just forget the slice, just haven't dug into it far enough to find out.

Here's what I ended up doing:
efunn@e59765e

This will definitely slow down all implementations, so maybe not ready for a pull request yet, but if you have any suggestions to clean it up, please let me know!

cretz commented

Awesome. One thing I was thinking is giving back the frame pointer along with the byte slice, storing it on the Go side, and dropping it before grabbing again each frame call. Then add a Close call to the capturer to make sure that last frame is collected. Or really, just keep a capturer on the rust side updating the frame over and over and let it have a call to get the bye array. But meh, lots of solutions.

efunn commented

So actually, in the Rust implementation of scrap on MacOS, the capturer is updating the frame over and over constantly. When you do your frame call, it think it just tries to lock the frame and then grab it from that constantly updating capturer. So it might already be doing what you envision.

Or are you suggesting you add another layer to avoid interacting with the Quartz CGDisplayStream directly from golang? Like, always grabbing the frame into memory somewhere in your lib.rs rust code regardless of whether you actually need it, and then simplifying your frame call to just swap the latest frame over to the golang side? This would avoid having the golang side directly grabbing from the Rust scrap library (and the underlying CGDisplayStream objects). My concern with that is how it will affect performance across platforms - since it's only really broken on MacOS currently, so this complexity would only be a benefit for one platform.

I think your first suggestion of just more intelligently grabbing the frame pointer/byte slice (rather than making a copy in memory) would be simpler... I feel like it should just be a couple lines of code to change. If we can ensure it isn't doing any memory copying then performance shouldn't be affected on other platforms. I would still do some timing tests to double-check, though.

cretz commented

Or are you suggesting you add another layer to avoid interacting with the Quartz CGDisplayStream directly from golang?

I suppose. Basically you just have to make sure the frame is properly dropped (which calls the proper macOS code) at the right time from Rust which happens when there are no more references to it. This is why when you copied that slice, it was able to be dropped which called the right code. There are many ways to do this. One way is to return the opaque frame mem::forget'ed reference/pointer back to Go, then mem::drop'ing it before next frame or on close.