aaclause/nvda-OpenAI

capturing images from camera or something that appears to Windows as a camera?

Closed this issue · 3 comments

Hello,

Let me start by once again thanking everyone who has contributed to the add-on, it has made my life a good deal better and I am very grateful indeed. I would just like to suggest a feature which I think might be useful, if I may. It requires a workaround at the moment. I can, using an HDMI capture card and the video sourcing methods to display it in the browser as, for example

https://superuser.com/questions/1744688/how-can-i-view-the-video-coming-in-from-a-capture-card-on-windows-in-full-screen

get pictures from the capture card and other cameras to chat GPT. This allows, just for example, the installation of inaccessible systems, working with UEFI, and so on. This is not just OCR, I may say, you can ask the LLM what has keyboard focus, how to get to the selection you want, etc. You can't do that with simple OCR. Would it be possible for the add-on to have a keystroke to pull in an image not just from a screenshot and navigator object, but from a webcam or capture card which looks like a webcam? This would enable people to avoid having to play with the image in the browser to remove the video controls, the running time, etc. All I want to send to the model is the image from the camera, the other stuff in the browser window, and even in the navigator object, is not needed. Thanks for having a look and, again, for the entire project.

Thanks for your great suggestion!

I've just discovered this project/library: https://github.com/bunkahle/pygrabber

I'll test it, and if it works well, I'll include the feature. :)

Hello @a-singer,
Sorry for the lack of updates.
I plan to continue this add-on in a dedicated app, see #69 (comment). I'm saving this feature for it.