capturing images from camera or something that appears to Windows as a camera?

Question

Closed this issue 6 months ago · 3 comments

Hello,

Let me start by once again thanking everyone who has contributed to the add-on. It has made my life a good deal better, and I am very grateful indeed. I would like to suggest a feature that I think might be useful, if I may. At the moment it requires a workaround: using an HDMI capture card and one of the methods for displaying its video in the browser, for example

https://superuser.com/questions/1744688/how-can-i-view-the-video-coming-in-from-a-capture-card-on-windows-in-full-screen

I can get pictures from the capture card and other cameras to ChatGPT. This allows, for example, installing otherwise inaccessible systems, working with UEFI, and so on. This is more than OCR: you can ask the LLM what has keyboard focus, how to get to the selection you want, etc., which simple OCR cannot do. Would it be possible for the add-on to have a keystroke that pulls in an image not just from a screenshot or the navigator object, but from a webcam or a capture card that appears as a webcam? This would save people from having to manipulate the image in the browser to remove the video controls, the running time, etc. All I want to send to the model is the image from the camera; the other content in the browser window, and even in the navigator object, is not needed. Thanks for having a look and, again, for the entire project.

Answer 1 · 2024-02-04T11:28:19.000Z

Thanks for your great suggestion!

I've just discovered this project/library: https://github.com/bunkahle/pygrabber

I'll test it, and if it works well, I'll include the feature. :)
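
For anyone following along, here is a rough sketch of what the capture step could look like. It is not the add-on's code, and it uses OpenCV rather than pygrabber (which I haven't tested yet). The function names `capture_jpeg` and `to_data_url` and the default device index `0` are assumptions made up for illustration. The idea is to grab a single frame from whatever Windows exposes as a camera, encode it as JPEG, and wrap it in a base64 data URL, a form that vision-capable chat APIs commonly accept.

```python
import base64


def capture_jpeg(device_index=0):
    """Grab one frame from a camera-like device and return JPEG bytes.

    Hypothetical helper: device_index=0 is assumed to be the first
    camera (or capture card) that Windows lists.
    """
    # Imported lazily so the rest of this sketch works without OpenCV.
    import cv2

    # CAP_DSHOW asks OpenCV to use the DirectShow backend on Windows.
    cap = cv2.VideoCapture(device_index, cv2.CAP_DSHOW)
    try:
        ok, frame = cap.read()
        if not ok:
            raise RuntimeError("could not read a frame from the device")
        ok, buf = cv2.imencode(".jpg", frame)
        if not ok:
            raise RuntimeError("could not encode the frame as JPEG")
        return buf.tobytes()
    finally:
        # Always release the device so other programs can use it.
        cap.release()


def to_data_url(image_bytes, mime="image/jpeg"):
    """Wrap raw image bytes in a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

With a camera attached, `to_data_url(capture_jpeg(0))` would produce a string containing only the camera image, with no browser controls or running time around it.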

Answer 2 · 2024-02-04T13:10:03.000Z

I can't thank you enough for looking into this. Just to update the issue: I plugged in a Logitech webcam and used the same method, NVDA+O on the navigator object in the browser window, to capture a picture from it. As I expected, the model was able to describe the image and answer questions. I think this means the feature, if it can be managed, would work not just for getting data from capture cards but for images of the world as seen by any other camera. Thanks again for looking into it.

Answer 3 · 2024-03-22T04:41:06.000Z

Hello @a-singer,
Sorry for the lack of updates.
I plan to continue this add-on in a dedicated app, see #69 (comment). I'm saving this feature for it.