Add vision support
ErikBjare opened this issue · 0 comments
ErikBjare commented
The OpenAI API now has vision support in beta, and we could run LLaVA locally.
Might be a lot of work, or might be super easy.
The question is: what would it be useful for?
- #51: Xvfb to understand display/output and make an E2E desktop agent
- #52: Screenshot with browser tool
- Can be used to take screenshots of developed webapps for visually-aided autodebugging
- Have it review plot outputs for correctness and inspect results
- Could be useful for data science, but reading a good plain text output might still be superior
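For the screenshot use cases above, a minimal sketch of what the integration might look like: building a chat message that inlines an image as a base64 data URL, in the content format used by OpenAI's vision beta. The function name `make_vision_message` and the PNG assumption are illustrative, not part of any existing code.

```python
import base64
from pathlib import Path


def make_vision_message(prompt: str, image_path: str) -> dict:
    """Build a user message with an inline base64-encoded image,
    following the multi-part content format of OpenAI's vision beta.
    Assumes the image is a PNG; a real implementation would detect
    the media type."""
    img_b64 = base64.b64encode(Path(image_path).read_bytes()).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{img_b64}"},
            },
        ],
    }
```

The resulting dict could be passed in the `messages` list of a chat completion request against a vision-capable model; a local LLaVA backend would need its own adapter but could reuse the same screenshot-to-message step.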