API Help
arthurwolf opened this issue · 2 comments
Hello.
Is there documentation for the API that I'm missing?
When I click on "Use with API" at the bottom of the demo, I get some documentation, but it's not very clear. It lists some numbered functions and their parameters, but it doesn't say what each function actually is or does.
Any help would be extremely welcome.
I have another question. What I'm trying to do is ask a model to find all faces in an image (and their positions), and/or all speech bubbles in an image (and their positions), etc.
I currently do this with segment-anything and gpt4-v, but it's extremely expensive, and I'd really like to be able to run it locally.
You can see my current technique in the pictures: segment the image, group the zones by non-overlap, and, for each group, label each zone with a number and ask gpt4-v which number is a speech bubble, which is a face, which is a sound effect, etc. This is pretty accurate (about a 5% error rate, and going down as I improve the prompt/labelling, though most of the improvements I've found also cost more tokens).
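For reference, the "group zones by non-overlap, then number each zone" step can be sketched roughly as below. This is only a minimal illustration, not the exact code behind the pictures: it assumes each mask is represented as a set of `(row, col)` pixels, and the names `group_non_overlapping` and `label_group` are made up for this example.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # assumed representation: the set of (row, col) pixels in a zone


def group_non_overlapping(masks: List[Mask]) -> List[List[int]]:
    """Greedily put each mask index into the first group it does not overlap."""
    groups: List[List[int]] = []      # mask indices per group
    occupied: List[Set[Pixel]] = []   # union of pixels already used by each group
    for idx, mask in enumerate(masks):
        for g, used in enumerate(occupied):
            if used.isdisjoint(mask):
                groups[g].append(idx)
                used |= mask
                break
        else:
            # The mask overlaps every existing group, so start a new one.
            groups.append([idx])
            occupied.append(set(mask))
    return groups


def label_group(group: List[int]) -> Dict[int, int]:
    """Map the 1-based number drawn on the image to the mask index it stands for."""
    return {number: mask_idx for number, mask_idx in enumerate(group, start=1)}
```

Each group can then be rendered as one numbered image and sent to the model in a single request, so the number of requests equals the number of groups rather than the number of zones.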
Is there a way to get the same result with Osprey?
Thank you so much in advance.
@arthurwolf As an alternative with Osprey, I suggest a practical approach. Start by using SAM to segment everything. Then, for each mask, ask Osprey: is it a speech bubble? Is it a face? However, this method may not give optimal results.
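The per-mask loop suggested above could look roughly like the sketch below. It is only an outline under assumptions: `ask` is a hypothetical callback that you would implement yourself to send one mask and one question to Osprey (it is not part of Osprey's or SAM's API), and the questions and the "answer starts with yes" check are placeholders.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

M = TypeVar("M")  # whatever mask type the segmenter returns

# Placeholder questions, one per category of interest.
QUESTIONS: Dict[str, str] = {
    "speech bubble": "Is this region a speech bubble?",
    "face": "Is this region a face?",
}


def classify_masks(
    masks: Sequence[M],
    ask: Callable[[M, str], str],
    questions: Dict[str, str] = QUESTIONS,
) -> List[List[str]]:
    """For each mask, return the labels whose question was answered with 'yes'."""
    results: List[List[str]] = []
    for mask in masks:
        labels = [
            label
            for label, question in questions.items()
            # Hypothetical convention: treat any answer starting with "yes" as positive.
            if ask(mask, question).strip().lower().startswith("yes")
        ]
        results.append(labels)
    return results
```

Note that this makes one model call per mask per question, so the cost grows with the number of masks SAM produces.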
Thanks for the feedback. That's what I'll try once I get the API/demo working, to see how well it works. If it works well enough, I'll work on a local deployment.