CircleRadon/Osprey

API Help

arthurwolf opened this issue · 2 comments

Hello.

Is there some API documentation that I'm missing?

When I click on "Use with API" at the bottom of the demo, I get some documentation, but it's not very clear. It shows that there are some numbered functions and lists their parameters, but it doesn't say what the functions actually are or what they do.

Any help would be extremely welcome.

I have another question. What I'm trying to do is ask a model to find all faces in an image (and their positions), and/or all speech bubbles in an image (and their positions), etc.

I currently do this with segment-anything and gpt4-v, but it's extremely expensive, so I'd really like to be able to run it locally.

Screenshot from 2024-01-10 00-56-35
Screenshot from 2024-01-10 00-56-25

You can see the technique I currently use in the pictures: segment the image, group zones by non-overlap, label each zone in a group with a number, and then, for each group, ask gpt4-v which numbers are speech bubbles, which are faces, which are sound effects, etc. This is pretty accurate (about a 5% error rate, and going down as I improve the prompt/labelling, though most of the improvements I've found also cost more tokens).
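To make the grouping step concrete, here is a rough sketch of how zones could be grouped by non-overlap and then numbered. It is only an illustration, not my actual code: it assumes each zone is reduced to a bounding box `(x0, y0, x1, y1)`, and the names `boxes_overlap`, `group_by_non_overlap` and `number_zones` are made up for this example. A real version would probably test the segmentation masks themselves rather than their boxes.

```python
from typing import Dict, List, Tuple

# Assumption: each zone is summarised by a bounding box (x0, y0, x1, y1),
# with x1 and y1 treated as exclusive upper bounds.
Box = Tuple[int, int, int, int]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if the two boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def group_by_non_overlap(boxes: List[Box]) -> List[List[int]]:
    """Greedily put each zone into the first group where it overlaps nothing.

    Returns groups of indices into `boxes`; zones within a group never overlap,
    so their number labels can be drawn on one image without colliding.
    """
    groups: List[List[int]] = []
    for idx, box in enumerate(boxes):
        for group in groups:
            if not any(boxes_overlap(box, boxes[j]) for j in group):
                group.append(idx)
                break
        else:
            # No existing group can take this zone, so start a new one.
            groups.append([idx])
    return groups


def number_zones(groups: List[List[int]]) -> List[Dict[int, int]]:
    """For each group, map the label shown to the model (1, 2, ...) to a zone index."""
    return [{n + 1: idx for n, idx in enumerate(group)} for group in groups]


if __name__ == "__main__":
    zones = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
    groups = group_by_non_overlap(zones)
    print(groups)
    print(number_zones(groups))
```

The greedy pass keeps the number of groups (and so the number of model calls) low, but it does not guarantee the minimum number of groups.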

Is there a way to get the same result with Osprey?

Thank you so much in advance.

@arthurwolf As an alternative approach with Osprey, I suggest something practical: start by using SAM to segment everything. Then, for each mask, ask Osprey: is it a speech bubble? Is it a face? However, this method may not give optimal results.
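The per-mask loop described above might look roughly like the sketch below. It deliberately does not call SAM or Osprey: the `ask` callable is a hypothetical placeholder for whatever function ends up sending a mask and a question to the model and interpreting the answer as yes/no, and `classify_masks` and the prompt wording are likewise assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Mask = TypeVar("Mask")  # whatever mask representation the segmenter produces


def classify_masks(
    masks: Sequence[Mask],
    labels: Sequence[str],
    ask: Callable[[Mask, str], bool],
) -> List[Optional[str]]:
    """Ask one yes/no question per (mask, label) pair.

    `ask` is a hypothetical callable: it should return True when the model
    answers "yes" for the given mask and question. Each mask gets the first
    label that receives a "yes", or None if every answer is "no".
    """
    results: List[Optional[str]] = []
    for mask in masks:
        found: Optional[str] = None
        for label in labels:
            question = f"Is this region a {label}? Answer yes or no."
            if ask(mask, question):
                found = label
                break  # stop at the first match to save model calls
        results.append(found)
    return results


if __name__ == "__main__":
    # Stand-in for the real model: masks are plain strings here, and the
    # fake `ask` answers "yes" when the label appears in the mask's name.
    def fake_ask(mask: str, question: str) -> bool:
        return any(word in question for word in mask.split("_"))

    print(classify_masks(["speech_bubble_1", "tree"], ["speech bubble", "face"], fake_ask))
```

Note that this makes up to one model call per mask per label, so the cost grows with both the number of masks and the number of categories.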

Thanks for the feedback, that's what I'll try once I get the API/demo working, to see how well it works. If it works well enough, I'll work on a local deployment.