Ever tried to select all the boxes that contain traffic lights and wondered whether you're supposed to check the box that only shows the very edge of one?
The inspiration was to provide a better Captcha user experience while still keeping security in mind.
It shows the user an AI-generated image and asks them to select the keywords/tags associated with the image. The tag list contains 3 correct tags/keywords (which are actually associated with the image and are part of the prompt) and 3 negative tags/keywords (which are randomly generated and have nothing to do with the image).
Here we rely on the fact that a human can work out which tags are associated with the image, whether directly or indirectly. The image and keywords are related in a more complex way than in a task that literally asks the user to classify the image.
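The challenge described above (3 correct tags plus 3 negative tags) could be assembled and checked roughly as follows. This is a minimal sketch, not the project's actual code: the names `Challenge`, `build_challenge`, and `verify` are hypothetical, and the pass rule (the user must select exactly the correct tags) is an assumption, since the write-up does not state how answers are scored.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    image_id: str
    options: tuple[str, ...]   # shuffled tags shown to the user
    correct: frozenset[str]    # answer key, meant to stay on the server


def build_challenge(image_id, prompt_tags, decoy_pool, n_each=3, rng=None):
    """Mix n_each prompt tags with n_each decoys and shuffle them."""
    rng = rng or random.Random()
    prompt_set = {t.lower() for t in prompt_tags}
    if len(prompt_set) < n_each:
        raise ValueError("not enough prompt tags")

    correct = rng.sample(sorted(prompt_set), n_each)

    # Drop any decoy that also appears among the prompt tags, so a
    # "negative" tag can never accidentally be a correct one.
    candidates = sorted({w.lower() for w in decoy_pool} - prompt_set)
    if len(candidates) < n_each:
        raise ValueError("not enough decoy words")
    decoys = rng.sample(candidates, n_each)

    options = correct + decoys
    rng.shuffle(options)
    return Challenge(image_id, tuple(options), frozenset(correct))


def verify(challenge, selected):
    """Assumed rule: pass only if the selection equals the correct set."""
    return {s.lower() for s in selected} == challenge.correct
```

Filtering the decoy pool against the full set of prompt tags, rather than only the three chosen ones, is one simple way to reduce the chance that a negative tag turns out to describe the image after all.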
The product can be divided into 3 components described below:
- AI component: We use diffusion models to generate images from prompts, and we convert the prompts into tags using a POS model and some preprocessing steps.
- Backend component: We randomly pick an image and its prompt from one API, and we convert the prompt into tags via another API.
- Frontend component: We display a Captcha-like experience, but with a modern touch to it, using
- The main challenge was building an NLP pipeline that selects truly relevant keywords from the prompt (which is used to generate the image).
- The negative keywords are selected from a random word generator we created. Even when the correct keywords are carefully picked from the prompt and the negative keywords are random words, it is sometimes hard to tell from the image which tags are correct. This needs further improvement.
- Make the tag generation model better and more intuitive, so that the task is easier for users.
- Protect our service from scraping and attacks.
- Add user-specific Captcha, called personalized Captcha security: a user's interests are used to show related images, and the user is asked to complete the task of selecting the correct tags.
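To illustrate the prompt-to-tag step mentioned under the AI component, here is a minimal sketch of POS-based filtering. Everything in it is an assumption: the write-up does not say which POS model or which parts of speech are kept, so the function takes any tagger as a parameter, keeps nouns and adjectives (using the Universal POS labels `NOUN` and `ADJ`), and uses a tiny hard-coded `toy_tagger` purely as a stand-in for a real model.

```python
import re
from typing import Callable, Iterable

# A tagger maps a list of tokens to (token, part-of-speech) pairs.
Tagger = Callable[[list[str]], Iterable[tuple[str, str]]]

# Assumption: nouns and adjectives make the most recognizable tags.
KEEP_POS = {"NOUN", "ADJ"}


def prompt_to_tags(prompt: str, tagger: Tagger, max_tags: int = 10) -> list[str]:
    """Return unique candidate tags from a prompt, in order of appearance."""
    # Preprocessing: lowercase and keep alphabetic (optionally hyphenated) words.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", prompt.lower())

    tags: list[str] = []
    seen: set[str] = set()
    for word, pos in tagger(tokens):
        if pos in KEEP_POS and len(word) > 2 and word not in seen:
            seen.add(word)
            tags.append(word)
    return tags[:max_tags]


def toy_tagger(tokens: list[str]) -> list[tuple[str, str]]:
    """Hypothetical stand-in for a real POS model, for demonstration only."""
    lexicon = {"castle": "NOUN", "foggy": "ADJ",
               "mountain": "NOUN", "sunrise": "NOUN"}
    return [(t, lexicon.get(t, "X")) for t in tokens]


print(prompt_to_tags("A foggy castle on a mountain at sunrise, a castle", toy_tagger))
# prints ['foggy', 'castle', 'mountain', 'sunrise']
```

In practice `toy_tagger` would be replaced by a wrapper around whichever POS model the pipeline uses; keeping the tagger as a parameter makes it easy to swap models while tuning which keywords count as relevant.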