diCaptcha - Diffusion-based Captcha

Inspiration 💡

Ever tried to select every box that contains a traffic light and wondered whether you're supposed to check the box that only catches its very edge?

We wanted to provide a better Captcha user experience while still keeping security in mind.

What it does 🤖

It shows the user an AI-generated image and asks them to select the keywords/tags that are associated with the image. The options include 3 correct tags/keywords (that are actually associated with the image and are part of the prompt) and 3 negative tags/keywords (that are randomly generated and have nothing to do with the image).

Here we rely on the fact that a human can work out which tags are associated with the image, directly or indirectly. The image and keywords are related in a more complex way than in a task that literally asks the user to classify the image.
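To make the 3 correct / 3 negative scheme concrete, here is a minimal Python sketch of how one round could be assembled and checked. The names `Challenge`, `build_challenge`, and `verify` are hypothetical, and the pass rule (the user must select exactly the correct tags) is our assumption for illustration, not necessarily what diCaptcha enforces.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Challenge:
    """One captcha round: the tags shown to the user and the correct subset."""
    options: Tuple[str, ...]   # all tags, in the order they are displayed
    correct: FrozenSet[str]    # the tags that came from the image prompt


def build_challenge(positive_tags, negative_tags, rng=None, per_side=3):
    """Pick `per_side` correct and `per_side` negative tags, then shuffle them."""
    rng = rng or random.Random()
    positive_pool = sorted(set(positive_tags))
    # A negative tag must never also be a correct tag.
    negative_pool = sorted(set(negative_tags) - set(positive_tags))
    positives = rng.sample(positive_pool, per_side)
    negatives = rng.sample(negative_pool, per_side)
    options = positives + negatives
    rng.shuffle(options)  # do not reveal which half is which
    return Challenge(tuple(options), frozenset(positives))


def verify(challenge, selected):
    """Assumed pass rule: the selection must equal the correct set exactly."""
    return frozenset(selected) == challenge.correct
```

A more forgiving rule (for example, allowing one missed tag) would only change `verify`; the strict version is the simplest to reason about.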

How we built it 🛠️

The product can be divided into 3 components described below:

AI component: We use diffusion models to generate images from prompts | We convert prompts into tags using a POS model and some preprocessing steps
Backend component: We randomly pick an image and prompt from an API | We convert the prompt into tags through another API
Frontend component: We display a Captcha like experience but with a modern touch to it using
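The prompt-to-tag step in the AI component can be sketched roughly as follows. This is an illustration under stated assumptions: the tagger is passed in as a plain function, `toy_pos_tagger` and `TOY_LEXICON` are hypothetical stand-ins for a real POS model, and keeping only nouns and adjectives is our guess at a sensible filter rather than the project's confirmed rule.

```python
import re

# Hypothetical mini-lexicon standing in for a real POS model.
TOY_LEXICON = {
    "misty": "ADJ", "castle": "NOUN", "hill": "NOUN", "sunset": "NOUN",
    "oil": "NOUN", "painting": "NOUN", "a": "DET", "on": "ADP", "at": "ADP",
}


def toy_pos_tagger(tokens):
    """Return (word, tag) pairs; unknown words get the tag 'X'."""
    return [(tok, TOY_LEXICON.get(tok, "X")) for tok in tokens]


def prompt_to_tags(prompt, pos_tagger=toy_pos_tagger, keep=("NOUN", "ADJ")):
    """Lowercase and tokenize the prompt, then keep words with selected tags."""
    # Preprocessing: lowercase and keep alphabetic runs only.
    tokens = re.findall(r"[a-z]+", prompt.lower())
    tags = []
    for word, pos in pos_tagger(tokens):
        # Drop very short words and duplicates, preserving prompt order.
        if pos in keep and len(word) > 2 and word not in tags:
            tags.append(word)
    return tags


print(prompt_to_tags("A misty castle on a hill at sunset, oil painting"))
```

Swapping in a real tagger only requires a function with the same shape: a list of tokens in, a list of (word, tag) pairs out.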

Challenges we ran into

The main challenge was creating an NLP pipeline that selects truly relevant keywords from the prompt (which is used to generate the image).
The negative keywords were selected from a random word generator we created. Even with correct keywords carefully picked from the prompt and random words used as negative keywords, we can sometimes look at the image and be confused about which tags are correct. This needs further improvement.
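One small guard against confusing negatives is to reject random words that overlap with the prompt tags. The sketch below is an assumption on our part: `WORD_POOL` is a hypothetical word list, and the substring check is a crude stand-in for a proper relatedness test. It does not solve the harder problem of a random word that happens to describe something visible in the image.

```python
import random

# Hypothetical pool; a real generator would draw from a much larger word list.
WORD_POOL = ("toaster", "invoice", "bicycle", "stapler", "sun", "castle", "walrus")


def overlaps(word, tag):
    """Crude relatedness check: one string contains the other."""
    return word in tag or tag in word


def pick_negative_tags(prompt_tags, pool=WORD_POOL, count=3, rng=None):
    """Sample `count` pool words that do not overlap with any prompt tag."""
    rng = rng or random.Random()
    banned = [t.lower() for t in prompt_tags if t]
    candidates = sorted(
        {w.lower() for w in pool if w}
        - {w.lower() for w in pool if any(overlaps(w.lower(), t) for t in banned)}
    )
    if len(candidates) < count:
        raise ValueError("not enough unrelated words in the pool")
    return rng.sample(candidates, count)
```

With prompt tags `["sunset", "castle"]`, the pool words "sun" and "castle" are filtered out before sampling, so neither can appear as a negative tag.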

What's next for diCaptcha Diffusion Captcha For Creative Tastes 🔜