Auto Image Caption for Web

A Chrome extension that uses machine learning to automatically caption images and fix missing alt text

[Cover image: a11y-extension-cover]

DESCRIPTION

Digital accessibility ensures that websites, web apps, and digital content can be used by people with a diverse range of hearing, movement, sight, or cognitive abilities. One way to promote digital accessibility is alt text (alternative text), which provides a text alternative to non-text content on web pages, such as images and other media. Auditing, editing, and updating alt text on existing websites can be challenging. This Chrome extension automates that process using machine learning and image detection; the im2txt captioning model is used in this project.
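
For illustration, here is a minimal content-script sketch of the idea: scan the page for images with missing or empty alt attributes and fill them in with generated captions. The getCaption() helper and the CAPTION_API_URL endpoint are hypothetical placeholders standing in for however the extension actually reaches the captioning model.

```javascript
// Sketch: find images with missing or empty alt text and fill them in
// with machine-generated captions.
// NOTE: CAPTION_API_URL and getCaption() are hypothetical placeholders,
// not this project's actual API.
const CAPTION_API_URL = 'https://example.com/caption';

async function getCaption(imageUrl) {
  const response = await fetch(CAPTION_API_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: imageUrl })
  });
  const { caption } = await response.json();
  return caption;
}

async function fixMissingAltText() {
  // Select images that have no alt attribute at all, or an empty one.
  const images = document.querySelectorAll('img:not([alt]), img[alt=""]');
  for (const img of images) {
    try {
      img.alt = await getCaption(img.src);
    } catch (err) {
      console.warn('Could not caption image:', img.src, err);
    }
  }
}

fixMissingAltText();
```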

IM2TXT Model

The image encoder is a deep convolutional neural network (CNN), a type of network widely used for image tasks such as object recognition and detection. The encoder here is the Inception v3 image recognition model, pretrained on the ILSVRC-2012-CLS image classification dataset. The decoder is a long short-term memory (LSTM) network, a type of recurrent network commonly used for sequence modeling tasks such as language modeling and machine translation. In the Show and Tell model, the LSTM is trained as a language model conditioned on the image encoding.
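
To make the encoder-decoder flow concrete, here is a sketch of caption generation with greedy decoding. The encodeImage, decoderStep, and vocab names are hypothetical stand-ins for the Inception v3 encoder, a single LSTM decoder step, and the vocabulary; they are not this project's API, and im2txt itself uses beam search rather than the greedy choice shown here.

```javascript
// Conceptual sketch of im2txt-style caption generation with greedy decoding.
// encodeImage, decoderStep, and vocab are hypothetical injected dependencies,
// not this project's actual functions.
async function generateCaption({ encodeImage, decoderStep, vocab }, imagePixels, maxLen = 20) {
  // Encoder: the CNN maps the raw image to a fixed-length feature vector,
  // which initializes the decoder's state.
  let state = await encodeImage(imagePixels);

  // Decoder: the LSTM, conditioned on the image encoding, emits one word
  // at a time until it predicts the end-of-sentence token.
  let word = vocab.startToken;
  const caption = [];

  for (let t = 0; t < maxLen; t++) {
    const { probs, nextState } = await decoderStep(word, state);
    state = nextState;
    word = vocab.mostLikelyWord(probs); // greedy choice; im2txt uses beam search
    if (word === vocab.endToken) break;
    caption.push(word);
  }
  return caption.join(' ');
}
```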

INSPIRATION

REFERENCES

AUDIENCE

  • People who use a screen reader to access alt text
  • People who need to update alt text retroactively to comply with digital accessibility standards

NEXT STEPS

  • Make it a WordPress plugin
  • Generate images based on labels
  • Use ML to provide a better screen reader experience
  • Retrain the model on web semantics

INSTRUCTIONS

  1. Download this repo as a .zip file (or clone it).
  2. Unzip it.
  3. Go to chrome://extensions/ and enable Developer mode.
  4. Click "Load unpacked" and select the unzipped folder.
  5. Open any webpage and run the extension.
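
As a reference point for what gets loaded in step 4, a minimal manifest for a content-script Chrome extension of this kind might look like the sketch below. This is a generic Manifest V3 example, not necessarily the manifest this repo ships, and the content.js filename is a placeholder.

```json
{
  "manifest_version": 3,
  "name": "Auto Image Caption for Web",
  "version": "1.0",
  "description": "Uses machine learning to caption images and fix missing alt text.",
  "permissions": ["activeTab", "scripting"],
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["content.js"]
    }
  ]
}
```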

by Hayk Mikayelyan and Abi Muñoz.
Thank you to Yining Shi, Lauren Race, and Ellen Nickels for helping us with this project.