An Android application that converts the camera feed to natural language captions in real time. The app uses our customized pre-trained model generated through image-caption-generator. Using this model, the app takes 1-2 seconds to caption a live camera frame on a Huawei Honor 6x.
The trained model to run this app can be obtained here.
- Android SDK for versions above KitKat
- Android-Studio
- Tensorflow Java Library
- Build #44 is already provided in this repository. Latest nightly builds can be obtained from here
- Warning: This app has not been tested with builds other than #44
- Trained model from image-caption-generator
- Word IDs to word map pickle from image-caption-generator, currently provided in Application/src/main/assets
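The word-ID-to-word map listed above is what turns the model's numeric output into readable text. The following is a minimal sketch of that lookup step, not the app's actual code. It assumes the map has already been loaded into a `Map<Integer, String>` (loading the file is not shown), and the class name `CaptionDecoder` and the `<S>` / `</S>` marker tokens are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CaptionDecoder {
    // Hypothetical sentence markers; the real vocabulary may use different tokens.
    static final String START = "<S>";
    static final String END = "</S>";

    /** Maps predicted word IDs to words and joins them into a caption. */
    static String decode(int[] wordIds, Map<Integer, String> idToWord) {
        List<String> words = new ArrayList<>();
        for (int id : wordIds) {
            String word = idToWord.get(id);
            if (word == null || word.equals(START)) {
                continue; // skip unknown IDs and the start marker
            }
            if (word.equals(END)) {
                break; // everything after the end marker is ignored
            }
            words.add(word);
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // Toy vocabulary, made up for this example.
        Map<Integer, String> vocab = new HashMap<>();
        vocab.put(0, START);
        vocab.put(1, END);
        vocab.put(2, "a");
        vocab.put(3, "dog");
        vocab.put(4, "running");
        System.out.println(decode(new int[] {0, 2, 3, 4, 1, 2}, vocab)); // prints "a dog running"
    }
}
```

Unknown IDs are skipped rather than rendered as a placeholder, which keeps the caption readable when the vocabulary is incomplete.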
To build this app for your Android phone:
- Clone this repository
- Download the trained model from here.
- Add the downloaded pre-trained model to the Application/src/main/assets folder in the repository.
- Open the repository in Android Studio
- Build the app on your device using Android Studio
The app is a prototype. It uses our optimized, slimmed-down model from image-caption-generator with a faster encoder CNN, Google's Inception v4, and treats the end-to-end pre-trained model as a black box to generate captions quickly in real time.
Note: Due to limited computational power, our model is not very well trained.
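Because the model is used as a black box and a single caption takes on the order of a second or more, one common way to keep a live preview responsive is to drop incoming frames while a caption is still being computed. The sketch below illustrates that idea only; it is not taken from the app's source. The `CaptionModel` interface and the `FrameCaptioner` class are hypothetical stand-ins for the pre-trained model and the camera callback.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the pre-trained model; not the app's actual API.
interface CaptionModel {
    String caption(byte[] frame);
}

final class FrameCaptioner {
    private final CaptionModel model;
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private volatile String lastCaption = "";

    FrameCaptioner(CaptionModel model) {
        this.model = model;
    }

    /**
     * Captions the frame unless a caption is already being computed.
     * Returns true if the frame was captioned, false if it was dropped.
     */
    boolean onFrame(byte[] frame) {
        if (!busy.compareAndSet(false, true)) {
            return false; // inference in progress: drop this frame
        }
        try {
            lastCaption = model.caption(frame);
            return true;
        } finally {
            busy.set(false); // always release, even if the model throws
        }
    }

    String lastCaption() {
        return lastCaption;
    }
}

final class FrameCaptionerDemo {
    public static void main(String[] args) {
        // Dummy model that reports the frame size instead of running inference.
        FrameCaptioner captioner = new FrameCaptioner(f -> "frame of " + f.length + " bytes");
        captioner.onFrame(new byte[16]);
        System.out.println(captioner.lastCaption()); // prints "frame of 16 bytes"
    }
}
```

Dropping frames instead of queueing them means the displayed caption always describes a recent frame, at the cost of not captioning every frame.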
Here is a quick preview of the app which was made by pointing the device camera towards a slideshow running on a screen and some real-life scenes. #TO-DO: Create a real preview by testing the app on streets.
- To create a TensorFlow Android app from scratch, please follow this brilliant tutorial by Omid Alemi.
- Currently the app has been tested on the Huawei Honor 6x only.
If you use our model or code in your research, please cite the paper:
@article{Mathur2017,
title={Camera2Caption: A Real-time Image Caption Generator},
author={Pranay Mathur and Aman Gill and Aayush Yadav and Anurag Mishra and Nand Kumar Bansode},
journal={IEEE Conference Publication},
year={2017}
}