
An interactive aid for blind people using microsoft/visual-chatgpt

(Originally motivated as a potential Deep Hack submission)

The ultimate aim is to mount a WiFi-enabled camera on a pair of glasses for visually impaired people, 
to describe what lies in their field of view, and to provide an interactive interface on top of that 
description so that they can ask questions about what they are told is in front of them.



Quick Start

(akin to setting up microsoft/visual-chatgpt)

# clone the repo
git clone git@github.com:justanotherlad/blindvisaidgpt.git

# go to the directory
cd blindvisaidgpt

# create a new environment
conda create -n visgpt python=3.8

# activate the new environment
conda activate visgpt

# install the dependencies
pip install -r requirements.txt

# set your private OpenAI key (for Linux/macOS)
export OPENAI_API_KEY={Your_Private_Openai_Key}

# set your private OpenAI key (for Windows cmd)
set OPENAI_API_KEY={Your_Private_Openai_Key}

# Start blindvisaidgpt!
# Use "--load" to assign models to the GPU/CPU: it specifies which
# Visual Foundation Models to use and which device each one is loaded onto.
# A model and its device are separated by an underscore '_'; different models are separated by a comma ','.
# The available Visual Foundation Models can be found in the following table.
# For example, to load ImageCaptioning onto cpu and VisualQuestionAnswering onto cuda:0,
# use: "ImageCaptioning_cpu,VisualQuestionAnswering_cuda:0"
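
To make the "--load" format concrete, here is a minimal sketch of how such a string could be split into a model-to-device mapping. The function name `parse_load_spec` is hypothetical and used only for illustration; it is not claimed to be the parser used in visual_chatgpt.py.

```python
from typing import Dict


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Map each model name to its device, e.g. {"ImageCaptioning": "cpu"}.

    Hypothetical helper: it only illustrates the "Model_device,Model_device"
    format described above.
    """
    load_map = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing commas
        # Split on the first underscore: model name on the left, device on the right.
        model, sep, device = entry.partition("_")
        if not sep or not model or not device:
            raise ValueError(f"expected Model_device, got {entry!r}")
        load_map[model] = device
    return load_map


if __name__ == "__main__":
    print(parse_load_spec("ImageCaptioning_cpu,VisualQuestionAnswering_cuda:0"))
```

The sketch assumes that model names contain no underscore, which holds for the names shown in this README. Device strings such as cuda:0 pass through unchanged because the colon is not a separator.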

# Advice for CPU Users (current default)
python visual_chatgpt.py --load ImageCaptioning_cpu,VisualQuestionAnswering_cpu

# Advice for 1 Tesla T4 15GB (Google Colab) (not yet tested)
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,VisualQuestionAnswering_cuda:0"

# Advice for 4 Tesla V100 32GB (not yet tested)
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,ImageEditing_cuda:0,