Image Captioning with the BLIP model, deployed with Gradio.
Ho Chi Minh City University of Science
Advanced Image and Video Processing - Assoc. Prof. Lý Quốc Ngọc
K31 - Master of Science - Group: CHOICES
| No. | Student ID | Student Name       |
|-----|------------|--------------------|
| 1   | 19127027   | Võ Hoàng Bảo Duy   |
| 2   | 19127094   | Phạm Ngọc Thiên Ân |
| 3   | 19127292   | Nguyễn Thanh Tình  |
- Clone the repository.
git clone https://github.com/ngthtinh99/ImageCaptioning.git
- Install Python (version 3.7-3.9 is required for PyTorch support).
- Install the required libraries.
pip install requests torch torchvision gradio timm fairscale transformers
- Run the app. The first run downloads the model, which takes about 5 minutes; subsequent runs reuse the cached weights. A sketch of what app.py does is given after this list.
python app.py
- Open the app on localhost at http://localhost:7860, or via the public link printed in the command prompt.
- Enjoy 🙂
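For reference, here is a minimal sketch of the kind of app.py this setup runs. It is an illustration under assumptions, not the repository's actual code: it loads the Hugging Face port of BLIP (Salesforce/blip-image-captioning-base), whereas the repository's app.py may load the original BLIP checkpoint through timm and fairscale instead.

```python
import gradio as gr
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed checkpoint: downloaded from the Hugging Face Hub on the
# first run, then loaded from the local cache afterwards.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

def caption(image: Image.Image) -> str:
    """Generate a caption for a single PIL image."""
    inputs = processor(images=image, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=caption,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="Image Captioning with BLIP",
)

# share=True creates the temporary public link mentioned above;
# without it the app is served only at http://localhost:7860.
demo.launch(share=True)
```

The Gradio interface simply wraps the caption function: Gradio handles the image upload widget, converts the upload to a PIL image, and displays the returned string.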
[1] Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation.
[2] Gradio: Build Machine Learning Web Apps — in Python.