CAPT-IMAGE is an image captioning tool that generates descriptive captions for images using deep learning. It combines a VGG16 network for image feature extraction with an LSTM for generating the captions.
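At inference time the two models cooperate in a loop: VGG16 turns the image into a feature vector once, and the LSTM repeatedly predicts the next caption word until a stop token appears. A minimal sketch of that loop is below; `extract_features` and `next_word` are toy stand-ins for the real models, and the `startseq`/`endseq` tokens are the usual convention for this kind of captioner, not necessarily this project's exact token names.

```python
# Minimal sketch of the two-stage captioning loop.
# extract_features and next_word are toy stand-ins for the
# VGG16 feature extractor and the LSTM word predictor.
def extract_features(image_path):
    # Real code would run VGG16 and return a 4096-dim feature vector.
    return [0.0] * 4096

def next_word(features, partial_caption):
    # Real code would run the LSTM and pick the most probable word.
    script = ["a", "dog", "runs", "endseq"]
    return script[min(len(partial_caption) - 1, len(script) - 1)]

def generate_caption(image_path, max_len=20):
    features = extract_features(image_path)
    caption = ["startseq"]            # special start token
    for _ in range(max_len):
        word = next_word(features, caption)
        caption.append(word)
        if word == "endseq":          # special stop token
            break
    return " ".join(caption[1:-1])    # drop the special tokens

print(generate_caption("photo.jpg"))  # -> "a dog runs"
```

The loop structure is the same whether the next-word choice is greedy (as here) or uses beam search.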
- Image Upload: Supports various formats including JPG, PNG, WEBP, and SVG.
- Caption Generation: Provides descriptive captions based on the content of the uploaded image.
- User-Friendly Interface: Built with Streamlit for a seamless and interactive user experience.
- Clone the repository:
  `git clone https://github.com/miteshgupta07/Capt-Image.git`
- Navigate to the project directory:
  `cd Capt-Image`
- Install the required packages:
  `pip install -r requirements.txt`
- Run the Streamlit app:
  `streamlit run app.py`
- Upload an image and view the generated caption.
- `app.py`: Main Streamlit application file.
- `vgg_model.keras`: Pre-trained VGG16 model used for feature extraction.
- `model.keras`: LSTM model used for caption generation.
- `Tokenizer.pkl`: Tokenizer object for text processing.
- `Extracted_Feature.pkl`: Pre-computed VGG16 features for the dataset images.
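The saved artifacts plug together at startup roughly as follows. This is a hedged sketch of the loading step, using the file names listed above; the actual code in `app.py` may differ, and `load_artifacts` is a hypothetical helper, not a function from the project.

```python
# Sketch: loading the saved models and tokenizer (file names from the
# project structure above; the real app.py may load them differently).
import pickle

def load_artifacts():
    # Keras import kept local so the sketch stays importable without TF.
    from tensorflow.keras.models import load_model
    vgg = load_model("vgg_model.keras")     # image feature extractor
    captioner = load_model("model.keras")   # LSTM caption generator
    with open("Tokenizer.pkl", "rb") as f:
        tokenizer = pickle.load(f)          # word <-> index mapping
    return vgg, captioner, tokenizer
```

Loading everything once and caching it (e.g. with Streamlit's `st.cache_resource`) keeps the app responsive across uploads.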
The models are trained on the Flickr8k dataset, a publicly available collection of roughly 8,000 images, each paired with five human-written captions.
- Feature Extraction Model: VGG16
- Caption Generation Model: LSTM with dense layers
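The two bullets above name the pieces but not how they connect. One common way to wire VGG16 features into an LSTM captioner is the "merge" architecture sketched below: a dense branch compresses the image features, an embedding-plus-LSTM branch encodes the partial caption, and the two are added before the final softmax over the vocabulary. The layer sizes (256 units, 0.4 dropout) are illustrative assumptions, not this project's actual hyperparameters.

```python
# Sketch of a merge-style VGG16 + LSTM captioning model.
# Sizes are illustrative, not the project's actual configuration.
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Dropout, add
from tensorflow.keras.models import Model

def build_model(vocab_size, max_len, feature_dim=4096, units=256):
    # Image branch: compress the VGG16 feature vector.
    img_in = Input(shape=(feature_dim,))
    img = Dropout(0.4)(img_in)
    img = Dense(units, activation="relu")(img)
    # Text branch: embed the partial caption and run an LSTM over it.
    txt_in = Input(shape=(max_len,))
    txt = Embedding(vocab_size, units, mask_zero=True)(txt_in)
    txt = Dropout(0.4)(txt)
    txt = LSTM(units)(txt)
    # Merge the branches and predict the next word.
    merged = add([img, txt])
    merged = Dense(units, activation="relu")(merged)
    out = Dense(vocab_size, activation="softmax")(merged)
    return Model(inputs=[img_in, txt_in], outputs=out)
```

Each training example pairs an image's features with a caption prefix, and the target is the next word, so one caption yields several training pairs.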
Contributions are welcome! Please open an issue or submit a pull request if you have suggestions or improvements.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or feedback, feel free to reach out at miteshgupta2711@gmail.com.