Image Captioning with Neural Networks 🖼️🤖

Image Captioning with Neural Networks is a deep learning project that combines Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to generate captions for images automatically. This implementation utilizes a pre-trained ResNet model for image feature extraction and an LSTM network for generating textual descriptions of the images.
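
As a rough illustration of this pairing, the sketch below wires a pre-trained ResNet-18 (with its classification head removed) to an LSTM decoder. The class and parameter names (EncoderCNN, DecoderRNN, embed_size, hidden_size, vocab_size) are illustrative and may not match the repository's exact implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class EncoderCNN(nn.Module):
    """ResNet-18 backbone that maps an image to a fixed-size feature vector."""

    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Drop the final classification layer; keep the 512-d pooled features.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        features = self.backbone(images).flatten(1)   # (B, 512)
        return self.fc(features)                      # (B, embed_size)


class DecoderRNN(nn.Module):
    """LSTM that turns an image feature plus caption tokens into word logits."""

    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first step of the input sequence.
        embeddings = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hiddens, _ = self.lstm(embeddings)
        return self.fc(hiddens)                       # (B, T + 1, vocab_size)
```

During training, the image feature is fed to the LSTM as the first time step and the caption tokens follow (teacher forcing); at test time the decoder is unrolled one predicted word at a time.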

Features 🌟

  • Utilizes a pre-trained ResNet-18 model for efficient image feature extraction.
  • Employs an LSTM network for generating descriptive captions based on image features.
  • Supports training with and without fine-tuning of the ResNet model (see the sketch after this list).
  • Includes functionality for both training and testing the model with a custom dataset.
  • Visualizes training loss and sample predictions to assess model performance.
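
A minimal sketch of how the fine-tuning switch from the feature list might work: freeze or unfreeze the ResNet backbone while always keeping the feature-projection layer trainable. The function name and flag are hypothetical and refer to the EncoderCNN sketch above, not the repository's exact API.

```python
def set_backbone_finetuning(encoder, fine_tune: bool):
    # Freeze every backbone parameter unless fine-tuning is requested;
    # the feature-projection layer (encoder.fc) always stays trainable.
    for param in encoder.backbone.parameters():
        param.requires_grad = fine_tune
    for param in encoder.fc.parameters():
        param.requires_grad = True
```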

Setup and Installation 🛠️

  1. Clone the repository from GitHub.
  2. Navigate to the project directory.
  3. Install the required dependencies listed in the requirements.txt file.

Dataset 📁

The model is trained and tested on the Flickr8k dataset, which comprises 8,000 images, each paired with five different captions. For this project, the dataset is pre-processed to match the model's input requirements.
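
The sketch below shows one way such image–caption pairs might be loaded. The caption-file layout (one `image_name,caption` line per pair), the special tokens, and the `vocab` mapping are assumptions rather than the repository's actual preprocessing.

```python
import torch
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset


class Flickr8kCaptions(Dataset):
    """Pairs each Flickr8k image with one of its captions (one pair per item)."""

    def __init__(self, image_dir, caption_file, vocab, transform=None):
        self.image_dir = Path(image_dir)
        self.vocab = vocab            # dict mapping tokens to integer ids (assumed)
        self.transform = transform
        self.samples = []             # list of (image_name, caption) pairs
        with open(caption_file) as f:
            for line in f:
                name, caption = line.rstrip("\n").split(",", 1)
                self.samples.append((name, caption))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        name, caption = self.samples[idx]
        image = Image.open(self.image_dir / name).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        tokens = ([self.vocab["<sos>"]]
                  + [self.vocab.get(w, self.vocab["<unk>"]) for w in caption.lower().split()]
                  + [self.vocab["<eos>"]])
        return image, torch.tensor(tokens)
```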

Training the Model 🚀

To train the model, run the training script; it runs the training loop and periodically saves the model weights as checkpoints.
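
A hedged outline of what such a training loop typically looks like for this encoder/decoder pairing. The dataloader (assumed to pad captions to equal length), hyperparameters, padding id, and checkpoint names are placeholders, not the script's actual settings.

```python
import torch
import torch.nn as nn


def train(encoder, decoder, dataloader, num_epochs=10, lr=1e-3, device="cuda"):
    criterion = nn.CrossEntropyLoss(ignore_index=0)           # assumes id 0 = <pad>
    params = list(decoder.parameters()) + [p for p in encoder.parameters()
                                           if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    encoder.to(device)
    decoder.to(device)

    for epoch in range(num_epochs):
        for images, captions in dataloader:
            images, captions = images.to(device), captions.to(device)
            features = encoder(images)
            outputs = decoder(features, captions[:, :-1])      # teacher forcing
            loss = criterion(outputs.reshape(-1, outputs.size(-1)),
                             captions.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Save weights periodically, as the training script does.
        torch.save({"encoder": encoder.state_dict(),
                    "decoder": decoder.state_dict()},
                   f"checkpoint_epoch_{epoch}.pt")
```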

Testing the Model 🧪

After training, the model's performance can be evaluated by executing the testing script, which generates captions for the images in the test dataset.
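
At test time, caption generation usually unrolls the LSTM one step at a time, feeding each predicted word back in. The sketch below does greedy decoding against the DecoderRNN sketch above; `itos` (id-to-word mapping) and the special-token ids are assumptions.

```python
import torch


@torch.no_grad()
def generate_caption(encoder, decoder, image, itos,
                     max_len=20, sos_id=1, eos_id=2, device="cuda"):
    encoder.eval()
    decoder.eval()
    feature = encoder(image.unsqueeze(0).to(device))     # (1, embed_size)
    inputs, states = feature.unsqueeze(1), None          # start from the image feature
    words = []
    for _ in range(max_len):
        hiddens, states = decoder.lstm(inputs, states)
        logits = decoder.fc(hiddens.squeeze(1))          # (1, vocab_size)
        token = logits.argmax(dim=-1)                    # greedy choice
        word_id = token.item()
        if word_id == eos_id:
            break
        if word_id != sos_id:                            # skip special tokens
            words.append(itos[word_id])
        inputs = decoder.embed(token).unsqueeze(1)       # feed the prediction back in
    return " ".join(words)
```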

Results and Evaluation 📊

The model's performance can be evaluated based on the captions generated for the test images. A qualitative assessment involves comparing the predicted captions against the ground truth captions.
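
One way to do that side-by-side check, using the generate_caption sketch above; `test_samples` (a list of image-path / reference-caption pairs) and the transform are placeholders for the project's own test handling.

```python
from PIL import Image


def compare_captions(encoder, decoder, test_samples, itos, transform):
    # `test_samples`: list of (image_path, [reference captions]) pairs.
    for image_path, references in test_samples:
        image = transform(Image.open(image_path).convert("RGB"))
        predicted = generate_caption(encoder, decoder, image, itos)
        print(f"Image:     {image_path}")
        print(f"Predicted: {predicted}")
        for ref in references:
            print(f"Reference: {ref}")
        print("-" * 60)
```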

License 📜

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements 🙌

  • Thanks to the creators of the Flickr8k dataset for providing the resources necessary for training and testing the model.
  • Thanks to the PyTorch documentation for its comprehensive guides and tutorials.

Notebook and Copyright

The project notebook can be opened in Google Colab. If you use this work, please cite it as:

@misc{MJImageCaptioning2023,
  author       = {Mohammad Javad (MJ) Ahmadi},
  title        = {Image Captioning},
  year         = {2023},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/MJAHMADEE/Image_Captioning}}
}


For more information, please refer to the official repository.