ImageCaptioning

This repo contains an implementation of an image captioning model. It was implemented as part of the 4 ECTS course Deep Learning in the Data Science Bachelor's programme at FHNW.

Architecture

The architecture is as follows:

  • A pretrained CNN (e.g. ResNet50) extracts a feature vector from each image.
  • An embedding layer projects this feature vector to the embedding dimension, so that it matches the size of the word embeddings used for the vocabulary. The embedding dimension is chosen based on the available computing resources: a higher dimension can, in principle, capture more information, but it takes longer to train and requires more memory.
  • The resulting vector is then passed to an LSTM as its initial hidden state.
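The data flow described above can be sketched as an untrained forward pass in plain NumPy. This is a minimal illustration under stated assumptions, not the implementation in main.ipynb: the class name `CaptionModelSketch`, the sizes (`EMBED_DIM = 256`, `VOCAB_SIZE = 1000`) and the random initialisation are hypothetical, and random vectors stand in for real ResNet50 features (ResNet50's pooled output has 2048 values). It also assumes that the LSTM hidden size equals the embedding dimension, so the projected image vector can be used directly as the initial hidden state.

```python
import numpy as np

# Hypothetical sizes, not taken from the repository
FEATURE_DIM = 2048   # length of ResNet50's pooled feature vector
EMBED_DIM = 256      # embedding dimension (also used as LSTM hidden size here)
VOCAB_SIZE = 1000    # number of tokens in the caption vocabulary

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class CaptionModelSketch:
    """CNN features -> linear embedding -> initial LSTM hidden state -> word logits."""

    def __init__(self, feature_dim, embed_dim, vocab_size, scale=0.01):
        # Linear "embedding" of the CNN features into the embedding dimension
        self.W_img = rng.normal(0, scale, (feature_dim, embed_dim))
        self.b_img = np.zeros(embed_dim)
        # Word embedding table: one row per vocabulary token
        self.word_emb = rng.normal(0, scale, (vocab_size, embed_dim))
        # LSTM weights for the four gates (input, forget, candidate, output), stacked
        self.W_x = rng.normal(0, scale, (embed_dim, 4 * embed_dim))
        self.W_h = rng.normal(0, scale, (embed_dim, 4 * embed_dim))
        self.b = np.zeros(4 * embed_dim)
        # Output layer mapping each hidden state to logits over the vocabulary
        self.W_out = rng.normal(0, scale, (embed_dim, vocab_size))
        self.b_out = np.zeros(vocab_size)

    def lstm_step(self, x, h, c):
        # One standard LSTM cell update
        z = x @ self.W_x + h @ self.W_h + self.b
        i, f, g, o = np.split(z, 4, axis=-1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def forward(self, features, captions):
        """features: (batch, feature_dim); captions: (batch, seq_len) token ids.

        Returns logits of shape (batch, seq_len, vocab_size).
        """
        # The projected image vector becomes the first hidden state
        h = features @ self.W_img + self.b_img
        c = np.zeros_like(h)  # cell state starts at zero
        logits = []
        for t in range(captions.shape[1]):
            x = self.word_emb[captions[:, t]]  # (batch, embed_dim)
            h, c = self.lstm_step(x, h, c)
            logits.append(h @ self.W_out + self.b_out)
        return np.stack(logits, axis=1)


# Usage with stand-in data: 2 "images", captions of 12 tokens each
model = CaptionModelSketch(FEATURE_DIM, EMBED_DIM, VOCAB_SIZE)
features = rng.normal(size=(2, FEATURE_DIM))          # stand-in for ResNet50 output
captions = rng.integers(0, VOCAB_SIZE, size=(2, 12))  # stand-in token ids
logits = model.forward(features, captions)
print(logits.shape)  # (2, 12, 1000)
```

Here the ground-truth caption tokens are fed in at every step (teacher forcing), as is typical during training; at inference time one would instead feed back the token predicted at the previous step. The sketch also shows the cost side of the embedding dimension: `W_x` and `W_h` each hold `EMBED_DIM × 4·EMBED_DIM` weights, so the LSTM's parameter count grows quadratically with it.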

Further details

Please have a look at main.ipynb.