TensorFlow-Bros-Final-Project

This project has two primary goals. As mentioned in the reintroduction, our first goal is to learn more about Automatic Image Captioning by researching, studying, and replicating the pipelines defined in various public notebooks. We believe that carefully walking through this process is a fulfilling exploration on its own, and that there is much to be learned from understanding the best available code for a task as complicated as molecular translation. After reading several notebooks, identifying common obstacles, and closely investigating the most successful pipelines, we decided to use Mark Wijkhuizen’s notebook on Kaggle as the focal point for our project. His process is not only comprehensive but also well suited to our background, since it is built on TensorFlow and uses various techniques that were introduced in class. Furthermore, his pipeline offers a healthy balance between familiar concepts that we hope to understand more deeply and completely new techniques that we hope to learn from scratch. Our second goal is to use concepts and techniques discussed in class to make appreciable improvements to our base model.

In this repository you will find three notebooks:

  1. Exploratory data analysis of the molecules and conversion to TFRecords format (using Mark's notebook)
  2. Our base model (using Mark's notebook)
  3. Our final model, which includes a couple of notable improvements to our base model

Credit for items 1 and 2 goes to Mark Wijkhuizen. A link to his brilliant work is included here: https://www.kaggle.com/markwijkhuizen/tensorflow-tpu-training-baseline-lb-16-92