# Kaggle - LLM Science Exam
Use LLMs to answer difficult science questions
![](https://private-user-images.githubusercontent.com/36858976/261635496-ffc86657-8a3f-4b7b-8e09-a9d0841fbf2a.png)
## Background
There are already some excellent notebooks demonstrating how to use HuggingFace's `AutoModelForMultipleChoice` for multiple-choice tasks in the Kaggle - LLM Science Exam competition. However, it is challenging to comprehend the underlying mechanisms inside that model. This led me to create this notebook, which is centered around building a multiple-choice model from the ground up using the standard classifier from KerasNLP. In this notebook, I also use the multi-backend KerasCore alongside KerasNLP.

Furthermore, as time progresses, larger datasets will likely become available, and TPUs will be invaluable for training large models on them.
## Kaggle Notebooks
- training: LLM Science Exam: KerasCore + KerasNLP [TPU]
- inference: LLM Science Exam: KerasCore + Keras [Infer]
Note: Training and inference notebooks are also available in the `notebooks` folder.
## Model Architecture
In the image below, you'll find `token_ids` on the left and the corresponding `padding_masks` on the right:
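To make the from-scratch setup concrete, here is a minimal sketch of one way to build such a multiple-choice model with KerasNLP: a single shared backbone scores each `(prompt, option)` pair, and the five per-option logits are compared with a softmax. This is a simplified illustration, not the notebook's exact model; the preset name, mean pooling, and hyperparameters are assumptions.

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # keras_core picks its backend at import time

import keras_core as keras
import keras_nlp

NUM_OPTIONS = 5  # options A-E in the competition data
PRESET = "deberta_v3_small_en"  # assumption: any KerasNLP preset works here

# One backbone, shared across all options, scores every (prompt, option) pair.
backbone = keras_nlp.models.DebertaV3Backbone.from_preset(PRESET)

# Inputs are stacked per option: (batch, num_options, seq_len).
token_ids = keras.Input(shape=(NUM_OPTIONS, None), dtype="int32", name="token_ids")
padding_mask = keras.Input(shape=(NUM_OPTIONS, None), dtype="int32", name="padding_mask")

pool = keras.layers.GlobalAveragePooling1D()  # simple mean over tokens (ignores padding, for brevity)
score = keras.layers.Dense(1)                 # one scalar logit per option, weights shared

logits = []
for i in range(NUM_OPTIONS):
    features = backbone({
        "token_ids": keras.ops.take(token_ids, i, axis=1),
        "padding_mask": keras.ops.take(padding_mask, i, axis=1),
    })  # (batch, seq_len, hidden_dim)
    logits.append(score(pool(features)))  # (batch, 1)

outputs = keras.layers.Concatenate(axis=-1)(logits)  # (batch, NUM_OPTIONS)

model = keras.Model({"token_ids": token_ids, "padding_mask": padding_mask}, outputs)
model.compile(
    optimizer=keras.optimizers.Adam(2e-5),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```

Sharing the backbone and the scoring head across options is what makes this a multiple-choice model rather than five independent classifiers: each option is judged on the same scale, and the softmax simply picks the best-scoring one.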
## Augmentation
I also tried a fun augmentation, `ShuffleOptions`. This approach shuffles the answer options of each question; for instance, options `[A, B, C]` would be transformed into `[C, A, B]`. The purpose behind this augmentation is to ensure that the model doesn't focus on the positions of the options.
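One detail worth spelling out: when the options move, the label must move with them. Below is a small sketch of the idea (the function name and signature are hypothetical, not the notebook's implementation):

```python
import numpy as np

def shuffle_options(options, label, rng=None):
    """Randomly permute the answer options and remap the label accordingly.

    options: list of option strings, e.g. [A, B, C, D, E]
    label:   integer index of the correct option
    """
    rng = rng or np.random.default_rng()
    perm = rng.permutation(len(options))
    shuffled = [options[i] for i in perm]
    new_label = int(np.where(perm == label)[0][0])  # where the correct option landed
    return shuffled, new_label

# e.g. options [A, B, C] with label 0 might become [C, A, B] with label 1
```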
## Tracking with WandB
You can track all the experiments here.
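For reference, hooking WandB into a Keras training loop only takes a callback; this is a generic sketch, and the project name and dataset variables are placeholders, not the notebook's exact setup:

```python
import wandb
from wandb.integration.keras import WandbMetricsLogger

wandb.init(project="llm-science-exam")  # hypothetical project name

# train_ds / valid_ds are assumed to be prepared tf.data / array datasets.
model.fit(
    train_ds,
    validation_data=valid_ds,
    epochs=3,
    callbacks=[WandbMetricsLogger()],  # streams loss/metrics to the WandB dashboard
)
```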
## Known Issues
- Setting the backend to `tensorflow` leads to OOM in RAM, which is very weird. You can solve it by either using the `jax` backend (see the sketch below) or using `tf.keras` instead of `keras`.
- Currently, TPU is throwing an error with `tensorflow`. You can use the `jax` backend with `keras_core` to resolve this issue.
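A minimal sketch of switching the backend; note that the environment variable must be set before `keras_core` is imported:

```python
import os

# Select the backend *before* importing keras_core; "jax" sidesteps both
# issues above (the TensorFlow RAM OOM and the TPU error).
os.environ["KERAS_BACKEND"] = "jax"

import keras_core as keras
print(keras.backend.backend())  # -> "jax"
```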