# Kaggle - LLM Science Exam
Use LLMs to answer difficult science questions
![](https://private-user-images.githubusercontent.com/36858976/261635496-ffc86657-8a3f-4b7b-8e09-a9d0841fbf2a.png)
## Background
There are already some excellent notebooks demonstrating how to use HuggingFace's `AutoModelForMultipleChoice` for multiple-choice tasks in the Kaggle - LLM Science Exam competition. However, it is challenging to comprehend the underlying mechanisms inside that model. This led me to create this notebook, which is centered around building a multiple-choice model from the ground up using the standard classifier from KerasNLP. In this notebook, I also use the multi-backend KerasCore alongside KerasNLP.

Furthermore, as time progresses, larger datasets will likely become available, and TPUs will be invaluable for training large models on them.
## Kaggle Notebooks
- training: LLM Science Exam: KerasCore + KerasNLP [TPU]
- inference: LLM Science Exam: KerasCore + Keras [Infer]
Note: Training and inference notebooks are also available in the `notebooks` folder.
## Model Architecture
In the image below, you'll find `token_ids` on the left and the corresponding `padding_masks` on the right:
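To make the from-scratch setup concrete, here is a minimal sketch of one way to build such a multiple-choice model with KerasNLP: a single shared backbone scores each `(prompt, option)` pair, and the five per-option logits are compared with a softmax. This is a simplified illustration, not the notebook's exact model; the preset name, mean pooling, and hyperparameters are assumptions.

```python
import os
os.environ["KERAS_BACKEND"] = "jax"  # keras_core picks its backend at import time

import keras_core as keras
import keras_nlp

NUM_OPTIONS = 5  # options A-E in the competition data
PRESET = "deberta_v3_small_en"  # assumption: any KerasNLP preset works here

# One backbone, shared across all options, scores every (prompt, option) pair.
backbone = keras_nlp.models.DebertaV3Backbone.from_preset(PRESET)

# Inputs are stacked per option: (batch, num_options, seq_len).
token_ids = keras.Input(shape=(NUM_OPTIONS, None), dtype="int32", name="token_ids")
padding_mask = keras.Input(shape=(NUM_OPTIONS, None), dtype="int32", name="padding_mask")

pool = keras.layers.GlobalAveragePooling1D()  # simple mean over tokens (ignores padding, for brevity)
score = keras.layers.Dense(1)                 # one scalar logit per option, weights shared

logits = []
for i in range(NUM_OPTIONS):
    features = backbone({
        "token_ids": keras.ops.take(token_ids, i, axis=1),
        "padding_mask": keras.ops.take(padding_mask, i, axis=1),
    })  # (batch, seq_len, hidden_dim)
    logits.append(score(pool(features)))  # (batch, 1)

outputs = keras.layers.Concatenate(axis=-1)(logits)  # (batch, NUM_OPTIONS)

model = keras.Model({"token_ids": token_ids, "padding_mask": padding_mask}, outputs)
model.compile(
    optimizer=keras.optimizers.Adam(2e-5),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```

Sharing the backbone and the scoring head across options is what makes this a multiple-choice model rather than five independent classifiers: each option is judged on the same scale, and the softmax simply picks the best-scoring one.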
## Augmentation
I also tried a fun augmentation, `ShuffleOptions`. This approach shuffles the answer options of each question; for instance, options `[A, B, C]` would be transformed into `[C, A, B]`. The purpose behind this augmentation is to ensure that the model doesn't focus on the positions of the options.
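One detail worth spelling out: when the options move, the label must move with them. Below is a small sketch of the idea (the function name and signature are hypothetical, not the notebook's implementation):

```python
import numpy as np

def shuffle_options(options, label, rng=None):
    """Randomly permute the answer options and remap the label accordingly.

    options: list of option strings, e.g. [A, B, C, D, E]
    label:   integer index of the correct option
    """
    rng = rng or np.random.default_rng()
    perm = rng.permutation(len(options))
    shuffled = [options[i] for i in perm]
    new_label = int(np.where(perm == label)[0][0])  # where the correct option landed
    return shuffled, new_label

# e.g. options [A, B, C] with label 0 might become [C, A, B] with label 1
```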
## Tracking with WandB
You can track all the experiments here.
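For reference, hooking WandB into a Keras training loop only takes a callback; this is a generic sketch, and the project name and dataset variables are placeholders, not the notebook's exact setup:

```python
import wandb
from wandb.integration.keras import WandbMetricsLogger

wandb.init(project="llm-science-exam")  # hypothetical project name

# train_ds / valid_ds are assumed to be prepared tf.data / array datasets.
model.fit(
    train_ds,
    validation_data=valid_ds,
    epochs=3,
    callbacks=[WandbMetricsLogger()],  # streams loss/metrics to the WandB dashboard
)
```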
## Known Issues
- Setting the backend to `tensorflow` leads to OOM in RAM, which is very weird. You can solve it by either using the `jax` backend (see the sketch below) or using `tf.keras` instead of `keras`.
- Currently, TPU is throwing an error with `tensorflow`. You can use the `jax` backend with `keras_core` to resolve this issue.
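A minimal sketch of switching the backend; note that the environment variable must be set before `keras_core` is imported:

```python
import os

# Select the backend *before* importing keras_core; "jax" sidesteps both
# issues above (the TensorFlow RAM OOM and the TPU error).
os.environ["KERAS_BACKEND"] = "jax"

import keras_core as keras
print(keras.backend.backend())  # -> "jax"
```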