Inspired by the OpenBookQA dataset, this competition challenges participants to answer difficult science-based questions written by a Large Language Model.

In short, the dataset for this Kaggle competition was generated by GPT-3.5, a 175-billion-parameter model. The challenge is to see whether a much smaller LLM, roughly in the 7-30 billion parameter range and made trainable on modest hardware through parameter-efficient fine-tuning methods such as LoRA and its quantized variant QLoRA, can pass an exam set by another, far larger LLM.
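To make this concrete, below is a minimal sketch of what such a setup might look like with the Hugging Face transformers and peft libraries. The base model name, target modules, and LoRA hyperparameters are illustrative assumptions, not the configuration used in this notebook.

```python
# Minimal QLoRA-style sketch (illustrative, not the notebook's actual setup):
# load a ~7B model in 4-bit precision and attach LoRA adapters with peft.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical choice of base model

# 4-bit NF4 quantization config (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters: only these small low-rank matrices are trained,
# while the quantized base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Running a sketch like this requires a GPU with bitsandbytes installed; the key point is that only the LoRA adapter weights are updated during fine-tuning.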

Purpose of the Notebook

  1. Load and preprocess the competition data 📁 (a loading sketch follows this list)
  2. Engineer relevant features for model training 🏋️‍♂️
  3. Train predictive models to make target variable predictions 🧠
  4. Submit predictions to the competition environment 📤
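
As a rough illustration of the first step, here is a hedged sketch of loading the competition files with pandas. The data directory and column names (prompt, options A-E, answer) are assumptions based on the usual layout of this competition's data, not verified against the notebook itself.

```python
# Minimal sketch of loading and lightly preprocessing the competition data,
# assuming train.csv / test.csv with a question `prompt`, option columns A-E,
# and a letter `answer` column in train. Paths and column names are assumed.
import pandas as pd

DATA_DIR = "/kaggle/input/kaggle-llm-science-exam"  # assumed Kaggle input path

train = pd.read_csv(f"{DATA_DIR}/train.csv")
test = pd.read_csv(f"{DATA_DIR}/test.csv")

OPTIONS = ["A", "B", "C", "D", "E"]

def build_text(row: pd.Series) -> str:
    """Join the question and its five options into a single prompt string."""
    options = "\n".join(f"{opt}: {row[opt]}" for opt in OPTIONS)
    return f"Question: {row['prompt']}\n{options}\nAnswer:"

train["text"] = train.apply(build_text, axis=1)
test["text"] = test.apply(build_text, axis=1)
print(train[["text", "answer"]].head())
```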

What to expect from this project

  1. Data Preparation: In this section, we load and preprocess the competition data.
  2. Feature Engineering: We generate and select relevant features for model training.
  3. Model Training: We train machine learning models on the prepared data.
  4. Prediction and Submission: We make predictions on the test data and submit them for evaluation (a submission sketch follows this list).
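
For the final step, here is a hedged sketch of writing a submission file. It assumes the expected format is an `id` column plus a `prediction` column holding up to three option letters ranked best first (the competition is scored with MAP@3), and it uses random placeholder scores in place of a real model's outputs.

```python
# Minimal sketch of building submission.csv, assuming each row needs the top-3
# option letters (space-separated, best first) for its question id.
import numpy as np
import pandas as pd

DATA_DIR = "/kaggle/input/kaggle-llm-science-exam"  # assumed Kaggle input path
test = pd.read_csv(f"{DATA_DIR}/test.csv")

OPTIONS = ["A", "B", "C", "D", "E"]

def top3_letters(scores: np.ndarray) -> str:
    """Return the three highest-scoring options as a space-separated string."""
    order = np.argsort(scores)[::-1][:3]
    return " ".join(OPTIONS[i] for i in order)

# Placeholder scores; a real model would produce one score per answer option.
rng = np.random.default_rng(0)
scores = rng.random((len(test), len(OPTIONS)))

submission = pd.DataFrame({
    "id": test["id"],
    "prediction": [top3_letters(s) for s in scores],
})
submission.to_csv("submission.csv", index=False)
```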

Disclaimer - To run this notebook, you need access to the data for the Kaggle LLM Science Exam competition.