Inspired by the OpenBookQA dataset, this competition challenges participants to answer difficult science-based questions written by a Large Language Model.
In short, the dataset for this Kaggle competition was created with GPT-3.5, a 175-billion-parameter model. It would be amazing if a much smaller LLM, probably in the range of 7-30 billion parameters and fine-tuned with parameter-efficient quantization techniques like QLoRA and LoRA, could pass an exam set by another LLM.
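For illustration only, the sketch below shows what loading a ~7B model with 4-bit quantization and attaching LoRA adapters (the core of QLoRA) might look like using the Hugging Face transformers, peft, and bitsandbytes libraries. The checkpoint name and hyperparameters are placeholder assumptions, not the exact setup used in this notebook.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Hypothetical base model in the 7B range; swap in whichever checkpoint you use.
model_name = "mistralai/Mistral-7B-v0.1"

# 4-bit NF4 quantization (the "Q" in QLoRA) keeps the frozen base weights small.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adds small trainable low-rank adapters on top of the frozen, quantized weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative choice of attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```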
- Load and preprocess the competition data 📁 (a loading sketch follows this list)
- Engineer relevant features for model training 🏋️‍♂️
- Train models to predict the target variable 🧠
- Submit predictions to the competition environment 📤
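As a starting point, here is a minimal sketch of the data-loading step. The file path and column names are assumptions based on the competition's standard layout (train.csv and test.csv with a question prompt, five options A-E, and an answer column in the training set).

```python
import pandas as pd

# Assumed Kaggle input path for this competition.
DATA_DIR = "/kaggle/input/kaggle-llm-science-exam"

train = pd.read_csv(f"{DATA_DIR}/train.csv")  # expected columns: prompt, A, B, C, D, E, answer
test = pd.read_csv(f"{DATA_DIR}/test.csv")    # expected columns: id, prompt, A, B, C, D, E

print(train.shape, test.shape)
print(train.iloc[0][["prompt", "A", "answer"]])
```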
- Data Preparation: In this section, we load and preprocess the competition data.
- Feature Engineering: We generate and select relevant features for model training.
- Model Training: We train machine learning models on the prepared data.
- Prediction and Submission: We make predictions on the test data and submit them for evaluation (a minimal submission sketch follows this outline).
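To make the final step concrete, here is a minimal sketch of writing a submission file. The competition is scored with MAP@3, so each row lists up to three answer letters in order of confidence; random scores stand in for real model outputs below, and the path and column names are assumptions taken from the competition's sample files.

```python
import numpy as np
import pandas as pd

# Assumed path and columns, matching the loading sketch above.
test = pd.read_csv("/kaggle/input/kaggle-llm-science-exam/test.csv")

# Suppose `probs` holds one model score per option (A-E) for each test question;
# random scores stand in here purely to illustrate the submission format.
options = np.array(list("ABCDE"))
rng = np.random.default_rng(0)
probs = rng.random((len(test), 5))

# MAP@3 scoring: each prediction lists the top three options in order of
# confidence, separated by spaces (e.g. "B A D").
top3 = np.argsort(-probs, axis=1)[:, :3]
submission = pd.DataFrame({
    "id": test["id"],
    "prediction": [" ".join(options[row]) for row in top3],
})
submission.to_csv("submission.csv", index=False)
```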
Disclaimer - To run this notebook, you need access to the competition data for the Kaggle LLM Science Exam.