iu-babylm: A Python repository from linkuew

A general setup which can be used for the baby-lm project found here: https://babylm.github.io/index.html

General information

Do not attempt to upload the babylm dataset, instead, copy it from the paper onto your machine/cloud system from the following URL: https://github.com/babylm/babylm.github.io/raw/main/babylm_data.zip
Keep all datasets inside of the 'dataset' directory

The general structure:

Clone this repo
Clone the evaluation pipeline repo somewhere else (https://github.com/babylm/evaluation-pipeline)
Using a python venv, install the necessary packages for creating our model
Train a model on the data
Save the model and the model tokenizer
Using a python venv, run the evaluation pipeline on the data

linkuew/iu-babylm