Template code for training transformer models on new data. To adapt it, pick a model, preprocess your data, and implement data loading and tokenization for it.
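As a rough illustration of the data loading and tokenization step, here is a minimal sketch that assumes a character-level tokenizer and (input, target) string pairs. The names `CharTokenizer` and `batches`, the special tokens, and the example pairs are all hypothetical and not part of this repository. Your own tokenizer and loader may look quite different.

```python
from typing import Iterable

# Hypothetical special tokens; the template's actual vocabulary may differ.
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"


class CharTokenizer:
    """Character-level tokenizer built from a corpus (illustrative only)."""

    def __init__(self, texts: Iterable[str]) -> None:
        chars = sorted({c for t in texts for c in t})
        self.itos = [PAD, BOS, EOS] + chars               # id -> token
        self.stoi = {s: i for i, s in enumerate(self.itos)}  # token -> id

    def encode(self, text: str) -> list[int]:
        # Raises KeyError on characters not seen when the vocabulary was built.
        return [self.stoi[BOS]] + [self.stoi[c] for c in text] + [self.stoi[EOS]]

    def decode(self, ids: list[int]) -> str:
        special = {self.stoi[PAD], self.stoi[BOS], self.stoi[EOS]}
        return "".join(self.itos[i] for i in ids if i not in special)


def batches(pairs, tokenizer, batch_size):
    """Yield (src, tgt) id batches, each padded to its longest sequence."""
    pad_id = tokenizer.stoi[PAD]
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        src = [tokenizer.encode(s) for s, _ in chunk]
        tgt = [tokenizer.encode(t) for _, t in chunk]
        src_len = max(len(x) for x in src)
        tgt_len = max(len(y) for y in tgt)
        yield ([x + [pad_id] * (src_len - len(x)) for x in src],
               [y + [pad_id] * (tgt_len - len(y)) for y in tgt])


# Hypothetical expression -> derivative pairs.
pairs = [("x**2", "2*x"), ("3*x", "3")]
tok = CharTokenizer(s + t for s, t in pairs)
src_batch, tgt_batch = next(batches(pairs, tok, batch_size=2))
```

Padding per batch rather than to a global maximum keeps short batches cheap; the resulting lists can be converted to tensors for whichever model you pick.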
The code was tested on predicting gradients of a math equation. Because the data is private, the final results cannot be shared, but exact-match accuracy reached up to 70% with very small models.
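For reference, exact-match accuracy here is understood as the fraction of predicted sequences identical to their target. The sketch below assumes predictions have already been decoded to strings and compares them strictly; the function name is hypothetical, and whether the original evaluation applied any normalization (for example, whitespace stripping or algebraic simplification) is not stated.

```python
def exact_match_accuracy(predictions: list[str], targets: list[str]) -> float:
    """Fraction of predictions that are character-for-character equal to their target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)


# Hypothetical predictions: two of three match exactly.
score = exact_match_accuracy(["2*x", "3", "x"], ["2*x", "3", "1"])
```

Strict string equality is a harsh metric for math: a prediction like `x*2` counts as wrong against `2*x` even though it is equivalent.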