lm_tab

Code for the paper "Vectorizing string entries for data processing on tables: when are larger language models better?"

Data

Edited datasets can be downloaded at https://figshare.com/articles/dataset/Datasets_with_text_entries/24879042. Links to the original datasets can be found in the paper.

Reproducing results

Computations can be launched with the files in the scripts directory whose names start with launch_ and encode. These scripts are written for a SLURM cluster and should be adapted to your own cluster by changing the executor settings.
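
To see which files this refers to before adapting anything, a small helper along the following lines may be useful. It is only a sketch and not part of the repository: the function name find_entry_points is hypothetical, and it assumes you run it from the repository root so that the scripts directory is reachable by its relative path. It only lists the matching files and does not submit any job.

```python
from pathlib import Path

# Prefixes named in this README; everything else here is illustrative.
ENTRY_POINT_PREFIXES = ("launch_", "encode")


def find_entry_points(scripts_dir="scripts"):
    """Return the sorted files in scripts_dir whose names start with a known prefix."""
    root = Path(scripts_dir)
    if not root.is_dir():
        # Assumption: a missing directory is reported as "no scripts found".
        return []
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.name.startswith(ENTRY_POINT_PREFIXES)
    )


if __name__ == "__main__":
    # Print one candidate launch or encode script per line.
    for script in find_entry_points():
        print(script)
```

Each file it prints is a candidate to open and check for executor settings that need to match your cluster.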

Reproducing figures

Figures can be reproduced using the final_plot.ipynb notebook in the scripts folder.