eth-sri/language-model-arithmetic

Can you provide the processed CSV files again?

LZY-the-boys opened this issue · 3 comments

Thanks for your awesome work! I am very interested in it, but the repo seems to have recently exceeded its LFS quota, so I cannot download the files via LFS:

batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/eth-sri/language-model-arithmetic.git/info/lfs'

Although you have provided the sources of the raw datasets, the Politically Incorrect 4chan Messages dataset is too large (24 GB) for me to download. Could you kindly provide the processed CSV files?

Thanks for pointing this out. We will fix this. In the meantime, please download the datasets from this link: https://polybox.ethz.ch/index.php/s/WdPq20k5GVqrqGW. Unzip the folder and place it in the data/ folder (so that the path to each dataset becomes data/datasets/benchmark_name.csv).
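If you prefer to script the unzip-and-place step, here is a minimal Python sketch. It assumes the archive from the polybox link has already been downloaded to a local zip file (the name `datasets.zip` below is hypothetical) and that the archive's top-level folder is `datasets/`. `install_datasets` is an illustrative helper, not part of the repository.

```python
import zipfile
from pathlib import Path


def install_datasets(archive: Path, data_dir: Path = Path("data")) -> list[Path]:
    """Hypothetical helper: extract the downloaded archive into data_dir and
    return the CSV files found under data_dir/datasets."""
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # Assumption: the archive contains a top-level "datasets/" folder,
        # so extraction yields data/datasets/<benchmark_name>.csv.
        zf.extractall(data_dir)
    datasets_dir = data_dir / "datasets"
    if not datasets_dir.is_dir():
        raise FileNotFoundError(
            f"expected {datasets_dir} after extraction; check the archive layout"
        )
    return sorted(datasets_dir.glob("*.csv"))


if __name__ == "__main__":
    for path in install_datasets(Path("datasets.zip")):
        print(path)
```

The check on `data/datasets` fails loudly if the archive layout differs from the assumed one, which is easier to debug than a missing-file error later.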

We have updated the instructions to download the processed dataset files from our webpage (https://files.sri.inf.ethz.ch/language-model-arithmetic/) instead of using Git LFS. This should fix your issue :)

Thank you very much!