Access MTEB dataset without internet access
jiatong-yu opened this issue · 2 comments
When running the MTEB script here on clusters without internet access, I get the following error:
ConnectionError: Couldn't reach 'dwzhu/LongEmbed' on the Hub (ConnectionError)
Can you help with this?
Hi @jiatong-yu, thank you for your interest in our work!
Take the QMSum task in LongEmbed as an example: it is loaded here. For clusters without internet access, I think you can upload the dataset to the clusters manually, install MTEB in editable mode, and change the data-loading logic to read from local files.
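As a minimal sketch of the first step (getting a copy of the data onto the cluster), the whole dataset repo can be fetched on a machine that does have internet access using `huggingface_hub.snapshot_download`, then copied to the cluster (e.g. with `rsync` or `scp`). The helper name `download_longembed` is hypothetical, and this assumes a reasonably recent `huggingface_hub` that supports the `local_dir` argument.

```python
from typing import Optional


def download_longembed(local_dir: str, revision: Optional[str] = None) -> str:
    """Download the full dwzhu/LongEmbed dataset repo into local_dir.

    Run this on a machine WITH internet access, then copy local_dir to the cluster.
    Returns the local path of the downloaded snapshot.
    """
    # Imported inside the function so this file can still be imported
    # on machines where huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="dwzhu/LongEmbed",
        repo_type="dataset",  # LongEmbed is a dataset repo, not a model repo
        revision=revision,    # None = latest; pin a commit if the MTEB task expects one
        local_dir=local_dir,  # plain files here, instead of the HF cache layout
    )
```

Usage: call `download_longembed("./LongEmbed")` on an internet-connected machine and copy the resulting directory to the cluster. If the MTEB task metadata pins a specific dataset revision, passing that same revision keeps the local copy consistent with what MTEB would have downloaded.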
Hope this can help! I will also provide some code snippets for loading from local directories tomorrow 😀
Suppose you have manually downloaded the LongEmbed benchmark from the HF Hub and uploaded it to the clusters (let's call this path path_to_longembed, such as /home/data/LongEmbed). Taking QMSum as an example, you can install MTEB in editable mode and replace the line "path": "dwzhu/LongEmbed", here with "path": "path_to_longembed", (using your actual directory) so that no internet access is needed.
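Instead of hard-coding the local path, one option is to make the edited line fall back to the Hub when no local copy is configured, so the same editable install works both online and offline. A minimal sketch, assuming a hypothetical environment variable `LONGEMBED_PATH` and a hypothetical helper `resolve_longembed_path` (neither is part of MTEB):

```python
import os

# Hub repo id used by the LongEmbed tasks in MTEB (as quoted in this thread).
HUB_REPO_ID = "dwzhu/LongEmbed"


def resolve_longembed_path(env_var: str = "LONGEMBED_PATH") -> str:
    """Return a local LongEmbed directory if env_var points to one, else the Hub id.

    LONGEMBED_PATH is a hypothetical variable name for this sketch;
    MTEB itself does not read it.
    """
    local_dir = os.environ.get(env_var)
    if not local_dir:
        # No override set: keep the original behaviour (load from the Hub).
        return HUB_REPO_ID
    if not os.path.isdir(local_dir):
        # Fail early with a clear message instead of a ConnectionError later on.
        raise FileNotFoundError(f"{env_var}={local_dir!r} is not an existing directory")
    return local_dir
```

With this helper defined in (or imported into) the QMSum task file of the editable install, the edited line would read `"path": resolve_longembed_path(),`, and on the cluster you would export `LONGEMBED_PATH=/home/data/LongEmbed` before running.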