dwzhu-pku/LongEmbed

access MTEB dataset without internet access

jiatong-yu opened this issue · 2 comments

When I follow the mteb script here to run on a cluster without internet access, it throws the following error:
ConnectionError: Couldn't reach 'dwzhu/LongEmbed' on the Hub (ConnectionError)
Can you help with this?

Hi @jiatong-yu, thank you for your interest in our work!
Take the QMSum task in LongEmbed as an example: it is loaded here. For clusters without internet access, I think you can upload the dataset to the cluster manually, install MTEB in editable mode, and change the data loading logic to use local files.
Hope this can help! I will also provide some code snippets for loading from local directories tomorrow 😀

Suppose you have manually downloaded the LongEmbed benchmark from the HF Hub and uploaded it to the cluster (let's call this path path_to_longembed, e.g. /home/data/LongEmbed). Taking QMSum as an example, you can install MTEB in editable mode and replace the line "path": "dwzhu/LongEmbed", here with "path": "path_to_longembed", so that loading no longer requires internet access.
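
For anyone landing here later, the steps above could be sketched roughly as follows. This is only a minimal sketch, not the exact MTEB code: the helper names (download_longembed, enable_offline_mode, localize_dataset_path) and the example dict are hypothetical, and the "name" key is just for illustration. It assumes huggingface_hub is installed on the machine that does have internet access.

```python
import os

# Assumption: the benchmark ends up in this directory on the cluster.
PATH_TO_LONGEMBED = "/home/data/LongEmbed"


def download_longembed(local_dir: str = PATH_TO_LONGEMBED) -> str:
    """Run this on a machine WITH internet access, then copy local_dir to the cluster."""
    # Imported lazily so the rest of this file works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="dwzhu/LongEmbed",
        repo_type="dataset",
        local_dir=local_dir,
    )


def enable_offline_mode() -> None:
    """Ask datasets / huggingface_hub not to attempt network calls.

    Call this before importing datasets or mteb, since the variables
    are read at import time.
    """
    os.environ["HF_DATASETS_OFFLINE"] = "1"
    os.environ["HF_HUB_OFFLINE"] = "1"


def localize_dataset_path(dataset: dict, local_root: str = PATH_TO_LONGEMBED) -> dict:
    """Return a copy of a dataset dict with "path" pointing at the local copy."""
    patched = dict(dataset)  # shallow copy, the original dict is left untouched
    patched["path"] = local_root
    return patched


# Illustrative dict only; the real task definition in MTEB may carry other keys.
example = {"path": "dwzhu/LongEmbed", "name": "qmsum"}
patched_example = localize_dataset_path(example)
```

In practice, editing the "path" line by hand in the editable MTEB install, as described above, does the same thing as localize_dataset_path; the helper only makes the substitution explicit.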