to run an interactive bash session on the entropy cluster:
srun --partition=common --qos=1gpu4h --time=1:00:00 --gres=gpu:1 --pty /bin/bash
installation on the entropy cluster (on a GPU node, not the login node):
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
benchmark installation:
cd benchmarks/pl-mteb
python -m venv env
source env/bin/activate
pip install -r requirements.txt
pip install scipy==1.10.1
pip install mteb==1.12.25
pip install pydantic==2.7.2
to configure wandb, install it with pip and run:
export WANDB_API_KEY=$(cat /path/to/secure/file)
wandb login $WANDB_API_KEY
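The two lines above can be wrapped in a small reusable function that also checks the key file is readable and keeps it private; a sketch only, the function name and error handling are illustrative:

```shell
# load_wandb_key FILE: export WANDB_API_KEY from FILE, refusing unreadable files.
load_wandb_key() {
  key_file="$1"
  if [ ! -r "$key_file" ]; then
    echo "key file not readable: $key_file" >&2
    return 1
  fi
  chmod 600 "$key_file"              # keep the API key private to your user
  WANDB_API_KEY=$(cat "$key_file")
  export WANDB_API_KEY
}
```

usage: `load_wandb_key /path/to/secure/file && wandb login "$WANDB_API_KEY"`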
to configure huggingface read/write access, install huggingface_hub (which provides huggingface-cli) with pip and run:
git config --global credential.helper store
huggingface-cli login
to submit a training job:
sbatch slurm/entropy/run_train.sh
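After submitting, you can capture the job id that sbatch prints and follow the job; a sketch with an illustrative helper name, assuming Slurm's default output file naming `slurm-<jobid>.out`:

```shell
# submit_and_follow SCRIPT: submit SCRIPT via sbatch, report the job id,
# show its queue state, and tail its log.
submit_and_follow() {
  # sbatch prints "Submitted batch job <id>"; the id is the 4th field
  jobid=$(sbatch "$1" | awk '{print $4}')
  echo "job id: $jobid"
  squeue -j "$jobid"                 # queue state of this job
  tail -f "slurm-${jobid}.out"       # stream the default Slurm output file
}
```

usage: `submit_and_follow slurm/entropy/run_train.sh`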
on entropy Titan V nodes the setting
precision: "bf16-mixed"
won't work (Volta GPUs have no bfloat16 support) and should be removed
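A sketch of the change, assuming the training config is a Lightning-style YAML with a `precision` key (the actual file layout in this repo may differ):

```yaml
# Titan V (Volta) has no bfloat16 hardware support.
# precision: "bf16-mixed"   # remove or replace this on Titan V nodes
precision: "16-mixed"       # fp16 mixed precision works on Volta tensor cores
```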
to submit a benchmark job: edit the model name in this file and run
sbatch slurm/entropy/run_pl_mteb.sh
dataset types follow the ones available here
AWS setup through SkyPilot
pip install "skypilot-nightly[aws]" boto3
You also need to create an AWS Access Key and have quota for GPU spot instances!
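One-time credential setup, assuming the AWS CLI is installed alongside the pip packages above; `aws configure` is interactive and `sky check` only reports which clouds SkyPilot can use:

```shell
aws configure   # paste the Access Key ID and Secret Access Key when prompted
sky check       # confirm SkyPilot detects working AWS credentials
```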
to run pl-mteb with SkyPilot:
sky spot launch \
  --env WANDB_API_KEY=$WANDB_API_KEY \
  --env HF_TOKEN=$HF_TOKEN \
  --env MODEL_NAME=$MODEL_NAME \
  sky/sky_run_pl-mteb.yml
by default SkyPilot uses a fairly powerful instance as the controller node, so when you only run a single spot machine with a GPU you can end up paying more for the controller than for the job itself. It can be overridden in ~/.sky/config.yaml:
jobs:
  controller:
    resources:
      cloud: aws
      region: us-east-1
      cpus: 2