For building models only
- The Llama 2 model requires access permission to its repository. Hence, you need a Hugging Face account with an access token (can be created here). Fill out this Meta-AI form and request permission to the models here.
- The Alpaca model requires a ChatNoir API token, which can be requested here.
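Access to gated Hugging Face repositories works by sending the token as a bearer token with each API request. A minimal sketch of what happens under the hood (the environment variable name `HF_TOKEN` is an assumption; `make auth` may store the token elsewhere):

```python
import os

def hf_auth_headers(env_var: str = "HF_TOKEN") -> dict:
    """Build the Authorization header Hugging Face expects for gated repos."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to your Hugging Face access token")
    return {"Authorization": f"Bearer {token}"}
```

In practice, libraries such as `huggingface_hub` pick the stored token up automatically; this only illustrates the mechanism.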
Then

```shell
make auth
```

This will also download the dataset.
```shell
make clean install
git submodule init
git submodule update
```
- Configure which dataset, model, and prompt type should be used:

```shell
make configure  # creates run.yml
```
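The generated `run.yml` bundles the choices above. Its exact schema is defined by `make configure`; the fragment below is only an illustration of the kind of settings it holds (all keys and values are assumptions, not the actual file):

```yaml
# hypothetical run.yml — the actual keys come from `make configure`
dataset: corpus-webis-follow-up-questions-24
model: llama2
prompt_type: zero-shot
```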
- Activate the virtual environment and run the experiment:

```shell
source venv/bin/activate
python src/python/generate_followup_questions.py --config run.yml
```
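The `--config run.yml` pattern can be sketched as follows. This is a hypothetical minimal loader, not the repository's actual code; the script's real option handling may differ:

```python
import argparse

def parse_config_path(argv=None) -> str:
    """Parse the --config option the experiment scripts use."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True, help="path to run.yml")
    args = parser.parse_args(argv)
    return args.config
```

For example, `parse_config_path(["--config", "run.yml"])` returns `"run.yml"`.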
This computation may take a while depending on your hardware. A GPU is preferred for this experiment.
```shell
source venv/bin/activate
python src/python/compute_automatic_comparison.py
```

```shell
source venv/bin/activate
python src/python/compute_human_assessment.py
```

```shell
source venv/bin/activate
python src/python/compute_user_model.py
```

```shell
source venv/bin/activate
python src/python/compute_leading_bigrams_frequency.py
```
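A leading-bigram frequency here presumably means how often each opening two-word sequence starts a generated question. A minimal sketch of that computation (the actual script may tokenize and normalize differently):

```python
from collections import Counter

def leading_bigram_frequencies(questions):
    """Count the first two whitespace tokens of each question (lowercased)."""
    counts = Counter()
    for q in questions:
        tokens = q.lower().split()
        if len(tokens) >= 2:
            counts[(tokens[0], tokens[1])] += 1
    return counts
```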
```shell
# setup
Rscript -e 'dir.create(Sys.getenv("R_LIBS_USER"), showWarnings=FALSE);install.packages("irr", lib=Sys.getenv("R_LIBS_USER"))'
# run
cat data/corpus-webis-follow-up-questions-24/simulation-annotations.json.gz \
  | gunzip \
  | python3 src/python/parse-label-studio-human-assessment-for-kappa.py /dev/stdin \
  | sed 's/not_generic/specific/' \
  > data/simulation-single-annotations.tsv
./src/r/kappa.R data/simulation-single-annotations.tsv
```
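The R script relies on the `irr` package; for intuition, Cohen's kappa for two annotators can be sketched in a few lines. This is a simplified illustration, not the repository's implementation:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

For instance, with labels `["specific", "generic", "specific", "generic"]` and `["specific", "generic", "generic", "generic"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.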
Edit `src/python/save.py`, then run it with the Hugging Face offline flags set:

```shell
CUDA_VISIBLE_DEVICES="" HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 python src/python/save.py
```
In llama.cpp:

```shell
python convert-hf-to-gguf.py <model directory>
```