This repository provides in-context learning for initial response generation, response self-refinement for improved specificity, and new user query generation, all in the document-grounded setting. It currently includes the MultiDoc2Dial and AskHR datasets and can be extended to other content-grounded datasets, provided they follow a standardized format consistent with the current datasets.
First, create a new conda environment, as follows:
conda create -y -p ./{env-name} python=3.10
conda activate ./{env-name}
Next, install the required packages listed in requirements.txt:
pip install -r requirements.txt
Querying models via BAM requires an API key, which can be obtained at https://bam.res.ibm.com/. Save this key in a file named .env, as follows:
GENAI_KEY={your-api-key}
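To check that the key is picked up, one common pattern exports every variable defined in .env into the current shell. The loading pattern below is a sketch; only the file name and the GENAI_KEY variable come from this README:

```shell
# Write the .env file with a placeholder key (replace with your real key).
printf 'GENAI_KEY=%s\n' 'your-api-key' > .env

# One common way to export every variable in .env into the current shell:
set -a          # auto-export all subsequently defined variables
. ./.env        # source the file, defining GENAI_KEY
set +a          # stop auto-exporting

echo "$GENAI_KEY"   # prints: your-api-key
```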
Finally, to set the PYTHONPATH, run the following command:
source setup.sh
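For reference, sourcing setup.sh most likely just prepends the repository root to PYTHONPATH so that the project's modules resolve from anywhere; this is an assumption about the script's contents, sketched below (inspect the actual file):

```shell
# A plausible equivalent of setup.sh (an assumption, not its verified
# contents): prepend the current directory to PYTHONPATH so the
# project's modules are importable from anywhere.
export PYTHONPATH="$PWD:$PYTHONPATH"
```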
To run single-turn response generation with iterative refinement, use the following command:
bash scripts/run_response_gen.sh
In scripts/run_response_gen.sh, you can modify the dataset, the model to be queried via BAM, the number of samples in the dataset to run, and the maximum number of refinement attempts.
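As an illustration of those knobs, a hypothetical configuration block is sketched below; the variable names and values are assumptions, not the actual contents of scripts/run_response_gen.sh:

```shell
# Hypothetical configuration block -- variable names and values are
# illustrative, not the real contents of scripts/run_response_gen.sh.
DATASET="multidoc2dial"   # which dataset to run on
MODEL="some-model-id"     # model id queried via BAM (placeholder)
NUM_SAMPLES=100           # how many dataset samples to process
MAX_REFINEMENTS=3         # cap on iterative refinement attempts
```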
Soon to be added: 'hugging-face' as a model source in addition to 'ibm-generative-ai'.
Note that the only difference between the shell scripts for the following datasets is the dataset path; hence, the pipeline can be adapted to other content-grounded datasets.
To run multi-turn dialogue generation with iterative refinement on MultiDoc2Dial:
bash scripts/run_sdg_md2d.sh
To run multi-turn dialogue generation with iterative refinement on AskHR:
bash scripts/run_sdg_askhr.sh
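Because the two scripts above differ only in the dataset path, supporting a new content-grounded dataset amounts to copying one of them and changing that path. Everything in the sketch below (variable name, file paths, entry point) is hypothetical:

```shell
# Hypothetical scripts/run_sdg_mydata.sh, adapted from run_sdg_md2d.sh.
# Only the dataset path changes relative to the original script.
DATASET_PATH="data/my_dataset/dialogues.json"  # your content-grounded data

# The actual entry point and flags will differ -- shown as a comment only:
# python src/run_sdg.py --dataset "$DATASET_PATH"
echo "Would run multi-turn dialogue generation on $DATASET_PATH"
```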