This is the code repository of group 9 for the E.ON data challenge of the course "Data Analytics in Applications" at the Technical University of Munich.
To get started, clone the repository first:
```bash
git clone https://github.com/aminbensaad/eon-llm.git
```
In order to access our own fine-tuned models, installation instructions can be found in llm/local_models/*/README.md.
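All following commands assume the repository root as the working directory (by default, git clones into a directory named after the repository):

```bash
cd eon-llm
```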
To avoid conflicts with packages already installed on the system, it is recommended
to use a virtual environment, for example with conda or virtualenv.
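With conda, for instance, an environment could be created and activated as follows (the environment name and the Python version are placeholders, not taken from this repository):

```bash
# Hypothetical conda environment; name and Python version are arbitrary choices
conda create -n eon-llm python=3.10
conda activate eon-llm
```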
As an example using Python's built-in venv module, the following two commands can be used:

```bash
python -m venv venv
source venv/bin/activate
```

Before proceeding, please ensure that pg_config is available on the system.
On Ubuntu, it can be installed via `apt install libpq-dev`.
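As an optional sanity check, pg_config should now be on the PATH:

```bash
pg_config --version
```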
Finally, install all the required dependencies with the following command:
```bash
pip install -r requirements.txt
```
The repository is divided into an evaluation pipeline, a chatbot UI, a fine-tuning script, and Jupyter notebooks with experiments.
Before a model can be evaluated, its predictions have to be generated first. For the SQuAD dataset and the fine-tuned models, this can be done with the following command:
```bash
python llm/scripts/run.py -p -d SQuAD -m tuned
```
To use GermanQuAD instead of SQuAD, replace "SQuAD" in the command above with "G". For different model sets, the following categories can be used instead of "tuned" (an example combining both substitutions follows the list):
- tuned: SQuAD fine-tuned models
- base: Untuned models
- Gtuned: GermanQuAD fine-tuned models
- Gbase: Untuned models for German
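Combining both substitutions described above, predictions for GermanQuAD with the GermanQuAD fine-tuned models should be generated with:

```bash
python llm/scripts/run.py -p -d G -m Gtuned
```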
Which models are executed as part of each category can be adjusted by adding or removing entries in llm/scripts/model_ids.py.
To run the evaluation scripts on the generated results, run the following command:
```bash
python llm/scripts/run.py -e --all -d SQuAD -m tuned
```
The same modifications as for the prediction command can be used to run the evaluations on different models or datasets.
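For example, following those substitutions, evaluating the untuned German models on GermanQuAD should look like this:

```bash
python llm/scripts/run.py -e --all -d G -m Gbase
```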
The results can be found in the evaluation pipeline's model_results/ directory.
A graphical interface for interacting with the evaluated models is located in chatbot-ui/.
It can be run with the following command:
```bash
streamlit run chatbot-ui/Chatbot.py
```
The fine-tuning script is located at ./fine-tune.py and can be run with the following command:

```bash
python fine-tune.py
```
All adjustments to the selected model, the dataset used, and the hyperparameters must be made in the script itself.
The Jupyter notebooks contain various code snippets to generate figures, run inference or evaluate results. The following notebooks exist:
- experiments/dataset_exploration.ipynb: Code to explore the provided datasets
- experiments/visualization.ipynb: Code to generate the figures used to compare models, which were also used in the presentation and the paper
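Assuming Jupyter is installed in the environment (it is not listed explicitly here, so this is an assumption), a notebook can be opened with:

```bash
jupyter notebook experiments/dataset_exploration.ipynb
```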