This is a practical assignment for the course Web Data Processing Systems, named IntelliVerify.
This program is designed to improve the performance of large language models such as LLaMA. Our main implementations include:

- Given certain types of questions, parsing the output of LLaMA and extracting a clean answer;
- With the aid of Wikipedia, linking the answer to an entity and deciding whether the answer is correct (a minimal sketch of this step is shown below).
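
For illustration only, here is a minimal sketch of that entity-linking step. It assumes Wikipedia's `opensearch` API endpoint and uses a hypothetical helper name; the actual logic (presumably in `WikiReq.py` and `Similarity.py`) will differ in its details.

```python
# Hypothetical sketch: link an extracted answer to a Wikipedia entity.
# This is NOT the IntelliVerify implementation; it only illustrates the idea.
from typing import Optional

import requests

WIKI_API = "https://en.wikipedia.org/w/api.php"


def link_to_wikipedia(answer: str) -> Optional[str]:
    """Return the title of the best-matching Wikipedia page for `answer`, if any."""
    params = {"action": "opensearch", "search": answer, "limit": 1, "format": "json"}
    # opensearch returns [query, titles, descriptions, urls]
    titles = requests.get(WIKI_API, params=params, timeout=10).json()[1]
    return titles[0] if titles else None


if __name__ == "__main__":
    print(link_to_wikipedia("Rome"))  # e.g. "Rome"
```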

To run this program locally, please follow the procedure below:

- Use the Docker image loaded with the 7B-parameter llama2 model (provided).
  - To download the Docker image, type the command `docker pull karmaresearch/wdps2`.
  - To run the Docker image, type the command `docker run -ti karmaresearch/wdps2`.
- Copy the source code into the Docker container.
  - On your host machine, enter the parent folder of IntelliVerify.
  - Use the command `docker cp ./IntelliVerify $(ContainerID):/home/user/`.
  - Please remember to change `ContainerID` to your local container ID, which you can find with `docker ps -a`.
- Install the requirements.
  - Go into the container's terminal, then run `cd ~/IntelliVerify`.
  - Install pipreqs via `pip install pipreqs`, then run `pipreqs . --encoding=utf-8`.
  - Then run `pip install -r requirements.txt`.
- Download the dataset and pre-trained model for answer extraction.
  - From here you can download the "model_squad.pt" file, which is used by the answer extraction module. Then copy it into the IntelliVerify folder: `docker cp /model_squad.pt $(ContainerID):/home/user/IntelliVerify`.
  - From here you can download the "tokenizer" folder. First, under the `/home/user` path of your Docker environment, execute `mkdir nltk_data`. Then copy the tokenizer into the container: `docker cp /tokenizer $(ContainerID):/home/user/nltk_data` (a quick sanity check for these files is sketched after this list).
  - If there are problems with file permissions, use `sudo chmod` to adjust them. We recommend running `sudo chmod 777 IntelliVerify` once and for all.
- Execute the program to see the results.
  - Type your questions into `example_input.txt`. If you do not have specific questions to ask, you can also just use the default questions in this file.
  - Open the container's terminal, then run `cd ~/IntelliVerify/` and execute `python3 main.py`.
  - After the program has run, the result of each input is printed in the terminal. The answers are written into the `example_output.txt` file.
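
Before running `main.py`, it can be worth checking that the downloaded files ended up where you expect. The snippet below is only a hypothetical sanity check, assuming `model_squad.pt` is an ordinary PyTorch checkpoint (if it stores a fully pickled model rather than a state_dict, loading it also requires the model class to be importable) and that the tokenizer data sits under `/home/user/nltk_data`; it is not part of IntelliVerify.

```python
# Hypothetical sanity check for the downloaded assets (not part of IntelliVerify).
# Assumes model_squad.pt is a regular PyTorch checkpoint and that the NLTK
# tokenizer data was copied under /home/user/nltk_data.
import os

import torch

ckpt_path = os.path.expanduser("~/IntelliVerify/model_squad.pt")
checkpoint = torch.load(ckpt_path, map_location="cpu")  # keep everything on CPU
print("checkpoint loaded:", type(checkpoint))

nltk_dir = "/home/user/nltk_data"
print("tokenizer data present:", os.path.isdir(nltk_dir) and os.listdir(nltk_dir))
```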

Alternatively, you can load the pre-built image `newimage.tar`:

- Please log in to the Google Drive path and download `newimage.tar`.
- After that, run `docker load < newimage.tar` in your shell. You can use `docker images` to list the existing images.
- If the repository name and tag are missing, please execute `docker tag <image ID> <your_customized_name>:latest`.
- Please run the image with `docker run -itd <your_customized_name>:latest`.

Possible issues:

- If the error `OSError: [E050] Can't find model 'en_core_web_sm'.` appears, run `python -m spacy download en_core_web_sm` (a quick check for this model is sketched after this list).
- If there are problems with file permissions, use `sudo chmod` to adjust them. We recommend running `sudo chmod 777 IntelliVerify` once and for all.
- If any other package fails to download automatically, please install it manually using `pip install`. We know that can be tiresome :(.
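
For reference, the spaCy error above can be reproduced or ruled out with a short check (a convenience snippet, not part of the pipeline):

```python
# Quick check that the spaCy English model is installed; spacy.load raises
# OSError [E050] when en_core_web_sm is missing.
import spacy

nlp = spacy.load("en_core_web_sm")
print(nlp("IntelliVerify links answers to Wikipedia entities.").ents)
```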

The project is organized as follows:

```
IntelliVerify
|---- README.md
|---- Similarity.py
|---- WikiReq.py
|---- access_llm.py
|---- example_input.txt
|---- extract_answer.py
|---- main.py
|---- ner.py
|---- output.txt
|---- requirements.txt
```