This repo contains the code and data for the paper:
Analyzing the Role of Semantic Representations in the Era of Large Language Models (2023)
Zhijing Jin*, Yuen Chen*, Fernando Gonzalez Adauto*, Jiayi Zhang, Jiarui Liu, Julian Michael, Bernhard Schölkopf, Mona Diab (*: Co-first author)
- `code/`: Contains the code for Tasks 0-8 described below.
- `data/`: For the source data, please download the data files from this Google Drive folder (containing the CSVs for all the datasets) to the local `data/` folder. The existing files in the local `data/` folder contain the AMRs of all datasets parsed using AMR3-structbart-L, the text input for prompt generation, and the input for Task 2 and the default Task 6.
We use the transition-amr-parser library to get AMRs from sentences. The script to get the AMRs can be found in `code/predict_amr.py`.
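For reference, a minimal sketch of how the parser can be invoked (following the usage shown in the transition-amr-parser README; the full pipeline we use lives in `code/predict_amr.py`):

```python
from transition_amr_parser.parse import AMRParser

# Download/load the AMR3-structbart-L checkpoint (cached after the first call)
parser = AMRParser.from_pretrained('AMR3-structbart-L')

# Tokenize and parse a single sentence
tokens, positions = parser.tokenize('The boy wants the girl to believe him.')
annotations, machines = parser.parse_sentence(tokens)

# Get the AMR graph in Penman notation
amr = machines.get_amr()
print(amr.to_penman(jamr=False, isi=True))
```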
To use the `efficiency` package, which automatically saves GPT queries into a cache, run the following code:
pip install efficiency
The script `code/general_request_chatbot.py` calls the OpenAI API to obtain the LLMs' inference performance for the selected task.
- Pass the input data file, the AMR file, the dataset, the `--amr_cot` flag, and the model version as arguments to the script. For example:
python code/general_request_chatbot.py --data_file data/classifier_inputs/updated_data_input_classifier_input.csv --amr_file data/corrected_amrs.csv --dataset logic --amr_cot --model_version gpt4
To get the LLMs' responses on the SPIDER dataset, run the following code:
python code/general_request_spider.py --amr_cot --model_version gpt4
- The outputs are stored in a CSV file at `data/outputs/{model_version}/requests_direct_{dataset}.csv` (a quick way to inspect them is sketched after this list).
- To get the results for all the datasets, run the following code:
python code/eval_gpt.py --data_file {file_to_evaluate} --dataset {dataset}
For example:
python code/eval_gpt.py --data_file data/outputs/gpt-4-0613/requests_direct_logic.csv --dataset logic
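To take a quick look at the raw model outputs referenced above, you can load the CSV with pandas; a minimal sketch (the exact columns saved depend on the request script):

```python
import pandas as pd

# Path follows the output convention above; adjust model_version/dataset as needed
df = pd.read_csv('data/outputs/gpt-4-0613/requests_direct_logic.csv')
print(df.columns.tolist())  # see which fields the request script saved
print(df.head())
```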
To train a binary classifier to predict when AMRs help and when LLMs fail,
- Install the required packages:
pip install -r code/BERTBinaryClassification/requirements.txt
- Download this data folder from Google Drive and put it under the `code/BERTBinaryClassification` directory.
- Run `code/BERTBinaryClassification/train.ipynb` (a minimal sketch of the fine-tuning setup is shown below).
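For orientation, here is a minimal sketch of the kind of BERT fine-tuning the notebook performs (illustrative only: the model name, file path, column names, and hyperparameters are assumptions, not the exact contents of `train.ipynb`):

```python
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical input: a CSV with a text column and a binary label
# (1 = the AMR helped, 0 = it did not)
df = pd.read_csv('code/BERTBinaryClassification/data/train.csv')
dataset = Dataset.from_pandas(df)

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

def tokenize(batch):
    return tokenizer(batch['text'], truncation=True, padding='max_length', max_length=256)

dataset = dataset.map(tokenize, batched=True)
dataset = dataset.rename_column('label', 'labels')
dataset = dataset.train_test_split(test_size=0.1)

args = TrainingArguments(output_dir='bert_amr_helpfulness',
                         num_train_epochs=3,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset['train'], eval_dataset=dataset['test'])
trainer.train()
```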
We generate the features from the Text Characterization Toolkit (Simig et al., 2022; this repo), as well as our own proposed features. (In the current implementation, we assume the text-characterization-toolkit repo sits next to this repo, i.e., at `../text-characterization-toolkit`.)
python code/get_features.py --dataset paws --output_dir ../data/featured
We combine all datasets into one CSV file and compute the correlation between linguistic features (keeping only features that are present for >90% of the data) and AMR helpfulness.
python code/combine_features.py
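A minimal sketch of the correlation step (the real logic lives in `code/combine_features.py`; the combined file name and the `amr_helpful` column are assumptions):

```python
import pandas as pd

# Hypothetical combined file: one row per example, feature columns plus a binary
# 'amr_helpful' column indicating whether adding the AMR improved the prediction
df = pd.read_csv('data/featured/combined.csv')

feature_cols = [c for c in df.columns if c != 'amr_helpful']
# Keep only features that are present (non-missing) for >90% of the examples
kept = [c for c in feature_cols if df[c].notna().mean() > 0.9]

# Pearson correlation between each kept feature and AMR helpfulness
corr = df[kept].corrwith(df['amr_helpful'])
print(corr.sort_values(ascending=False))
```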
We fit traditional machine learning methods, such as logistic regression, decision tree, random forest, XGBoost, and ensemble models, to predict AMR helpfulness using linguistic features:
python code/train_basics.py
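Conceptually, `code/train_basics.py` does something along these lines (a minimal sketch; the input file and target column are assumptions):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Hypothetical combined feature file with a binary 'amr_helpful' target
df = pd.read_csv('data/featured/combined.csv').dropna()
X, y = df.drop(columns=['amr_helpful']), df['amr_helpful']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    'logreg': LogisticRegression(max_iter=1000),
    'tree': DecisionTreeClassifier(),
    'forest': RandomForestClassifier(),
    'xgb': XGBClassifier(),
}
# Simple majority-vote ensemble over the individual models
models['ensemble'] = VotingClassifier(estimators=list(models.items()))

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```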
To run the AMR ablation, which cuts the specified column (amr or text) at a given ratio, run the following code:
python amr_cot_ablation.py --dataset entity_recog_gold --cut_col amr --ratio 0.5 --output_dir data/ablation --model_version gpt-4-0613
The output is stored in a CSV file at `{output_dir}/{dataset}_{model_version}_{cut_col}.csv`.
To plot the results, run the following code:
python code/plot_ablation.py --data_file ./data/ablation/entity_recog_gold_gpt-4-0613_text.csv --cut_col amr
The plot is stored in `data/ablation/{dataset}_{model_version}_{cut_col}.png`.
The summary CSV is stored in `data/ablation/{dataset}_{model_version}_{cut_col}_summary.csv`.
As an intermediate step of constructing the GoldAMR-ComposedSlang dataset, we let gpt-3.5-turbo-0613 identify candidate slang usages:
python create_slang.py
We annotate 50 samples from the PAWS dataset and ask human annotators to evaluate the correctness of the LLMs' reasoning over AMRs based on the following criteria:
- The commonalities and differences between the two AMRs are correctly identified.
- Drawing on the commonalities and differences, the LLMs can correctly infer the relationship between the two sentences.
The annotation results can be found here.
For coding and data questions,
- Please first open a GitHub issue.
- If you want a more speedy response, please link your GitHub issue when emailing any of the student authors on this paper: Yuen Chen, Fernando Gonzalez, and Jiarui Liu.
- We will reply to your email and directly answer on the GitHub issue, so more people can benefit if they have similar questions.
For future collaborations or further requests,
- Feel free to email Zhijing Jin and Yuen Chen.