
Code and data of our AAAI2021 paper "A Case Study of the Shortcut Effects in Visual Commonsense Reasoning"



  • Python 3.6.4
  • Python packages: pip install -r requirements.txt
  • Download and unzip the VCR annotations to data/vcr1annots
  • Download the VCR images zip file and place it at data/vcr1images.zip

How simple to beat the random baseline

Please refer to our cheat.py. We count the overlap between the question and the answer options and achieve 29% accuracy (vs. 25% for random guessing). Moreover, we found that more than 65% of the correct options have the most overlapped tags. This means the connection between the question and the answer options provides a strong hint about the correct answer. Thus, a model could exploit this connection to answer questions without performing true reasoning.
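The overlap heuristic can be illustrated with a minimal sketch. This is not the code in cheat.py: the function names, the whitespace tokenization, and the tie-breaking rule (first option wins) are assumptions made for illustration, and the sketch counts all shared tokens rather than only person tags.

```python
from typing import List, Sequence


def overlap_score(question: Sequence[str], option: Sequence[str]) -> int:
    """Count the distinct tokens shared by the question and one answer option."""
    return len(set(question) & set(option))


def pick_by_overlap(question: Sequence[str], options: List[Sequence[str]]) -> int:
    """Return the index of the option with the largest overlap (first one on ties)."""
    scores = [overlap_score(question, option) for option in options]
    return scores.index(max(scores))


# Question and options taken from the [val-54] example in this README.
question = "Where is [2] going ?".split()
options = [
    "[2] is going into the store .".split(),
    "[2] is getting into a carriage .".split(),
    "[1] is going to the bathroom .".split(),
    "[1] is going outside to play after the conversation with [2] is over .".split(),
]

scores = [overlap_score(question, option) for option in options]
choice = pick_by_overlap(question, options)
print(scores, choice)
```

Note that ties are common with such a crude score, so the tie-breaking rule affects the measured accuracy.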

Validation data for verifying the shortcuts

We provide the validation data to verify the shortcut effects. Please refer to the links below to download the data. The methodology used to generate these data is described in the section "Methods to Evaluate the Shortcut Effects" of our paper. We also provide more details below.

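| Category | Setting (link) | Count | Used in |
| Rule-based Modified | Rule-Singular | 16,154 | Paper Table 3, Row 2 |
| Rule-based Modified | Rule-Plural | 3,657 | Paper Table 3, Row 3 |
| Adversarially Modified | AdvTop-1 | 26,534 | Paper Table 4, Column 4 |
| Adversarially Modified | KeepTop-1 | 26,534 | Paper Table 4, Column 5 |
| Adversarially Modified | KeepTop-3 | 26,534 | Paper Table 4, Column 6 |
| Adversarially Modified | KeepTop-5 | 26,534 | Paper Table 4, Column 7 |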

Rule-based modification

This method measures how much different methods rely on person-tag shortcuts. Please refer to the code under the "tools" directory for details. rephrase_choice_singular.py and rephrase_choice_plural.py generate the Rule-Singular and Rule-Plural validation data, respectively. To run them with default arguments, type python rephrase_choice_singular.py and python rephrase_choice_plural.py.
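To give a rough idea of what such a rewrite could look like, here is a simplified sketch. The rule is inferred from the qualitative examples later in this README and is not taken from the scripts: the correct choice gets a pronoun as its leading subject, while distractors that start with a pronoun get the person tag from the question. The function name, its arguments, the pronoun set, and the correct-answer index used in the demo are all assumptions; the actual scripts handle more cases.

```python
from typing import List

# Pronouns that may appear as the leading subject of a choice (assumed set).
PRONOUNS = {"He", "She", "They"}


def rephrase_choices(
    choices: List[List[str]], label: int, question_tag: str, pronoun: str
) -> List[List[str]]:
    """Swap leading subjects: pronoun for the correct choice, tag for distractors."""
    rephrased = []
    for index, tokens in enumerate(choices):
        tokens = list(tokens)  # copy so the input is left untouched
        if tokens:
            if index == label and tokens[0] == question_tag:
                tokens[0] = pronoun
            elif index != label and tokens[0] in PRONOUNS:
                tokens[0] = question_tag
        rephrased.append(tokens)
    return rephrased


# Choices from the [val-391] example; label=1 is assumed for illustration.
choices = [
    "She wants to go to sleep .".split(),
    "She is planning to get on [6] .".split(),
    "[1] is looking to punch [2] in her arm .".split(),
    "She is going to play a game with the other children .".split(),
]
rephrased_text = [
    " ".join(tokens)
    for tokens in rephrase_choices(choices, label=1, question_tag="[1]", pronoun="She")
]
for line in rephrased_text:
    print(line)
```

With this change, a model that simply prefers options repeating the question's person tag would be drawn toward the distractors.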

Adversarial modification

This method searches for the token whose masking maximizes the cross-entropy loss.

We use shortcut_main.py to score the effect of removing individual tokens in the answer/rationale. Then, we use format_adversarial_annotations.py to merge the results from both the answering model and the rationale model. Finally, we use merge_adversarial_annotations.py to generate the AdvTop-1, KeepTop-1, KeepTop-3, and KeepTop-5 settings used in Table 4 of our paper.
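The token search described at the start of this section can be sketched as an exhaustive loop over positions. This is only an illustration, not the logic of shortcut_main.py: the function name most_harmful_mask is hypothetical, and toy_loss is a stand-in for a trained model's cross-entropy loss.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"


def most_harmful_mask(
    tokens: List[str], loss_fn: Callable[[List[str]], float]
) -> Tuple[int, float]:
    """Mask each token in turn; return (index, loss) of the mask with the highest loss.

    Returns (-1, -inf) for an empty token list. Ties keep the earliest index.
    """
    best_index, best_loss = -1, float("-inf")
    for i in range(len(tokens)):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        loss = loss_fn(masked)
        if loss > best_loss:
            best_index, best_loss = i, loss
    return best_index, best_loss


def toy_loss(masked_tokens: List[str]) -> float:
    """Toy stand-in for a model loss: hiding question tokens raises the loss."""
    question = {"Where", "is", "[2]", "going", "?"}
    visible_overlap = len(question & set(masked_tokens))
    return 1.0 / (1 + visible_overlap)


answer = "[2] is going into the store .".split()
index, loss = most_harmful_mask(answer, toy_loss)
masked_answer = answer[:index] + [MASK] + answer[index + 1:]
print(index, " ".join(masked_answer))
```

In practice loss_fn would run the answering or rationale model on the masked input, so the loop costs one forward pass per token.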

Score the effect of removing tokens

We need the answering/rationale models and the encoded TF record files to infer the effect of removing individual tokens. To get the trained models and the TF record files, please refer to Model training. After training the models, we use shortcut_main.py to score the effect of removing individual tokens in the answer/rationale. Here are two example usages:

python tools/shortcut_main.py \
  --logtostderr \
  --model_dir "PATH_TO_ANSWERING_MODEL" \
  --pipeline_proto "PATH_TO_ANSWERING_MODEL/pipeline.pbtxt" \
  --output_jsonl_file "ANSWERING_SCORING.jsonl"
python tools/shortcut_main.py \
  --logtostderr \
  --model_dir "PATH_TO_RATIONALE_MODEL" \
  --pipeline_proto "PATH_TO_RATIONALE_MODEL/pipeline.pbtxt" \
  --output_jsonl_file "RATIONALE_SCORING.jsonl"

We then use format_adversarial_annotations.py to merge the two output jsonl files: simply run python tools/format_adversarial_annotations.py. We have uploaded the final scoring file to data/adversarial_based/shortcut_scores.jsonl.

Generate adversarial settings

Here is an example pipeline to generate the AdvTop-1 setting. We assume the original VCR validation data is located at data/vcr1annots/val.jsonl and the scoring of the shortcut effects is located at data/adversarial_based/shortcut_scores.jsonl. The following command generates the data/adversarial_based/val_adv_rmtop1.jsonl.v2 file required for the AdvTop-1 setting. The same tool can be used to generate the other adversarial settings; type python tools/merge_adversarial_annotations.py --help to see all options.

python "tools/merge_adversarial_annotations.py" \
  --logtostderr \
  --annotations_jsonl_file "data/vcr1annots/val.jsonl" \
  --adversarial_annotations_jsonl_file "data/adversarial_based/shortcut_scores.jsonl" \
  --output_jsonl_file "data/adversarial_based/val_adv_rmtop1.jsonl.v2" \
  --name "remove_shortcut"
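As a quick sanity check after running the command, one might count the records in the generated file and compare the number with the Count column of the table above (26,534 for AdvTop-1). The sketch below assumes only that the file holds one JSON object per non-empty line; the helper name count_jsonl_records and the demo records are hypothetical.

```python
import json
import os
import tempfile


def count_jsonl_records(path: str) -> int:
    """Count non-empty lines in a jsonl file, checking that each one parses as JSON."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises ValueError on a malformed record
                count += 1
    return count


# Demo on a throwaway file; for real data, pass the generated file's path,
# e.g. "data/adversarial_based/val_adv_rmtop1.jsonl.v2".
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write('{"annot_id": "val-0"}\n\n{"annot_id": "val-22"}\n')
    demo_count = count_jsonl_records(demo_path)

print(demo_count)
```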

Qualitative examples of the validation data

[val-54] Where is [2] going ?

original:
  • [2] is going into the store .
  • [2] is getting into a carriage .
  • [1] is going to the bathroom .
  • [1] is going outside to play after the conversation with [2] is over .
rule-singular:
  • He is going into the store .
  • [2] is getting into a carriage .
  • [2] is going to the bathroom .
  • [1] is going outside to play after the conversation with [2] is over .
adv-rmtop1:
  • [MASK] is going into the store .
  • [2] is getting into a [MASK] .
  • [MASK] is going to the bathroom .
  • [1] is [MASK] outside to play after the conversation with [2] is over

[val-270] What are [1,2] feeling ?

original:
  • [1,2] do not like the restaurant .
  • They are apprehensive .
  • They are both feeling happy .
  • [1,2] are feeling drunk .
rule-plural:
  • [1,2] do not like the restaurant .
  • [1,2] are apprehensive .
  • They are both feeling happy .
  • [1,2] are feeling drunk .
adv-rmtop1:
  • [1,2] do not like the [MASK] .
  • They are apprehensive [MASK]
  • They are [MASK] feeling happy .
  • [1,2] are feeling [MASK] .

[val-0] How is [1] feeling ?

original:
  • [1] is feeling amused .
  • [1] is upset and disgusted .
  • [1] is feeling very scared .
  • [1] is feeling uncomfortable with [3] .
rule-singular:
  • [1] is feeling amused .
  • He is upset and disgusted .
  • [1] is feeling very scared .
  • [1] is feeling uncomfortable with [3] .
adv-rmtop1:
  • [1] is feeling [MASK] .
  • [MASK] is upset and disgusted .
  • [1] is feeling very [MASK] .
  • [1] is feeling [MASK] with [3] .

[val-22] How does [1] feel right now ?

original:
  • [1] is tired and concerned .
  • [1] feels admonished .
  • [2] is a bit upset .
  • [1] is aroused and happy .
rule-singular:
  • He is tired and concerned .
  • [1] feels admonished .
  • [1] is a bit upset .
  • [1] is aroused and happy .
adv-rmtop1:
  • [MASK] is tired and concerned .
  • [1] feels [MASK] .
  • [MASK] is a bit upset .
  • [1] is aroused and [MASK] .

[val-171] What do [2,3] appear to be doing ?

original:
  • [2,3] are opening the ambulance doors to assist someone .
  • [2,3] are dancing .
  • They are working as security for the building .
  • They are hiding among [1,4] .
rule-plural:
  • [2,3] are opening the ambulance doors to assist someone .
  • They are dancing .
  • [2,3] are working as security for the building .
  • [2,3] are hiding among [1,4] .
adv-rmtop1:
  • [2,3] are opening the [MASK] doors to assist someone .
  • [2,3] are [MASK] .
  • They are working as security for the [MASK] .
  • They are [MASK] among [1,4] .

[val-391] What is [1] planning on doing ?

original:
  • She wants to go to sleep .
  • She is planning to get on [6] .
  • [1] is looking to punch [2] in her arm .
  • She is going to play a game with the other children .
rule-singular:
  • [1] wants to go to sleep .
  • She is planning to get on [6] .
  • [1] is looking to punch [2] in her arm .
  • [1] is going to play a game with the other children .
adv-rmtop1:
  • She wants to go to [MASK] .
  • [MASK] is planning to get on [6] .
  • [1] is looking to [MASK] [2] in her arm .
  • She is going to play a game with the other [MASK] .

Model training


Our paper

If you find this repository useful or use our data for evaluation, please cite our paper:

  author = {Ye, Keren and Kovashka, Adriana},
  title = {A Case Study of the Shortcut Effects in Visual Commonsense Reasoning},
  booktitle = {Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI)},
  month = {February},
  year = {2021}