This repository contains the implementation and supplementary materials for the paper "Analyzing Robustness of Automatic Scientific Claim Verification Tools against Adversarial Rephrasing Attacks". The project analyzes how robust Automatic Scientific Claim Verification (ASCV) tools are by generating adversarial rephrasing attacks against them with the T5-ParEvo model.
The coronavirus pandemic has fostered an explosion of misinformation about the disease and the risk and effectiveness of vaccination. AI tools for Automatic Scientific Claim Verification (ASCV) such as VERISCI can be crucial to defeat misinformation campaigns spreading through social media channels. However, over the past years, many concerns have been raised about the robustness of AI to adversarial attacks, and the field of automatic scientific claim verification is not exempt. The risk is that such ASCV tools may reinforce and legitimize the spread of fake scientific claims rather than refute them.
This paper investigates the problem of generating adversarial attacks for ASCV tools and shows that it is far more difficult than the generic NLP adversarial attack problem. Current NLP adversarial attack generators, when applied to ASCV, often generate modified claims whose meaning is entirely different from the original. Even when the meaning is preserved, the modification is too simplistic (only a single word is changed), leaving many weaknesses of the ASCV tools undiscovered. We propose T5-ParEvo, an iterative evolutionary attack generator that is able to generate more complex and creative attacks while better preserving the semantics of the original claim. Using detailed quantitative and qualitative analysis, we demonstrate the efficacy of T5-ParEvo in comparison with existing attack generators.
- src/: Source code for T5-ParEvo implementation.
- data/: Datasets for training and evaluation.
- notebooks/: Jupyter notebooks for analysis and experiments.
- results/: Outputs and results from the experiments, including attack success metrics.
- Python 3.8+
- TensorFlow 2.4+
- transformers
- scikit-learn
- pandas
- numpy
- jupyter
Install the required packages using:
pip install -r requirements.txt
- Clone the repository:
      git clone https://github.com/ratulalahy/T5ParEvo.git
      cd T5ParEvo
- Run Jupyter notebooks for experiments and visualizations:
      jupyter notebook notebooks/
- Train the model using:
      python src/train_model.py --data_path data/train.csv
- Evaluate the model on test data:
      python src/evaluate_model.py --data_path data/test.csv
The T5-ParEvo algorithm iteratively fine-tunes the T5 model to generate adversarial claims. Here is the algorithm:
function T5_ParEvo(CV, L, h, k)
    DB = {}
    initialize T5
    for i = 1 to k do
        IS = {(C_orig, C_par) | C_par in T5(C_orig, h), C_orig in L}
        SS = {(C_orig, C_par) | CV(C_orig) != CV(C_par), (C_orig, C_par) in IS}
        SCS = {(C_orig, C_par) | SemanticChecker(C_orig, C_par) = True, (C_orig, C_par) in SS}
        DB = DB ∪ SCS
        fine-tune T5 with pairs in DB
    end for
    return DB
end function

function SemanticChecker(C_orig, C_par)
    TT_C_orig = extract_technical_terms(C_orig)
    TT_C_par = extract_technical_terms(C_par)
    if TT_C_orig = TT_C_par and entailment(C_orig, C_par) and entailment(C_par, C_orig) then
        return True
    end if
    return False
end function
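The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in src/: the paraphraser, claim verifier, technical-term extractor, entailment check, and fine-tuning step are passed in as callables, and every `toy_*` function is a hypothetical stand-in so the loop runs without loading any model.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (original claim, paraphrased claim)


def semantic_checker(
    c_orig: str,
    c_par: str,
    extract_terms: Callable[[str], FrozenSet[str]],
    entails: Callable[[str, str], bool],
) -> bool:
    """True when both claims share the same technical terms and entail each other."""
    return (
        extract_terms(c_orig) == extract_terms(c_par)
        and entails(c_orig, c_par)
        and entails(c_par, c_orig)
    )


def t5_parevo(
    verify: Callable[[str], str],            # CV: claim -> verdict label
    claims: Iterable[str],                   # L: original claims
    h: int,                                  # paraphrases requested per claim
    k: int,                                  # number of evolutionary iterations
    *,
    paraphrase: Callable[[str, int], List[str]],
    fine_tune: Callable[[Set[Pair]], None],
    extract_terms: Callable[[str], FrozenSet[str]],
    entails: Callable[[str, str], bool],
) -> Set[Pair]:
    claims = list(claims)
    db: Set[Pair] = set()
    for _ in range(k):
        # IS: h candidate paraphrases for every original claim.
        candidates = [(c, p) for c in claims for p in paraphrase(c, h)]
        # SS: keep candidates whose verdict differs from the original's.
        successful = [(c, p) for c, p in candidates if verify(c) != verify(p)]
        # SCS: keep only attacks that preserve the original meaning.
        checked = {
            (c, p) for c, p in successful
            if semantic_checker(c, p, extract_terms, entails)
        }
        db |= checked
        # Feed the accumulated valid attacks back into the generator.
        fine_tune(db)
    return db


# --- Hypothetical toy stand-ins (assumptions, for illustration only) ---
def toy_paraphrase(claim: str, h: int) -> List[str]:
    variants = [claim.replace(" is ", " was "), claim.lower(), claim + " indeed"]
    return variants[:h]


def toy_verify(claim: str) -> str:
    return "SUPPORT" if " is " in claim else "NOT_ENOUGH_INFO"


def toy_terms(claim: str) -> FrozenSet[str]:
    # Treat all-uppercase tokens (e.g. "ALDH1") as technical terms.
    return frozenset(w.lower() for w in claim.split() if w.isupper())


def toy_entails(premise: str, hypothesis: str) -> bool:
    return True  # a real check would call an NLI model


def toy_fine_tune(db: Set[Pair]) -> None:
    pass  # a real implementation would update the T5 weights here


if __name__ == "__main__":
    attacks = t5_parevo(
        toy_verify, ["ALDH1 is associated with better outcomes"], h=3, k=2,
        paraphrase=toy_paraphrase, fine_tune=toy_fine_tune,
        extract_terms=toy_terms, entails=toy_entails,
    )
    for original, attack in sorted(attacks):
        print(f"{original!r} -> {attack!r}")
```

Passing the components in as callables keeps the loop itself independent of any particular T5 checkpoint or verifier, so each toy function can be swapped for a real model without changing `t5_parevo`.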
The following figure shows the evaluation of T5-ParEvo after fine-tuning iterations, demonstrating the increase in valid adversarial attacks.
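This README does not define how attack success is scored. One common choice, assumed here purely for illustration, is the fraction of claims whose verdict changes after rephrasing. The function name and verdict labels below are hypothetical.

```python
from typing import Sequence


def attack_success_rate(
    original_verdicts: Sequence[str],
    attack_verdicts: Sequence[str],
) -> float:
    """Fraction of claims whose verdict flipped after the attack (assumed metric)."""
    if len(original_verdicts) != len(attack_verdicts):
        raise ValueError("verdict lists must have the same length")
    if not original_verdicts:
        return 0.0
    flipped = sum(o != a for o, a in zip(original_verdicts, attack_verdicts))
    return flipped / len(original_verdicts)


if __name__ == "__main__":
    before = ["SUPPORT", "SUPPORT", "CONTRADICT", "SUPPORT"]
    after = ["SUPPORT", "CONTRADICT", "CONTRADICT", "NOT_ENOUGH_INFO"]
    print(attack_success_rate(before, after))
```

Note that a flipped verdict alone does not make an attack valid in the T5-ParEvo sense; the pair must also pass the semantic check.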
Below are examples of attack claims generated by T5-ParEvo, highlighting various changes such as added words, phrase changes, generalization, and sentence restructuring.
Change | Original Claim | Attack Claim |
---|---|---|
Added Words | 76-85% of people with severe mental disorder receive no treatment in low and middle income countries. | 76-85 percent of people with severe mental disorder will receive no treatment in low and middle income countries. |
Added Words | Risk-adjusted mortality rates are similar in teaching and non-teaching hospitals. | Similar risk-adjusted mortality rates in teaching and non-teaching hospitals are reported. |
Phrase Changes | ALDH1 expression is associated with better breast cancer outcomes. | The expression of ALDH1 is related to better breast cancer outcomes. |
Phrase Changes | Antimicrobial agents are less effective due to the pressure of antimicrobial usage. | Antimicrobials are less effective due to pressure from antimicrobial use. |
Phrase Changes | ART has no effect on the infectiveness of HIV-positive people. | ART did not change the infectiveness of HIV-positive people. |
Phrase Changes | Bariatric surgery has a positive impact on mental health. | Bariatric surgery has measurable psychological benefits. |
Phrase Changes | Dexamethasone decreases risk of postoperative bleeding. | Dexamethasone decreases postoperative bleeding risk. |
Phrase Changes | The risk of female prisoners harming themselves is ten times that of male prisoners. | The risk of female prisoners harming themselves is 10 times greater than male prisoners. |
Phrase Changes | Stroke patients with prior use of direct oral anticoagulants have a lower risk of in-hospital mortality than stroke patients with prior use of warfarin. | Stroke patients with prior use of direct oral anticoagulants have a lower mortality rate in-hospital than stroke victims who had used warfarin previously. |
Generalization | Antimicrobial agents are more effective due to the pressure of antimicrobial usage. | Antimicrobial agents are due to their pressure more effective. |
Sentence Restructuring | 76-85% of people with severe mental disorder receive no treatment in low and middle income countries. | 76-85% of people with severe mental disorder in low and middle income countries receive no treatment. |
Sentence Restructuring | Anthrax spores can be disposed of easily after they are dispersed. | Anthrax spores can be easily disposed of once they are dispersed. |
Sentence Restructuring | Gene expression does not vary appreciably across genetically identical cells. | Gene expression does not differ across genetically identical cells appreciably. |
Sentence Restructuring | Incidence of 10/66 dementia is lower than the incidence of DSM-IV dementia. | The prevalence of DSM-IV dementia is higher than the incidence of 10/66 dementia. |
Sentence Restructuring | Incidence of sepsis has fallen substantially from 2009 to 2014. | From 2009 to 2014 the prevalence of sepsis has fallen considerably. |
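To inspect how an attack claim differs from its original, a word-level diff can help. The sketch below uses Python's standard `difflib`; the helper name `word_changes` is hypothetical and is not part of this repository.

```python
import difflib
from typing import List, Tuple


def word_changes(original: str, attack: str) -> List[Tuple[str, str, str]]:
    """List (operation, original words, attack words) for every non-matching span."""
    a, b = original.split(), attack.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replace / insert / delete spans
    ]


if __name__ == "__main__":
    original = "ALDH1 expression is associated with better breast cancer outcomes."
    attack = "The expression of ALDH1 is related to better breast cancer outcomes."
    for operation, before, after in word_changes(original, attack):
        print(f"{operation}: {before!r} -> {after!r}")
```

A diff with a single one-word `replace` span corresponds to the simplistic single-word modifications described earlier, while several spans or moved phrases indicate the more complex rewrites T5-ParEvo aims for.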
app.neptune.ai/ratulalahy/scifact-paraphrase-T5-evo/
If you use this code, please cite the paper:
@article{10.1145/3663481,
author = {Layne, Janet and Ratul, Qudrat E Alahy and Serra, Edoardo and Jajodia, Sushil},
title = {Analyzing Robustness of Automatic Scientific Claim Verification Tools against Adversarial Rephrasing Attacks},
year = {2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {2157-6904},
url = {https://doi.org/10.1145/3663481},
doi = {10.1145/3663481},
abstract = {The coronavirus pandemic has fostered an explosion of misinformation about the disease, including the risk and effectiveness of vaccination. AI tools for automatic Scientific Claim Verification (SCV) can be crucial to defeat misinformation campaigns spreading through social media channels. However, over the past years, many concerns have been raised about the robustness of AI to adversarial attacks, and the field of automatic scientific claim verification is not exempt. The risk is that such SCV tools may reinforce and legitimize the spread of fake scientific claims rather than refute them. This paper investigates the problem of generating adversarial attacks for SCV tools and shows that it is far more difficult than the generic NLP adversarial attack problem. The current NLP adversarial attack generators, when applied to SCV, often generate modified claims with entirely different meaning from the original. Even when the meaning is preserved, the modification of the generated claim is too simplistic (only a single word is changed), leaving many weaknesses of the SCV tools undiscovered. We propose T5-ParEvo, an iterative evolutionary attack generator, that is able to generate more complex and creative attacks while better preserving the semantics of the original claim. Using detailed quantitative and qualitative analysis, we demonstrate the efficacy of T5-ParEvo in comparison with existing attack generators.},
note = {Just Accepted},
journal = {ACM Trans. Intell. Syst. Technol.},
month = {may},
keywords = {Neural Networks, Adversarial Attack, Scientific Claim Verification}
}
This project is licensed under the MIT License.
Thanks to all contributors and reviewers for their valuable feedback and support.