Adversarial attack of scientific claim verification

T5ParEvo: Iterative Evolutionary Attack Generator for Scientific Claim Verification

Overview

This repository contains the implementation and supplementary materials for the paper "Analyzing Robustness of Automatic Scientific Claim Verification Tools against Adversarial Rephrasing Attacks". The project analyzes the robustness of Automatic Scientific Claim Verification (ASCV) tools by generating adversarial rephrasing attacks against them with T5-ParEvo, an iterative evolutionary attack generator.

[Figure: Schema]

Abstract

The coronavirus pandemic has fostered an explosion of misinformation about the disease and the risk and effectiveness of vaccination. AI tools for Automatic Scientific Claim Verification (ASCV) such as VERISCI can be crucial to defeat misinformation campaigns spreading through social media channels. However, over the past years, many concerns have been raised about the robustness of AI to adversarial attacks, and the field of automatic scientific claim verification is not exempt. The risk is that such ASCV tools may reinforce and legitimize the spread of fake scientific claims rather than refute them.

This paper investigates the problem of generating adversarial attacks for ASCV tools and shows that it is far more difficult than the generic NLP adversarial attack problem. Current NLP adversarial attack generators, when applied to ASCV, often generate modified claims whose meaning is entirely different from the original. Even when the meaning is preserved, the modification is too simplistic (only a single word is changed), leaving many weaknesses of the ASCV tools undiscovered. We propose T5-ParEvo, an iterative evolutionary attack generator that is able to generate more complex and creative attacks while better preserving the semantics of the original claim. Using detailed quantitative and qualitative analysis, we demonstrate the efficacy of T5-ParEvo in comparison with existing attack generators.

Repository Contents

  • src/: Source code for T5-ParEvo implementation.
  • data/: Datasets for training and evaluation.
  • notebooks/: Jupyter notebooks for analysis and experiments.
  • results/: Outputs and results from the experiments, including attack success metrics.

Requirements

  • Python 3.8+
  • TensorFlow 2.4+
  • transformers
  • scikit-learn
  • pandas
  • numpy
  • jupyter

Install the required packages using:

pip install -r requirements.txt

Usage

  1. Clone the repository:

    git clone https://github.com/ratulalahy/T5ParEvo.git
    cd T5ParEvo
  2. Run Jupyter notebooks for experiments and visualizations:

    jupyter notebook notebooks/
  3. Train the model using:

    python src/train_model.py --data_path data/train.csv
  4. Evaluate the model on test data:

    python src/evaluate_model.py --data_path data/test.csv

Algorithm

[Figure: Fine Tune Iteration]

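The T5-ParEvo algorithm iteratively fine-tunes the T5 model to generate adversarial claims. In each of k iterations, T5 paraphrases every original claim in L; the pairs whose verdict from the claim verifier CV changes and that also pass the semantic check are added to DB; and T5 is then fine-tuned on the pairs in DB. Here is the algorithm: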

T5-ParEvo Algorithm

function T5_ParEvo(CV, L, h, k)
    DB = {}
    initialize T5
    for i = 1 to k do
        IS = {(C_orig, C_par) | C_par in T5(C_orig, h), C_orig in L}
        SS = {(C_orig, C_par) | CV(C_orig) != CV(C_par), (C_orig, C_par) in IS}
        SCS = {(C_orig, C_par) | SemanticChecker(C_orig, C_par) = True, (C_orig, C_par) in SS}
        DB = DB ∪ SCS
        fine-tune T5 with pairs in DB
    end for
    return DB
end function
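
The loop above can be sketched in Python as follows. This is a minimal illustration, not the repository's implementation: the paraphraser, the verifier, the semantic checker, and the fine-tuning step are passed in as plain callables, and the toy stand-ins at the bottom are hypothetical. It also assumes that h is the number of paraphrases requested per claim, which the pseudocode does not state explicitly.

```python
from typing import Callable, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (original claim, paraphrased claim)


def t5_parevo(
    verify: Callable[[str], str],                  # CV: maps a claim to a verdict label
    claims: Iterable[str],                         # L: the original claims
    h: int,                                        # assumed: paraphrases per claim
    k: int,                                        # number of iterations
    paraphrase: Callable[[str, int], List[str]],   # stand-in for T5(C_orig, h)
    semantic_checker: Callable[[str, str], bool],  # stand-in for SemanticChecker
    fine_tune: Callable[[Set[Pair]], None],        # stand-in for the fine-tuning step
) -> Set[Pair]:
    claims = list(claims)
    db: Set[Pair] = set()
    for _ in range(k):
        # IS: every (original, paraphrase) pair produced in this iteration.
        initial = {(c, p) for c in claims for p in paraphrase(c, h)}
        # SS: pairs for which the verifier's verdict changes.
        successful = {(c, p) for c, p in initial if verify(c) != verify(p)}
        # SCS: successful pairs that also preserve the meaning.
        checked = {(c, p) for c, p in successful if semantic_checker(c, p)}
        db |= checked
        # Fine-tune the paraphraser on everything collected so far.
        fine_tune(db)
    return db


# Toy stand-ins (all hypothetical) so the loop runs without any model.
def toy_verify(claim: str) -> str:
    return "SUPPORT" if "reduces" in claim else "NOT_ENOUGH_INFO"


def toy_paraphrase(claim: str, h: int) -> List[str]:
    candidates = [claim + " indeed", claim.replace("reduces", "lowers")]
    return candidates[:h]


fine_tune_calls: List[int] = []  # records the size of DB at each fine-tuning step

result = t5_parevo(
    verify=toy_verify,
    claims=["drug A reduces risk"],
    h=2,
    k=3,
    paraphrase=toy_paraphrase,
    semantic_checker=lambda original, paraphrased: True,
    fine_tune=lambda db: fine_tune_calls.append(len(db)),
)
print(result)
```

Keeping DB as a set means a pair rediscovered in a later iteration is stored only once, which matches the set union in the pseudocode.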

Semantic Checker Algorithm

function SemanticChecker(C_orig, C_par)
    TT_C_orig = extract_technical_terms(C_orig)
    TT_C_par = extract_technical_terms(C_par)
    if TT_C_orig = TT_C_par and entailment(C_orig, C_par) and entailment(C_par, C_orig) then
        return True
    end if
    return False
end function
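
A minimal Python sketch of this check is shown below. The term extractor here is a deliberately crude, hypothetical heuristic (tokens that contain a digit or at least two uppercase letters), and the entailment test is passed in as a callable `entails(premise, hypothesis)`; in practice that role would be played by a natural language inference model, which this sketch does not assume anything about.

```python
import re
from typing import Callable, Set


def extract_technical_terms(claim: str) -> Set[str]:
    """Hypothetical heuristic: keep tokens that look like identifiers,
    i.e. contain a digit or at least two uppercase letters (ALDH1, HIV, 10/66)."""
    tokens = re.findall(r"[A-Za-z0-9/-]+", claim)
    return {
        tok
        for tok in tokens
        if any(ch.isdigit() for ch in tok) or sum(ch.isupper() for ch in tok) >= 2
    }


def semantic_checker(
    c_orig: str,
    c_par: str,
    entails: Callable[[str, str], bool],  # assumed signature: (premise, hypothesis)
) -> bool:
    # The technical terms must be identical in both claims.
    if extract_technical_terms(c_orig) != extract_technical_terms(c_par):
        return False
    # Entailment must hold in both directions.
    return entails(c_orig, c_par) and entails(c_par, c_orig)


# Usage with a placeholder entailment function that accepts everything.
always_entails = lambda premise, hypothesis: True
original = "ALDH1 expression is associated with better breast cancer outcomes."
paraphrase = "The expression of ALDH1 is related to better breast cancer outcomes."
print(semantic_checker(original, paraphrase, always_entails))
```

Requiring entailment in both directions rejects paraphrases that merely generalize or specialize the original claim, since those entail in one direction only.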

Results

Evaluation of Fine-tuning Iterations

The following figure shows the evaluation of T5-ParEvo after fine-tuning iterations, demonstrating the increase in valid adversarial attacks.

[Figure: Evaluation over Iteration]

Examples of Diverse Attack Claims Generated by T5-ParEvo

Below are examples of attack claims generated by T5-ParEvo, highlighting various changes such as added words, phrase changes, generalization, and sentence restructuring.

Added Words:

  • Original: 76-85% of people with severe mental disorder receive no treatment in low and middle income countries.
    Attack: 76-85 percent of people with severe mental disorder will receive no treatment in low and middle income countries.
  • Original: Risk-adjusted mortality rates are similar in teaching and non-teaching hospitals.
    Attack: Similar risk-adjusted mortality rates in teaching and non-teaching hospitals are reported.

Phrase Changes:

  • Original: ALDH1 expression is associated with better breast cancer outcomes.
    Attack: The expression of ALDH1 is related to better breast cancer outcomes.
  • Original: Antimicrobial agents are less effective due to the pressure of antimicrobial usage.
    Attack: Antimicrobials are less effective due to pressure from antimicrobial use.
  • Original: ART has no effect on the infectiveness of HIV-positive people.
    Attack: ART did not change the infectiveness of HIV-positive people.
  • Original: Bariatric surgery has a positive impact on mental health.
    Attack: Bariatric surgery has measurable psychological benefits.
  • Original: Dexamethasone decreases risk of postoperative bleeding.
    Attack: Dexamethasone decreases postoperative bleeding risk.
  • Original: The risk of female prisoners harming themselves is ten times that of male prisoners.
    Attack: The risk of female prisoners harming themselves is 10 times greater than male prisoners.
  • Original: Stroke patients with prior use of direct oral anticoagulants have a lower risk of in-hospital mortality than stroke patients with prior use of warfarin.
    Attack: Stroke patients with prior use of direct oral anticoagulants have a lower mortality rate in-hospital than stroke victims who had used warfarin previously.

Generalization:

  • Original: Antimicrobial agents are more effective due to the pressure of antimicrobial usage.
    Attack: Antimicrobial agents are due to their pressure more effective.

Sentence Restructuring:

  • Original: 76-85% of people with severe mental disorder receive no treatment in low and middle income countries.
    Attack: 76-85% of people with severe mental disorder in low and middle income countries receive no treatment.
  • Original: Anthrax spores can be disposed of easily after they are dispersed.
    Attack: Anthrax spores can be easily disposed of once they are dispersed.
  • Original: Gene expression does not vary appreciably across genetically identical cells.
    Attack: Gene expression does not differ across genetically identical cells appreciably.
  • Original: Incidence of 10/66 dementia is lower than the incidence of DSM-IV dementia.
    Attack: The prevalence of DSM-IV dementia is higher than the incidence of 10/66 dementia.
  • Original: Incidence of sepsis has fallen substantially from 2009 to 2014.
    Attack: From 2009 to 2014 the prevalence of sepsis has fallen considerably.
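
To see exactly which words differ between an original claim and its attack, a word-level diff can help. The sketch below uses Python's standard `difflib` module; it is only an inspection aid and an assumption of this README, not the method used in the paper to categorize changes. The helper name `word_changes` is hypothetical.

```python
import difflib
from typing import List, Tuple


def word_changes(original: str, attack: str) -> List[Tuple[str, str, str]]:
    """Return (operation, original words, attack words) for every span
    that differs between the two claims, compared word by word."""
    a, b = original.split(), attack.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # skip the spans the two claims share
    ]


changes = word_changes(
    "76-85% of people with severe mental disorder receive no treatment "
    "in low and middle income countries.",
    "76-85 percent of people with severe mental disorder will receive no "
    "treatment in low and middle income countries.",
)
for operation, before, after in changes:
    print(operation, repr(before), "->", repr(after))
```

Each tuple names the operation (`replace`, `insert`, or `delete`) together with the affected words, which makes it easy to spot whether an attack added words, swapped a phrase, or reordered the sentence.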

Run Log on Neptune

app.neptune.ai/ratulalahy/scifact-paraphrase-T5-evo/

Citation

If you use this code, please cite the paper:

@article{10.1145/3663481,
author = {Layne, Janet and Ratul, Qudrat E Alahy and Serra, Edoardo and Jajodia, Sushil},
title = {Analyzing Robustness of Automatic Scientific Claim Verification Tools against Adversarial Rephrasing Attacks},
year = {2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {2157-6904},
url = {https://doi.org/10.1145/3663481},
doi = {10.1145/3663481},
abstract = {The coronavirus pandemic has fostered an explosion of misinformation about the disease, including the risk and effectiveness of vaccination. AI tools for automatic Scientific Claim Verification (SCV) can be crucial to defeat misinformation campaigns spreading through social media channels. However, over the past years, many concerns have been raised about the robustness of AI to adversarial attacks, and the field of automatic scientific claim verification is not exempt. The risk is that such SCV tools may reinforce and legitimize the spread of fake scientific claims rather than refute them. This paper investigates the problem of generating adversarial attacks for SCV tools and shows that it is far more difficult than the generic NLP adversarial attack problem. The current NLP adversarial attack generators, when applied to SCV, often generate modified claims with entirely different meaning from the original. Even when the meaning is preserved, the modification of the generated claim is too simplistic (only a single word is changed), leaving many weaknesses of the SCV tools undiscovered. We propose T5-ParEvo, an iterative evolutionary attack generator, that is able to generate more complex and creative attacks while better preserving the semantics of the original claim. Using detailed quantitative and qualitative analysis, we demonstrate the efficacy of T5-ParEvo in comparison with existing attack generators.},
note = {Just Accepted},
journal = {ACM Trans. Intell. Syst. Technol.},
month = {may},
keywords = {Neural Networks, Adversarial Attack, Scientific Claim Verification}
}

License

This project is licensed under the MIT License.

Acknowledgments

Thanks to all contributors and reviewers for their valuable feedback and support.