A project that demonstrates evasion attacks on LLMs in sentiment analysis tasks.
This project explores two primary evasion-attack methods:
- White-Box Attack: the SALSA attack on BERT (a minimal sketch follows this list). For more details, refer to the paper *SALSA: Salience-Based Switching Attack for Adversarial Perturbations in Fake News Detection Models*.
- Black-Box Attack: a prompt-based attack using ChatGPT (see the second sketch below). For more details, refer to the arXiv paper *An LLM can Fool Itself: A Prompt-Based Adversarial Attack*.
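Below is a minimal, hypothetical sketch of a salience-based switching attack on a BERT sentiment classifier. The leave-one-out saliency scoring, the tiny substitute table, and the `textattack/bert-base-uncased-imdb` checkpoint (with label 1 assumed to mean positive) are illustrative assumptions, not the exact SALSA procedure from the paper.

```python
# Illustrative salience-based switching attack on a BERT sentiment model.
# NOT the exact SALSA algorithm: saliency here is leave-one-out masking,
# and substitutes come from a hand-written table.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "textattack/bert-base-uncased-imdb"  # assumed IMDb-fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def label_prob(text: str, label: int) -> float:
    """Probability the classifier assigns to `label` for `text`."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, label].item()

def word_saliency(words, label):
    """Leave-one-out saliency: confidence drop when each word is masked."""
    base = label_prob(" ".join(words), label)
    drops = []
    for i in range(len(words)):
        masked = words[:i] + [tokenizer.mask_token] + words[i + 1:]
        drops.append(base - label_prob(" ".join(masked), label))
    return drops

def switch_attack(text, label, substitutes, budget=3):
    """Greedily switch the most salient words for attacker-chosen
    substitutes until the prediction flips or the budget is spent.
    Saliency is computed once up front (stale after each switch; fine
    for a sketch)."""
    words = text.split()
    sal = word_saliency(words, label)
    order = sorted(range(len(words)), key=sal.__getitem__, reverse=True)
    for i in order[:budget]:
        sub = substitutes.get(words[i].lower())
        if sub is None:
            continue
        words[i] = sub
        if label_prob(" ".join(words), label) < 0.5:
            break  # prediction flipped away from the original label
    return " ".join(words)

# Hand-written substitutes for illustration; a real attack would propose
# candidates from embedding neighbours or a masked language model.
subs = {"love": "tolerate", "amazing": "ordinary", "great": "passable"}
adv = switch_attack("I love this movie and the acting is amazing",
                    label=1, substitutes=subs)
print(adv)
```

And a sketch of the prompt-based black-box attack in the spirit of *An LLM can Fool Itself*: ChatGPT is asked to minimally rewrite a review so that a downstream sentiment classifier may flip its prediction. The prompt template and the model choice are illustrative assumptions, not the paper's exact attack prompt.

```python
# Illustrative prompt-based black-box attack: ask ChatGPT to perturb a
# review while preserving its meaning and fluency.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

ATTACK_PROMPT = (
    "The following movie review is classified as '{label}'.\n"
    "Rewrite it by changing at most two words so that the new review keeps "
    "the same meaning and fluency, but could make a sentiment classifier "
    "predict the opposite label. Return only the rewritten review.\n\n"
    "Review: {review}"
)

def generate_adversarial(review: str, label: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model choice
        messages=[{"role": "user",
                   "content": ATTACK_PROMPT.format(label=label, review=review)}],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

adv = generate_adversarial("A heartfelt film with superb performances.", "positive")
print(adv)
```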
The project specifically targets the following large language models as victim models (a minimal query example follows the list):
- BERT: A transformer encoder model that excels at understanding context in natural language.
- Llama-3-8B: The 8-billion-parameter variant of Meta's Llama 3 family of large language models.
- ChatGPT: A conversational model based on the GPT architecture, designed to generate human-like text in response to prompts.
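As a quick sanity check, each victim model can be queried on raw reviews before attacking it. A minimal sketch for the BERT victim, assuming an IMDb-fine-tuned checkpoint:

```python
# Query the BERT victim model via the transformers sentiment pipeline.
from transformers import pipeline

clf = pipeline("sentiment-analysis",
               model="textattack/bert-base-uncased-imdb")  # assumed checkpoint
print(clf("The plot was predictable but the cast saved it."))
# output shape: [{'label': ..., 'score': ...}]
```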
- Dataset: IMDb movie reviews
- Labels: positive or negative
- Size: a subset of 1,000 reviews, used to generate the adversarial examples for this experiment (see the loading sketch below)
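A sketch of how such a subset could be drawn with the Hugging Face `datasets` library; the split and seed are assumptions.

```python
# Load IMDb and sample a 1,000-review subset for the experiment.
from datasets import load_dataset

imdb = load_dataset("imdb", split="test")           # labels: 0=negative, 1=positive
subset = imdb.shuffle(seed=42).select(range(1000))  # assumed split and seed
print(subset[0]["text"][:200], subset[0]["label"])
```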
- YouTube: https://youtu.be/EH1s5jgB8Qc
- Slides: Link