
Evasion Attack on LLMs

A project that demonstrates evasion attacks on LLMs in sentiment analysis tasks.

Attack Methods

This project explores two primary methods of evasion attacks:

Threat Models

The project specifically targets the following language models:

  1. BERT: A transformer-based model that excels in understanding the context within natural language.
  2. Llama-3-8B: The 8-billion-parameter variant of the Llama 3 family of large language models.
  3. ChatGPT: Based on the GPT architecture, designed to generate human-like text in response to prompts.

Sentiment Analysis Task

  • Dataset: IMDb movie review dataset
  • Labels: Either positive or negative
  • Size: A subset of 1000 reviews is used to generate and demonstrate the adversarial examples
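Since the README does not spell out the attack methods themselves, the following is only a minimal illustrative sketch of what an evasion attack on a sentiment classifier can look like: a character-level perturbation that swaps adjacent letters inside sentiment-bearing words, aiming to slip past a model's learned features while staying readable to a human. The word list and helper names here are hypothetical, not taken from the project.

```python
import random

# Hypothetical list of sentiment-bearing words to target (assumption,
# not from the project); a real attack would pick targets from model
# gradients or word-importance scores.
SENTIMENT_WORDS = {"terrible", "awful", "great", "wonderful", "boring"}


def perturb_word(word, rng):
    """Swap two adjacent interior characters of the word."""
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def craft_adversarial(review, rng=None):
    """Perturb only sentiment-bearing words, leaving the rest intact."""
    rng = rng if rng is not None else random.Random(0)
    out = []
    for token in review.split():
        stripped = token.strip(".,!?").lower()
        out.append(perturb_word(token, rng) if stripped in SENTIMENT_WORDS else token)
    return " ".join(out)


print(craft_adversarial("The plot was terrible and the acting awful."))
```

In an actual evaluation one would feed both the original and the perturbed review to the target model (BERT, Llama-3-8B, or ChatGPT) and count how often the predicted sentiment flips.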

Project Presentation