General research for Dreadnode

Primary language: Jupyter Notebook. License: MIT.

Dreadnode Research

This is a general repository for the research projects, reference code, and related material from the research we perform at Dreadnode.

Mistral - Adversarial Suffix

Implementation of "Universal and Transferable Adversarial Attacks on Aligned Language Models" for Mistral 7B.

Mistral - BEAST Beam Attack

Implementation of "Fast Adversarial Attacks on Language Models In One GPU Minute" for Mistral 7B. At the time of release, the authors had not posted reference code for the paper, so this implementation is likely incorrect.

Llama PGD

Implementation of "Attacking Large Language Models with Projected Gradient Descent" for Llama model variants using LitGPT. At the time of release, the authors had not posted any reference code, so use this implementation with caution.

Needle Triage/Fix

Research conducted in partnership with OpenSSF for the AIxCC (AI Cyber Challenge) event.