sinanw/llm-security-prompt-injection

This project investigates the security of large language models by performing binary classification of a set of input prompts to discover malicious prompts. Several approaches have been analyzed using classical ML algorithms, a trained LLM model, and a fine-tuned LLM model.

Jupyter NotebookMIT

sinanw/llm-security-prompt-injection

Stargazers