/evaluating-language-models-for-harmful-prompt-detection

Evaluated and compared language models for harmful prompt detection using LangChain and prompt engineering techniques. Implemented and tested Google Gemini 1.5 Flash, achieving 62% accuracy, and Anthropic Claude 3 Sonnet, achieving 52% accuracy.

Primary LanguageJupyter Notebook

This repository is not active