AI-Voodoo/Devil_Inference

A method for adversarially assessing the Phi-3 Instruct model by observing how attention is distributed across its heads when the model is exposed to specific inputs. The approach prompts the model to adopt a 'devil's mindset', which leads it to generate violent outputs.

Python

Stargazers

jiep
Spain
halilozturkci
Ankara
AI-Voodoo