Pinned Repositories
TrickLLM
This repository contains the code for the paper "Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks" by Abhinav Rao, Sachin Vashishta*, Atharva Naik*, Somak Aditya, and Monojit Choudhury, accepted at LREC-COLING 2024.
ABP
linc
🔗 LINC: Logical Inference via Neurosymbolic Computation [EMNLP 2023]
PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
guidance
A guidance language for controlling large language models.
multi-armed-bandit
Play with the solutions to the multi-armed-bandit problem.
id-multi-label-hate-speech-and-abusive-language-detection
The Dataset for Multi Label Hate Speech and Abusive Language Detection in Indonesian Twitter
constraint_enforcing_reward
Code for the paper "A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers"
metaseq
Repo for external large-scale work
SachinVashisth.github.io
SachinVashisth's Repositories
SachinVashisth/metaseq
Repo for external large-scale work
SachinVashisth/SachinVashisth.github.io