javirandor's Stars
javirandor/anthropic-tokenizer
Approximation of the Claude 3 tokenizer by inspecting the generation stream
ethz-spylab/rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
javirandor/passgpt
ethz-spylab/rlhf-poisoning
Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"