javirandor

AI Safety Researcher

ETH Zurich, Zurich

Pinned Repositories

rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
Language: Python · 99 4 78
agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
Language: Jupyter Notebook · 00
anthropic-tokenizer
Approximation of the Claude 3 tokenizer by inspecting the generation stream
Language: Python · 102 3 29
disasters-wikipedia-floods
Language: Jupyter Notebook · 2 2 00
javirandor
0 1 00
lm-evaluation-harness
A framework for few-shot evaluation of language models.
Language: Python · 0 0 00
online-tutoring-analysis
Language: Jupyter Notebook · 0 2 01
passgpt
Language: Python · 43 4 410
wdr
Language: Jupyter Notebook · 10 1 20

javirandor's Repositories

javirandor/anthropic-tokenizer
Approximation of the Claude 3 tokenizer by inspecting the generation stream
Language: Python · 102 3 29
javirandor/passgpt
Language: Python · 43 4 410
javirandor/wdr
Language: Jupyter Notebook · 10 1 20
javirandor/disasters-wikipedia-floods
Language: Jupyter Notebook · 2 2 00
javirandor/agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
Language: Jupyter Notebook · 00
javirandor/javirandor
0 1 00
javirandor/lm-evaluation-harness
A framework for few-shot evaluation of language models.
Language: Python · 0 0 00
javirandor/online-tutoring-analysis
Language: Jupyter Notebook · 0 2 01
javirandor/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python · 0 0 00