apartresearch
Artificial intelligence will change the world. Our mission is to ensure this happens safely and to the benefit of everyone.
Pinned Repositories
ai-psychology-starter
Code templates to get started as an AI psychologist
aisafetyideas
💡 The web app CI/CD for aisafetyideas.com
deepdecipher
🦠 DeepDecipher: An open source API to MLP neurons
evaluations-starter
How to get started in evaluations and demonstrations research for dangerous capabilities
Integer_Addition
✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks
interpretability-starter
🧠 Starter templates for doing interpretability research
mechanisticinterpretability
A repository for awesome resources in mechanistic interpretability
Neuron2Graph
Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
readingwhatwecan
📚📚📚📚📚📚📚📚📚 Reading everything
specificityplus
👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"
apartresearch's Repositories
apartresearch/interpretability-starter
🧠 Starter templates for doing interpretability research
apartresearch/specificityplus
👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"
apartresearch/Neuron2Graph
Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
apartresearch/Integer_Addition
✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks
apartresearch/readingwhatwecan
📚📚📚📚📚📚📚📚📚 Reading everything
apartresearch/deepdecipher
🦠 DeepDecipher: An open source API to MLP neurons
apartresearch/aisafetyideas
💡 The web app CI/CD for aisafetyideas.com
apartresearch/ai-psychology-starter
Code templates to get started as an AI psychologist
apartresearch/evaluations-starter
How to get started in evaluations and demonstrations research for dangerous capabilities
apartresearch/mechanisticinterpretability
A repository for awesome resources in mechanistic interpretability
apartresearch/Research-Augmentation-Hackbook
apartresearch/3cb
3cb: Catastrophic Cyber Capabilities Benchmarking of Large Language Models
apartresearch/AIS-cost-effectiveness
Cost-effectiveness models, tools, and results for various AI safety field-building programs.
apartresearch/Interpreting-Learned-Feedback-Patterns
✱ Interpreting learned feedback patterns in large language models
apartresearch/othelloscope
Interpretability Hackathon 2.0 entry
apartresearch/scheduling-widget
📆 Showcases specific times in local time zones
apartresearch/hackathon-utils
😎 Code to run hackathons efficiently
apartresearch/ICML2024MI
🌍 Website for NeurIPS2023MI
apartresearch/n2g
Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
apartresearch/paper-website
🌍 Website template for academic papers
apartresearch/scale-llm-24
🌍 Website for the Scaling Laws workshop
apartresearch/seqcont_circuits
✱ Interpreting how similar sequence continuation tasks share internal representations ✱
apartresearch/task-standard
🚨 METR Task Standard fork for the Code Red Hackathon
apartresearch/Verified_addition
apartresearch/.github
apartresearch/Apart-Evals
apartresearch/GPT-4-Chat-UI
GPT-4 frontend with open source Next.js template.
apartresearch/open
🌍 Repository to update our open data
apartresearch/team-sync-lab
apartresearch/town_hall_avatar
Uses ChatGPT to simulate a townhall discussion between avatars