Research into how collaborative language models can result in more robust moral alignment.
Primary Language: Python