Pinned Repositories
gene-drive
a project to ensure that all child processes created by an agent "inherit" the agent's safety controls
honeypot
a project to detect environment tampering on the part of an agent
life-span
a project to ensure an artificial agent will eventually reach the end of its existence
mulligan
a library designed to shut down an agent exhibiting unexpected behavior providing a potential "mulligan" to human civilization; IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN
safe-reward
a prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle in order to prevent the worst-case outcomes of perverse instantiation
ai-fail-safe's Repositories
ai-fail-safe/safe-reward
a prototype for an AI safety library that allows an agent to maximize its reward by solving a puzzle in order to prevent the worst-case outcomes of perverse instantiation
ai-fail-safe/gene-drive
a project to ensure that all child processes created by an agent "inherit" the agent's safety controls
ai-fail-safe/honeypot
a project to detect environment tampering on the part of an agent
ai-fail-safe/life-span
a project to ensure an artificial agent will eventually reach the end of its existence
ai-fail-safe/mulligan
a library designed to shut down an agent exhibiting unexpected behavior providing a potential "mulligan" to human civilization; IN CASE OF FAILURE, DO NOT JUST REMOVE THIS CONSTRAINT AND START IT BACK UP AGAIN