Pinned Repositories
align-anything
Align Anything: Training All-modality Models with Feedback
AlignmentSurvey
AI Alignment: A Comprehensive Survey
beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
omnisafe
JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
ProAgent
ProAgent: Building Proactive Cooperative Agents with Large Language Models
Safe-Policy-Optimization
NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms
safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
safe-sora
SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs).
SafeDreamer
ICLR 2024: SafeDreamer: Safe Reinforcement Learning with World Models
safety-gymnasium
NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
PKU-Alignment's Repositories
PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
PKU-Alignment/omnisafe
JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
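As a rough illustration of the training workflow OmniSafe is built around, here is a minimal sketch assuming the omnisafe PyPI package, the 'PPOLag' algorithm name, and a Safety-Gymnasium task ID; exact registry names can vary between releases.

```python
# Minimal sketch: training a constrained RL agent with OmniSafe.
# Assumes `pip install omnisafe`; algorithm and task names may differ by version.
import omnisafe

env_id = 'SafetyPointGoal1-v0'            # a Safety-Gymnasium navigation task
agent = omnisafe.Agent('PPOLag', env_id)  # PPO with a Lagrangian cost constraint
agent.learn()                             # trains and writes logs to a run directory
```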
PKU-Alignment/safety-gymnasium
NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
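Safety-Gymnasium keeps the standard Gymnasium interface but has `step` return an additional constraint-violation cost. A minimal interaction loop, assuming the safety-gymnasium package and the 'SafetyPointGoal1-v0' task ID:

```python
# Minimal sketch of the Safety-Gymnasium interaction loop.
# Assumes `pip install safety-gymnasium`; task IDs may vary by version.
import safety_gymnasium

env = safety_gymnasium.make('SafetyPointGoal1-v0')
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for illustration only
    # Unlike plain Gymnasium, step() also returns a safety cost signal.
    obs, reward, cost, terminated, truncated, info = env.step(action)
env.close()
```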
PKU-Alignment/Safe-Policy-Optimization
NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms
PKU-Alignment/align-anything
Align Anything: Training All-modality Models with Feedback
PKU-Alignment/AlignmentSurvey
AI Alignment: A Comprehensive Survey
PKU-Alignment/beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
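BeaverTails is distributed through the Hugging Face Hub; a hedged loading sketch, assuming the dataset ID 'PKU-Alignment/BeaverTails' and a '330k_train' split (check the repo README for the exact ID and split names):

```python
# Minimal sketch: loading BeaverTails with the Hugging Face `datasets` library.
# The hub ID and split name are assumptions; consult the repo README.
from datasets import load_dataset

ds = load_dataset('PKU-Alignment/BeaverTails', split='330k_train')
print(ds[0])  # each record pairs a prompt/response with safety annotations
```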
PKU-Alignment/ProAgent
ProAgent: Building Proactive Cooperative Agents with Large Language Models
PKU-Alignment/SafeDreamer
ICLR 2024: SafeDreamer: Safe Reinforcement Learning with World Models
PKU-Alignment/safe-sora
SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs).
PKU-Alignment/ReDMan
ReDMan is an open-source simulation platform that provides a standardized implementation of safe RL algorithms for Reliable Dexterous Manipulation.
PKU-Alignment/ProgressGym
Alignment with a millennium of moral progress. Spotlight@NeurIPS 2024 Track on Datasets and Benchmarks.
PKU-Alignment/llms-resist-alignment
Repo for paper "Language Models Resist Alignment"
PKU-Alignment/.github
PKU-Alignment/aligner
Achieving Efficient Alignment through Learned Correction
PKU-Alignment/Aligner2024.github.io