Pinned Repositories
DecodingTrust
A Comprehensive Assessment of Trustworthiness in GPT Models
fastRAG
Efficient Retrieval Augmentation and Generation Framework
machine_unlearning
Existing Literature about Machine Unlearning
Awesome_Bias_and_Fairness_Datasets_and_Benchmarks
Awesome Bias and Fairness Datasets and Benchmarks in Language Models
Learnable-Privacy-Neurons-Localization
ACL 2024 Learnable Privacy Neurons Localization in Language Models
PAD
ruizhe.github.io
RLHF-Reward-Modeling
Recipes for training reward models for RLHF.
awesome-machine-unlearning
Awesome Machine Unlearning (A Survey of Machine Unlearning)
awesome-fairness-papers
Papers on fairness in NLP
richhh520's Repositories
richhh520/Learnable-Privacy-Neurons-Localization
ACL 2024 Learnable Privacy Neurons Localization in Language Models
richhh520/model-debias
richhh520/Awesome_Bias_and_Fairness_Datasets_and_Benchmarks
Awesome Bias and Fairness Datasets and Benchmarks in Language Models
richhh520/PAD
richhh520/ruizhe.github.io