tatsu-lab/alpaca_farm
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
Python · Apache-2.0
Issues
Shape error: 32000 vs 32001
#93 opened by yiwan-rl - 1
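This mismatch is the well-known LLaMA pad-token issue: the base vocabulary has 32,000 tokens, and the Alpaca-style SFT recipe adds a [PAD] token, giving 32,001. A minimal sketch of the usual fix, assuming a Hugging Face LLaMA checkpoint (the model name below is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "huggyllama/llama-7b" is illustrative; any checkpoint with the original
# 32,000-token LLaMA vocabulary hits the same mismatch.
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")

# Adding the pad token grows the vocabulary to 32,001, so the embedding
# matrix must be resized before loading fine-tuned weights that expect it.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))
```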
Supervised fine-tuning of Pythia models with the AlpacaFarm framework
#91 opened by hank0316 - 3
Using pretrained models
#90 opened by syleedandekar - 0
Error downloading pre-trained weights
#89 opened by syleedandekar - 0
Reproducibility of the pretrained reward model
#68 opened by jp7c5 - 4
BaseAnnotator.__init__() got an unexpected keyword argument 'other_keys_to_keep'
#85 opened by xukefaker - 3
Repeated Deprecation Error
#87 opened by syleedandekar - 1
Why use FSDP instead of DeepSpeed?
#76 opened by nrailg - 1
[Bug] Error importing is_deepspeed_zero3_enabled
#65 opened by SingL3 - 1
Support for a lower CUDA version?
#77 opened by dorothee-sigg - 2
Scoring with the reward model
#79 opened by qlduNLP - 3
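For context, scoring a prompt/response pair with a reward model generally reduces to one forward pass that returns a scalar. A sketch using a public stand-in reward model; AlpacaFarm's own RewardModel class may expose a different interface:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A public stand-in reward model; substitute the AlpacaFarm reward
# checkpoint and its RewardModel class for actual reproduction.
name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

prompt = "Explain RLHF in one sentence."
response = "RLHF fine-tunes a model against a learned human-preference reward."
inputs = tokenizer(prompt, response, return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # scalar reward
print(score)
```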
Possible issue with gradient accumulation
#57 opened by rosinality - 0
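The usual pitfall with gradient accumulation is forgetting to scale the loss, so accumulated gradients sum instead of average. A self-contained sketch of the standard convention (whether alpaca_farm follows it is what the issue asks):

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]
accum_steps = 4  # one optimizer step per 4 micro-batches

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    # Divide by accum_steps so the summed gradients match a large-batch mean.
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```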
RecursionError: maximum recursion depth exceeded
#80 opened by qlduNLP - 0
integrity_check error with sft10k
#78 opened by bpucla - 10
RewardModel.from_pretrained() loads redundant weights (incurs extra ~30GB of RAM)
#75 opened by angie-chen55 - 1
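One common mitigation for double-loading in Hugging Face models is `low_cpu_mem_usage=True`, which materializes weights directly from the checkpoint instead of initializing random parameters first; whether it is applicable inside `RewardModel.from_pretrained()` is what the issue raises. Sketch (model name illustrative):

```python
from transformers import AutoModelForCausalLM

# low_cpu_mem_usage=True skips the random init and loads weights in place,
# avoiding a second full copy of the parameters in RAM.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", low_cpu_mem_usage=True
)
```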
Confusing detail in preference mapping
#71 opened by dorothee-sigg - 4
Problem with PairwiseAutoAnnotator
#64 opened by langhaobeijing - 0
Huge memory demand of recover_model_weights.py?
#73 opened by mensch72 - 0
recover_model_weights.py gives WARNING:root:Your base LLaMA checkpoint is converted with transformers==4.27.0.dev0
#72 opened by mensch72 - 2
KeyError: 'llama' in /recover_model_weights.py
#70 opened by mensch72 - 3
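`KeyError: 'llama'` typically means the installed transformers predates LLaMA support, which landed in 4.28, so the lookup of the config's `model_type` fails. A guard one might add (version bound per the transformers changelog):

```python
import transformers
from packaging import version

# LlamaModel was added in transformers 4.28; older releases do not know
# the "llama" model_type and raise KeyError when resolving the config.
if version.parse(transformers.__version__) < version.parse("4.28.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} predates LLaMA support; "
        "run `pip install -U 'transformers>=4.28'`."
    )
```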
Errors when using bitsandbytes (bnb) and QLoRA for SFT
#63 opened by qiuruiyu - 3
Running PPO with fewer GPUs
#50 opened by shunzh - 4
Use with Llama-2-70b-hf?
#69 opened by mensch72 - 1
Question about KL term
#67 opened by SimengSun - 2
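For reference, RLHF PPO implementations commonly shape the reward with a per-token KL penalty toward the reference (SFT) policy and add the reward-model score at the final token. A toy sketch with random tensors; shapes and the coefficient are illustrative, not alpaca_farm's exact formulation:

```python
import torch

batch, seq = 2, 5
logprobs_policy = torch.randn(batch, seq)  # log pi_theta(y_t | x, y_<t)
logprobs_ref = torch.randn(batch, seq)     # log pi_ref(y_t | x, y_<t)
rm_score = torch.randn(batch)              # scalar reward per sequence
kl_coef = 0.02

kl = logprobs_policy - logprobs_ref        # per-token log-ratio (KL estimate)
shaped = -kl_coef * kl                     # penalty applied at every token
shaped[:, -1] += rm_score                  # terminal reward from the RM
```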
Problem with Simulation case
#61 opened by qiuruiyu - 1
Model selection for PPO in Table 2
#60 opened by langhaobeijing - 1
[Discussion] about compute_logprobs
#56 opened by snowkcon - 4
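A generic version of such a helper gathers each target token's log-probability from the log-softmaxed logits; alpaca_farm's own compute_logprobs may differ in shifting and masking details:

```python
import torch
import torch.nn.functional as F

def compute_logprobs(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq, vocab); labels: (batch, seq)
    logp = F.log_softmax(logits, dim=-1)
    return torch.gather(logp, -1, labels.unsqueeze(-1)).squeeze(-1)

logits = torch.randn(2, 4, 32000)
labels = torch.randint(0, 32000, (2, 4))
print(compute_logprobs(logits, labels).shape)  # torch.Size([2, 4])
```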
Alternatively use Claude for annotation?
#54 opened by mensch72 - 4
Code question about compute_loss in ppo_trainer
#52 opened by Yuhuajoe - 2
Generation Issue (probability tensor contains either `inf`, `nan` or element < 0) of Flash-LLaMA with Model Parallelism
#48 opened by Zhiyuan-Zeng - 1
cannot import name 'PairwiseAutoAnnotator' from partially initialized module 'alpaca_farm.auto_annotations' (most likely due to a circular import)
#46 opened by simplelifetime - 1
Can you provide another graph of reward model over-optimization, as in Figure 5 of the paper?
#33 opened by twidddj - 1
recover_model_weights.py on reward-sim hits a mismatch between _name_or_path and backbone_model_name_or_path
#28 opened by REIGN12