tatsu-lab/alpaca_farm
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
Python · Apache-2.0
Issues
Shape error: 32000 vs 32001
#93 opened by yiwan-rl - 1
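This mismatch is the well-known LLaMA pad-token issue: the base vocabulary has 32,000 tokens, and the Alpaca-style SFT recipe adds a [PAD] token, giving 32,001. A minimal sketch of the usual fix, assuming a Hugging Face LLaMA checkpoint (the model name below is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "huggyllama/llama-7b" is illustrative; any checkpoint with the original
# 32,000-token LLaMA vocabulary hits the same mismatch.
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")

# Adding the pad token grows the vocabulary to 32,001, so the embedding
# matrix must be resized before loading fine-tuned weights that expect it.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))
```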
Supervised fine-tuning of Pythia models with the AlpacaFarm framework
#91 opened by hank0316 - 3
Using pretrained models
#90 opened by syleedandekar - 0
Error downloading pre-trained weights
#89 opened by syleedandekar - 0
Reproducibility of the pretrained reward model
#68 opened by jp7c5 - 4
BaseAnnotator.__init__() got an unexpected keyword argument 'other_keys_to_keep'
#85 opened by xukefaker - 3
Repeated Deprecation Error
#87 opened by syleedandekar - 1
Why use FSDP instead of DeepSpeed?
#76 opened by nrailg - 1
[Bug] Error importing is_deepspeed_zero3_enabled
#65 opened by SingL3 - 1
Support for a lower CUDA version?
#77 opened by dorothee-sigg - 2
Scoring with the reward model
#79 opened by qlduNLP - 3
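For context, scoring a prompt/response pair with a reward model generally reduces to one forward pass that returns a scalar. A sketch using a public stand-in reward model; AlpacaFarm's own RewardModel class may expose a different interface:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A public stand-in reward model; substitute the AlpacaFarm reward
# checkpoint and its RewardModel class for actual reproduction.
name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

prompt = "Explain RLHF in one sentence."
response = "RLHF fine-tunes a model against a learned human-preference reward."
inputs = tokenizer(prompt, response, return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # scalar reward
print(score)
```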
Possible issue with gradient accumulation
#57 opened by rosinality - 0
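The usual pitfall with gradient accumulation is forgetting to scale the loss, so accumulated gradients sum instead of average. A self-contained sketch of the standard convention (whether alpaca_farm follows it is what the issue asks):

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]
accum_steps = 4  # one optimizer step per 4 micro-batches

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    # Divide by accum_steps so the summed gradients match a large-batch mean.
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```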
RecursionError: maximum recursion depth exceeded
#80 opened by qlduNLP - 0
integrity_check error with sft10k
#78 opened by bpucla - 10
RewardModel.from_pretrained() loads redundant weights (incurs extra ~30GB of RAM)
#75 opened by angie-chen55 - 1
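One common mitigation for double-loading in Hugging Face models is `low_cpu_mem_usage=True`, which materializes weights directly from the checkpoint instead of initializing random parameters first; whether it is applicable inside `RewardModel.from_pretrained()` is what the issue raises. Sketch (model name illustrative):

```python
from transformers import AutoModelForCausalLM

# low_cpu_mem_usage=True skips the random init and loads weights in place,
# avoiding a second full copy of the parameters in RAM.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", low_cpu_mem_usage=True
)
```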
Confusing detail in preference mapping
#71 opened by dorothee-sigg - 4
Problem with PairwiseAutoAnnotator
#64 opened by langhaobeijing - 0
Huge memory demand of recover_model_weights.py?
#73 opened by mensch72 - 0
recover_model_weights.py gives WARNING:root:Your base LLaMA checkpoint is converted with transformers==4.27.0.dev0
#72 opened by mensch72 - 2
KeyError: 'llama' in /recover_model_weights.py
#70 opened by mensch72 - 3
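`KeyError: 'llama'` typically means the installed transformers predates LLaMA support, which landed in 4.28, so the lookup of the config's `model_type` fails. A guard one might add (version bound per the transformers changelog):

```python
import transformers
from packaging import version

# LlamaModel was added in transformers 4.28; older releases do not know
# the "llama" model_type and raise KeyError when resolving the config.
if version.parse(transformers.__version__) < version.parse("4.28.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} predates LLaMA support; "
        "run `pip install -U 'transformers>=4.28'`."
    )
```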
Errors when using bitsandbytes (bnb) and QLoRA for SFT
#63 opened by qiuruiyu - 3
Running PPO with fewer GPUs
#50 opened by shunzh - 4
Use with Llama-2-70b-hf?
#69 opened by mensch72 - 1
Question about KL term
#67 opened by SimengSun - 2
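For reference, RLHF PPO implementations commonly shape the reward with a per-token KL penalty toward the reference (SFT) policy and add the reward-model score at the final token. A toy sketch with random tensors; shapes and the coefficient are illustrative, not alpaca_farm's exact formulation:

```python
import torch

batch, seq = 2, 5
logprobs_policy = torch.randn(batch, seq)  # log pi_theta(y_t | x, y_<t)
logprobs_ref = torch.randn(batch, seq)     # log pi_ref(y_t | x, y_<t)
rm_score = torch.randn(batch)              # scalar reward per sequence
kl_coef = 0.02

kl = logprobs_policy - logprobs_ref        # per-token log-ratio (KL estimate)
shaped = -kl_coef * kl                     # penalty applied at every token
shaped[:, -1] += rm_score                  # terminal reward from the RM
```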
Problem with Simulation case
#61 opened by qiuruiyu - 1
Model selection for PPO in Table 2
#60 opened by langhaobeijing - 1
[Discussion] about compute_logprobs
#56 opened by snowkcon - 4
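A generic version of such a helper gathers each target token's log-probability from the log-softmaxed logits; alpaca_farm's own compute_logprobs may differ in shifting and masking details:

```python
import torch
import torch.nn.functional as F

def compute_logprobs(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq, vocab); labels: (batch, seq)
    logp = F.log_softmax(logits, dim=-1)
    return torch.gather(logp, -1, labels.unsqueeze(-1)).squeeze(-1)

logits = torch.randn(2, 4, 32000)
labels = torch.randint(0, 32000, (2, 4))
print(compute_logprobs(logits, labels).shape)  # torch.Size([2, 4])
```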
Alternatively use Claude for annotation?
#54 opened by mensch72 - 4
Code question about compute_loss in ppo_trainer
#52 opened by Yuhuajoe - 2
Generation Issue (probability tensor contains either `inf`, `nan` or element < 0) of Flash-LLaMA with Model Parallelism
#48 opened by Zhiyuan-Zeng - 1
cannot import name 'PairwiseAutoAnnotator' from partially initialized module 'alpaca_farm.auto_annotations' (most likely due to a circular import)
#46 opened by simplelifetime - 1
Can you provide another graph of reward model over-optimization, as in Figure 5 of the paper?
#33 opened by twidddj - 1
recover_model_weights.py on reward-sim hits a mismatch between _name_or_path and backbone_model_name_or_path
#28 opened by REIGN12