lucidrains/PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
Python · MIT
Issues
How to use lora?
#58 opened by xiaoguzai - 13
Should critic's input be prompt only?
#57 opened by ginward - 8
Flash Attention 2
#54 opened by conceptofmind - 0
Model Name
#49 opened by conceptofmind - 2
Is it possible to replace PaLM with other huggingface pretrained language model?
#24 opened by noanti - 3
A few questions on training
#21 opened by TheRealAakash - 2
speed up with flash attn in A6000?
#47 opened by wac81 - 4
i use other params with palm, but got error
#45 opened by wac81 - 2
norm.gamma not used during backprop
#46 opened by conceptofmind - 5
Can we just replace PPO+RLHF with a preference models thats basically a transformer encoder + sigmoid model, trained with BCE. And during finetuning perform a reward maximization by just making the reward model predict 1s?
#12 opened by ssintelli - 1
Reason for using pooled critic embedding instead of the last embedding for value head
#42 opened by gblackout - 1
KL divergence loss
#38 opened by taynoel84 - 1
train your reward model issue
#37 opened by wac81 - 1
Is it possible to release a code based on jax?
#16 opened by sglucas - 2
mask raised error
#39 opened by gongel - 0
Value function
#35 opened by tonylin52 - 1
Do you need cuda for this?
#30 opened by beew - 0
Can we exploiting AGI ability of chatGPT ?
#32 opened by youkpan - 4
Is this shift right for the action logits?
#31 opened by kisseternity - 1
value function input
#28 opened by kkissmart - 2
The loss function of reward model.
#22 opened by huzechuan - 0
KL_div/ratio on policy
#26 opened by kkissmart - 39
Encoder-Decoder
#6 opened by Bachstelze - 0
How to fine-tune and train on my own data?
#20 opened by rbhatia46 - 8
Training the reward model
#19 opened by farhad-abdi - 4
PaLM-rlhf-pytorch Roadmap
#18 opened by HappyPony - 4
Help with computational power
#17 opened by byteunix - 1
Simple Web Interface
#15 opened by conceptofmind - 1
I'm dumb
#11 opened by cardonasMind - 1
Can I train a model on my own data?
#9 opened by sveisa - 2
GPU requirements
#5 opened by ejarkm