lucidrains/PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

Python · MIT

Issues

Is there any documentation to train this on my own data?
#59 opened 3 months ago by gauravgandhi1315
0
How to use lora?
#58 opened 3 months ago by xiaoguzai
0
Confusion about KL divergence calculation for human feedback policies
#41 opened a year ago by dwyzzy
13
Should critic's input be prompt only?
#57 opened 6 months ago by ginward
0
✨ 😅 Is it possible to use OpenAI's ChatGPT to train this ChatGPT?
#23 opened a year ago by Yonv1943
8
Possible incorrect creation of Rotary Embeddings
#56 opened 7 months ago by AndyBarcia
1
Flash Attention 2
#54 opened 10 months ago by conceptofmind
0
I looked at the llama source code and there is an intermediate layer
#51 opened a year ago by wac81
0
Column and Row Parallel Linear for Apex Tensor Parallel
#44 opened a year ago by conceptofmind
1
Model Name
#49 opened a year ago by conceptofmind
3
Is it possible to replace PaLM with other huggingface pretrained language model?
#24 opened a year ago by noanti
2
Is memory-efficient attention enabled by default if I don't use flash attn?
#48 opened a year ago by wac81
3
A few questions on training
#21 opened a year ago by TheRealAakash
3
speed up with flash attn in A6000?
#47 opened a year ago by wac81
2
I used other params with PaLM, but got an error
#45 opened a year ago by wac81
4
norm.gamma not used during backprop
#46 opened a year ago by conceptofmind
2
Can we just replace PPO+RLHF with a preference model that's basically a transformer encoder + sigmoid model, trained with BCE, and during finetuning perform reward maximization by just making the reward model predict 1s?
#12 opened a year ago by ssintelli
5
The KL loss calculation seems to have a mistake.
#43 opened a year ago by Nightbringers
1
Reason for using pooled critic embedding instead of the last embedding for value head
#42 opened a year ago by gblackout
3
KL divergence loss
#38 opened a year ago by taynoel84
1
train your reward model issue
#37 opened a year ago by wac81
1
Cannot train the model using PyTorch version 2?
#36 opened a year ago by linhduongtuan
1
Is it possible to release code based on JAX?
#16 opened a year ago by sglucas
7
mask raised error
#39 opened a year ago by gongel
2
Value function
#35 opened a year ago by tonylin52
0
Is it possible to train this ai using open-assistant or vice versa?
#33 opened a year ago by qwertystars
1
Do you need cuda for this?
#30 opened a year ago by beew
1
Can we exploit the AGI ability of ChatGPT?
#32 opened a year ago by youkpan
0
Is this shift right for the action logits?
#31 opened a year ago by kisseternity
4
Are there some pictures that describe PaLM architecture?
#29 opened a year ago by guotong1988
1
value function input
#28 opened a year ago by kkissmart
1
The loss function of reward model.
#22 opened a year ago by huzechuan
2
KL_div/ratio on policy
#26 opened a year ago by kkissmart
0
Encoder-Decoder
#6 opened a year ago by Bachstelze
39
How to fine-tune and train on my own data?
#20 opened a year ago by rbhatia46
0
Training the reward model
#19 opened a year ago by farhad-abdi
8
PaLM-rlhf-pytorch Roadmap
#18 opened a year ago by HappyPony
4
Help with computational power
#17 opened a year ago by byteunix
4
Noob question: How can I use this model for inference?
#8 opened a year ago by PrasoonPratham
1
Simple Web Interface
#15 opened a year ago by conceptofmind
2
Why do the value calculations in generate and learn use different masks?
#14 opened a year ago by Nightbringers
1
Palm
#13 opened a year ago by Phob3tor
0
I'm dumb
#11 opened a year ago by cardonasMind
1
Can I train a model on my own data?
#9 opened a year ago by sveisa
1
Unified reward function/model architecture for a wide range of tasks
#4 opened a year ago by James4Ever0
2
GPU requirements
#5 opened a year ago by ejarkm
3