kvablack/ddpo-pytorch
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
Python · MIT License
Issues
In train.py, the ordering of sample["advantages"] does not match that of sample["timesteps"], sample["latents"], sample["next_latents"], and sample["log_probs"].
#29 opened by YangSun22 · 2 comments
Questions about the reward curve and BERT.
#23 opened by zjuAIHz · 0 comments
Does this training process apply to the latest SD models, such as stabilityai/stable-diffusion-3-medium?
#27 opened by roywang021 · 0 comments
How to save the fine-tuned model properly
#26 opened by shashankg7 · 0 comments
training-code
#25 opened by ParnyanAtaei · 1 comment
Finetuning on Google Colab
#22 opened by alirezanobakht13 · 1 comment
Batch size unrecognized
#20 opened by mao-code · 1 comment
SDXL Support?
#17 opened by rdcoder33 · 0 comments
Code logics, thanks
#19 opened by junyongyou · 2 comments
UNet keeps producing NaN during training
#18 opened by EYcab · 8 comments
Training an aesthetic model with the default configuration on 8 A800 GPUs gets stuck after completing one epoch, but works fine on a single A800; what could be the cause?
#13 opened by cjt222 · 5 comments
About training with the prompt_image_alignment configuration, which uses the llava_bertscore reward function
#11 opened by QZJ-2003 · 0 comments
Prompt-dependent value function optimization
#15 opened by hkunzhe · 3 comments
reproducing the aesthetic experiment
#3 opened by seashell123 · 1 comment
Support for other schedulers
#14 opened by desaixie · 1 comment
Question about the optimized objective.
#12 opened by JacobYuan7 · 3 comments
Prompt Alignment with LLaVA-server: client-side prompt and image don't match the server-side reward
#6 opened by desaixie · 2 comments
fp16 only when using LoRA?
#8 opened by GiilDe · 2 comments
On reproducing LLaVA alignment experiments.
#5 opened by bhattg · 2 comments
GIF visualization
#1 opened by SnowdenLee · 1 comment
On reproducibility and LoRA
#2 opened by bhattg