Support REPA
https://github.com/sihyun-yu/REPA
REPresentation Alignment (REPA) is a simple regularization technique built on recent diffusion transformer architectures.
reference code:
https://github.com/sihyun-yu/REPA/blob/0f6025751aae6746636113430077169abee26ceb/train.py#L265-L300
https://github.com/sihyun-yu/REPA/blob/0f6025751aae6746636113430077169abee26ceb/loss.py#L77-L88
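For reference, the alignment term in the linked loss.py is roughly a patch-wise negative cosine similarity between projected intermediate DiT features and patch features from a frozen pretrained encoder (e.g. DINOv2). Below is a minimal sketch of that idea; the module name, projector shape, and SiLU activation are assumptions for illustration, not the exact REPA code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepaAlignmentLoss(nn.Module):
    """Sketch of a REPA-style alignment term: project intermediate
    diffusion-transformer tokens into the feature space of a frozen
    pretrained encoder and maximize patch-wise cosine similarity."""

    def __init__(self, hidden_dim: int, encoder_dim: int, proj_dim: int = 2048):
        super().__init__()
        # Small MLP projector from the DiT hidden size to the encoder's feature size.
        self.projector = nn.Sequential(
            nn.Linear(hidden_dim, proj_dim),
            nn.SiLU(),
            nn.Linear(proj_dim, encoder_dim),
        )

    def forward(self, dit_features: torch.Tensor, encoder_features: torch.Tensor) -> torch.Tensor:
        # dit_features:     (B, N, hidden_dim) tokens from an early DiT block
        # encoder_features: (B, N, encoder_dim) patch features from the frozen encoder
        z = self.projector(dit_features)
        # Negative mean cosine similarity over patches: lower means better aligned.
        cos = F.cosine_similarity(z, encoder_features, dim=-1)
        return -cos.mean()
```

During training this term would be added to the usual denoising loss with a weighting coefficient, e.g. `loss = denoise_loss + lambda_repa * align_loss(dit_features, encoder_features)`.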
Thank you! REPA is interesting.
However, I have one concern: will it still be effective if we only introduce it at fine-tuning time? In that case the model has to learn both the fine-tuning content and the aligned features for the first few blocks at the same time, and I am worried this could actually decrease performance.
But I think we need to test it.
Here's what I'm thinking: we could dedicate the first few blocks exclusively to REPA's regularization alignment during fine-tuning, and use the remaining blocks for the fine-tuning objective itself. That way we avoid the problem of training both at once; see the sketch below.
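A hypothetical sketch of that split: gradients from the alignment loss only reach the first `align_depth` blocks (plus the projector), while the fine-tuning loss only updates the remaining blocks via a `detach()`. The toy DiT, names, and shapes here are assumptions for illustration, not this repo's actual model code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDiT(nn.Module):
    def __init__(self, dim=256, depth=12, heads=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True) for _ in range(depth)
        )
        self.out = nn.Linear(dim, dim)

def split_forward(model, tokens, encoder_feats, projector, align_depth=4):
    h = tokens
    for blk in model.blocks[:align_depth]:
        h = blk(h)
    # Alignment loss on early-block features; its gradient stays in blocks[:align_depth].
    l_align = -F.cosine_similarity(projector(h), encoder_feats, dim=-1).mean()

    # Detach so the fine-tuning loss cannot update the early blocks.
    h = h.detach()
    for blk in model.blocks[align_depth:]:
        h = blk(h)
    return model.out(h), l_align

# Usage sketch: combine with the usual denoising target.
model, projector = ToyDiT(), nn.Linear(256, 768)
tokens = torch.randn(2, 64, 256)          # noised latent tokens
encoder_feats = torch.randn(2, 64, 768)   # frozen encoder (e.g. DINOv2) patch features
target = torch.randn(2, 64, 256)          # denoising target (noise / velocity)
pred, l_align = split_forward(model, tokens, encoder_feats, projector)
loss = F.mse_loss(pred, target) + 0.5 * l_align
loss.backward()
```

This way the first blocks are trained only by the alignment term and the later blocks only by the fine-tuning objective, so neither has to serve both losses at once. Whether that actually helps is still something we would have to test.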