Support REPA
https://github.com/sihyun-yu/REPA
REPresentation Alignment (REPA) is a simple regularization technique built on recent diffusion transformer architectures.
reference code:
https://github.com/sihyun-yu/REPA/blob/0f6025751aae6746636113430077169abee26ceb/train.py#L265-L300
https://github.com/sihyun-yu/REPA/blob/0f6025751aae6746636113430077169abee26ceb/loss.py#L77-L88
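For reference, the alignment term in the linked loss.py is roughly a patch-wise negative cosine similarity between projected intermediate DiT features and patch features from a frozen pretrained encoder (e.g. DINOv2). Below is a minimal sketch of that idea; the module name, projector shape, and SiLU activation are assumptions for illustration, not the exact REPA code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepaAlignmentLoss(nn.Module):
    """Sketch of a REPA-style alignment term: project intermediate
    diffusion-transformer tokens into the feature space of a frozen
    pretrained encoder and maximize patch-wise cosine similarity."""

    def __init__(self, hidden_dim: int, encoder_dim: int, proj_dim: int = 2048):
        super().__init__()
        # Small MLP projector from the DiT hidden size to the encoder's feature size.
        self.projector = nn.Sequential(
            nn.Linear(hidden_dim, proj_dim),
            nn.SiLU(),
            nn.Linear(proj_dim, encoder_dim),
        )

    def forward(self, dit_features: torch.Tensor, encoder_features: torch.Tensor) -> torch.Tensor:
        # dit_features:     (B, N, hidden_dim) tokens from an early DiT block
        # encoder_features: (B, N, encoder_dim) patch features from the frozen encoder
        z = self.projector(dit_features)
        # Negative mean cosine similarity over patches: lower means better aligned.
        cos = F.cosine_similarity(z, encoder_features, dim=-1)
        return -cos.mean()
```

During training this term would be added to the usual denoising loss with a weighting coefficient, e.g. `loss = denoise_loss + lambda_repa * align_loss(dit_features, encoder_features)`.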
Thank you! REPA is interesting.
However, I have one concern: will it still be effective if we only introduce it at fine-tuning time? In that case the model has to learn both the fine-tuning content and the aligned features for the first few blocks at the same time, and I am worried this could actually decrease performance.
But I think we need to test it.
Here's what I'm thinking: we could dedicate the first few blocks exclusively to REPA's regularization alignment during fine-tuning, and use the remaining blocks for the fine-tuning objective itself. That way we avoid the problem of training both at once; see the sketch below.
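A hypothetical sketch of that split: gradients from the alignment loss only reach the first `align_depth` blocks (plus the projector), while the fine-tuning loss only updates the remaining blocks via a `detach()`. The toy DiT, names, and shapes here are assumptions for illustration, not this repo's actual model code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDiT(nn.Module):
    def __init__(self, dim=256, depth=12, heads=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True) for _ in range(depth)
        )
        self.out = nn.Linear(dim, dim)

def split_forward(model, tokens, encoder_feats, projector, align_depth=4):
    h = tokens
    for blk in model.blocks[:align_depth]:
        h = blk(h)
    # Alignment loss on early-block features; its gradient stays in blocks[:align_depth].
    l_align = -F.cosine_similarity(projector(h), encoder_feats, dim=-1).mean()

    # Detach so the fine-tuning loss cannot update the early blocks.
    h = h.detach()
    for blk in model.blocks[align_depth:]:
        h = blk(h)
    return model.out(h), l_align

# Usage sketch: combine with the usual denoising target.
model, projector = ToyDiT(), nn.Linear(256, 768)
tokens = torch.randn(2, 64, 256)          # noised latent tokens
encoder_feats = torch.randn(2, 64, 768)   # frozen encoder (e.g. DINOv2) patch features
target = torch.randn(2, 64, 256)          # denoising target (noise / velocity)
pred, l_align = split_forward(model, tokens, encoder_feats, projector)
loss = F.mse_loss(pred, target) + 0.5 * l_align
loss.backward()
```

This way the first blocks are trained only by the alignment term and the later blocks only by the fine-tuning objective, so neither has to serve both losses at once. Whether that actually helps is still something we would have to test.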