feizc/FluxMusic

About the terms in apply_rope (freqs_cis, pe, ...)

KinamSalad opened this issue · 0 comments

Hello. Thank you for your wonderful code :)
I have a question about the freqs_cis term in the apply_rope function in modules/layers.py.

This function is used in attention, and model.py shows that the embeddings of txt_id and img_id are passed in as freqs_cis.
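To make the mechanism concrete, here is a minimal, hypothetical sketch in plain Python. It is not FluxMusic's code. It assumes, as in the original Flux implementation that FluxMusic appears to build on, that the ids are position coordinates along one or more axes and that freqs_cis encodes a per-position rotation of consecutive feature pairs. The names `apply_rope_1d`, `rope_nd`, `dot`, and `axes_dim` are illustrative only.

```python
import cmath
import math
from collections.abc import Sequence


def rope_angles(pos: float, dim: int, theta: float = 10000.0) -> list[float]:
    """Rotation angle for each of the dim // 2 feature pairs at one position."""
    return [pos / theta ** (2 * i / dim) for i in range(dim // 2)]


def apply_rope_1d(x: Sequence[float], pos: float, theta: float = 10000.0) -> list[float]:
    """Rotate each pair (x[2i], x[2i+1]) by the angle for position `pos`."""
    assert len(x) % 2 == 0, "feature dimension must be even"
    out: list[float] = []
    for i, angle in enumerate(rope_angles(pos, len(x), theta)):
        # Treat the pair as a complex number and multiply by e^{i*angle}.
        z = complex(x[2 * i], x[2 * i + 1]) * cmath.exp(1j * angle)
        out.extend([z.real, z.imag])
    return out


def rope_nd(x: Sequence[float], ids: Sequence[float], axes_dim: Sequence[int],
            theta: float = 10000.0) -> list[float]:
    """Hypothetical multi-axis variant: one feature chunk per position axis."""
    assert sum(axes_dim) == len(x) and len(ids) == len(axes_dim)
    out: list[float] = []
    start = 0
    for pos, d in zip(ids, axes_dim):
        out.extend(apply_rope_1d(x[start:start + d], pos, theta))
        start += d
    return out


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Unscaled attention score between a query and a key."""
    return sum(p * q for p, q in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 0.5, 1.1, -0.4, 0.2, 0.9]
    k = [0.8, 0.1, -0.6, 1.3, 0.4, 0.4, -1.0, 0.2]
    s1 = dot(apply_rope_1d(q, 5), apply_rope_1d(k, 3))
    s2 = dot(apply_rope_1d(q, 12), apply_rope_1d(k, 10))
    print(math.isclose(s1, s2, abs_tol=1e-9))  # True: both pairs are 2 positions apart
```

The sketch shows the property RoPE is designed for: after rotation, the query/key score depends only on the offset between their positions. If the rotation is skipped, the score is identical wherever the two tokens sit, so the attention step by itself carries no ordering information.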

What are txt_id and img_id? Do we need any inputs besides the text/music pairs?

I commented out the apply_rope call and trained my model on text/music pairs only, but the results were poor.

It would be great if you could tell me what format this data is in.

Thank you