Question about the terms of apply_rope (freqs_cis, pe ...)
KinamSalad opened this issue · 0 comments
KinamSalad commented
Hello. Thank you for your wonderful code :)
I have a question about the freqs_cis term in the apply_rope function in modules/layers.py.
This function is used for attention, and if we look at model.py, we can see that the embeddings of txt_id and img_id are used as the freqs_cis term.
What are txt_id and img_id? Do we need any other terms besides the text and music pairs?
I commented out the apply_rope function and trained my model with just text/music pairs, but I didn't get good results.
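For context, here is a minimal sketch of what I understand a rotary position embedding to do in general, in plain Python. It is not the code from modules/layers.py: the helper names `rope_angles` and `rotate_pairs` are hypothetical, `theta=10000.0` is just the commonly used default, and the idea that the ids are integer positions is my assumption, which is exactly the part I am unsure about.

```python
import math

def rope_angles(position, head_dim, theta=10000.0):
    """One rotation angle per channel pair for an integer position (assumed id format)."""
    return [position / (theta ** (2 * i / head_dim)) for i in range(head_dim // 2)]

def rotate_pairs(vec, angles):
    """Rotate consecutive channel pairs (x0, x1), (x2, x3), ... by the given angles."""
    out = []
    for i, angle in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(angle), math.sin(angle)
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product, standing in for one attention logit."""
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors with head_dim = 4 (made-up values).
q = [1.0, 0.0, 0.5, -0.5]
k = [0.3, 0.7, -0.2, 0.1]

# The same content at different position offsets gives different scores.
score_near = dot(rotate_pairs(q, rope_angles(5, 4)), rotate_pairs(k, rope_angles(4, 4)))
score_far = dot(rotate_pairs(q, rope_angles(5, 4)), rotate_pairs(k, rope_angles(0, 4)))

# Shifting both positions by the same amount leaves the score unchanged:
# only the relative offset between query and key matters.
score_shifted = dot(rotate_pairs(q, rope_angles(105, 4)), rotate_pairs(k, rope_angles(104, 4)))
```

If this picture is right, commenting out apply_rope would remove the position information from attention entirely, which might explain my poor results.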
It would be great if you could tell me what format txt_id and img_id should be in.
Thank you