A mistake in rotary embedding

Line 236 in commit 47e3180:

q, k = map(lambda t: apply_rotary_pos_emb(t, rotary_emb), (q, k))
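For context, this is a minimal pure-Python sketch of what a rotary position embedding does to a single query or key vector. It is not the repository's `apply_rotary_pos_emb`. The helper names (`rotary_angles`, `rotate_half`, `apply_rotary`), the "rotate half" layout and the frequency base of 10000 are all assumptions for illustration. Each pair of feature dimensions is rotated by an angle proportional to the token position. Because of this, the dot product of a rotated query and a rotated key depends only on their relative offset, and that is the property a bug in how q and k are rotated would break.

```python
import math


# Hypothetical helper: per-position rotation angles, assuming base 10000
# and the common theta_i = base^(-2i/dim) frequency schedule.
def rotary_angles(seq_len, dim, base=10000.0):
    freqs = [base ** (-2 * i / dim) for i in range(dim // 2)]
    # angles[pos][i] = pos * theta_i
    return [[pos * f for f in freqs] for pos in range(seq_len)]


# Hypothetical helper: split x into halves (x1, x2) and return (-x2, x1).
def rotate_half(x):
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    return [-v for v in x2] + x1


# Hypothetical helper: rotate one q or k vector by its position's angles.
# Pair i is (x[i], x[i + dim/2]) and is rotated by angles[i].
def apply_rotary(x, angles):
    cos = [math.cos(a) for a in angles] * 2  # repeat to match the half-split layout
    sin = [math.sin(a) for a in angles] * 2
    rot = rotate_half(x)
    return [xi * c + ri * s for xi, c, ri, s in zip(x, cos, rot, sin)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

As a quick check, a query at position 5 scored against a key at position 2 should give the same value as the same query at position 3 against the same key at position 0, since both pairs are 3 positions apart.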

I'm testing with a toy dataset.

With the previous code, the model did not train.

After changing to this code, it seems to train well from the first epoch.

Awesome! Thank you so much.