Batch size >1 not working, and loss queries
Closed this issue · 1 comment
Thanks for creating this.
1. Batch size >1 only works if the following changes are made:
   - the rotary embedding is modified with `.unsqueeze(1)`
   - `order_modality_positions_by_seq_offset` is commented out
2. The loss is computed on the flow after it has already been projected from the latent dim to the transformer hidden dim. Shouldn't the flow be calculated from the data before projection, so that the projection itself can be optimized? Or would an extra reconstruction loss be needed to ensure the projection/unprojection is accurate?
3. Not currently an issue, but the Transfusion paper states that either a linear layer or a U-Net was used to patchify the image data; it doesn't specify whether skip connections or attention were used in that U-Net.
4. `modality_positions_to_tensor` doesn't move the new tensor to the same device as the modalities.
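To illustrate point 1, here is a minimal shape-only sketch of why the `.unsqueeze(1)` is needed; the tensor names and shapes are my assumptions, not the repo's actual API:

```python
import torch

# assumed shapes: queries are (batch, heads, seq, dim) and the rotary
# frequencies are computed per batch as (batch, seq, dim)
batch, heads, seq, dim = 2, 4, 6, 8
q = torch.randn(batch, heads, seq, dim)
freqs = torch.randn(batch, seq, dim)

# (batch, seq, dim) cannot broadcast against (batch, heads, seq, dim),
# since batch would be mis-aligned with heads; .unsqueeze(1) inserts the
# missing heads axis so broadcasting works for any batch size
rotated = q * freqs.unsqueeze(1).cos()  # -> (batch, heads, seq, dim)
```

With batch size 1 the mismatch can go unnoticed because a leading dimension of 1 broadcasts silently.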
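For point 2, a small sketch of the two options being asked about, with the transformer omitted and all module names hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
latent_dim, hidden_dim = 8, 16
proj = nn.Linear(latent_dim, hidden_dim)    # latent -> transformer hidden
unproj = nn.Linear(hidden_dim, latent_dim)  # transformer hidden -> latent

noised = torch.randn(2, 4, latent_dim)       # noised modality latents
target_flow = torch.randn(2, 4, latent_dim)  # flow target in latent dim

# option A: compute the flow loss in the latent dim, after unprojection,
# so gradients reach both proj and unproj
pred_flow = unproj(proj(noised))  # transformer would sit between these
flow_loss = F.mse_loss(pred_flow, target_flow)

# option B: keep the loss where it is and add an auxiliary reconstruction
# loss so the proj/unproj round trip stays faithful
recon_loss = F.mse_loss(unproj(proj(noised)), noised)
loss = flow_loss + recon_loss
```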
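And for point 4, a sketch of the device fix, assuming the function builds the tensor from a list of (offset, length) pairs; the signature here is illustrative, not the repo's actual one:

```python
import torch

def modality_positions_to_tensor_fixed(positions, device):
    # pass `device` explicitly so the new tensor lands on the same
    # device as the modality tensors instead of defaulting to CPU
    return torch.tensor(positions, dtype=torch.long, device=device)

modalities = torch.randn(2, 4, 8)  # stand-in for a batch of modality latents
pos = modality_positions_to_tensor_fixed([(0, 4), (1, 4)], modalities.device)
```

On CPU-only machines the bug is invisible; it only surfaces once the modalities live on an accelerator.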
thanks for testing it out this soon, Carolina (if that is your name)

repo is far from being in a reviewable state, but i addressed 1 and 4 just now

2 and 3 i'm aware of; they will be addressed by next week's end