lucidrains/transfusion-pytorch

Default times are multiplied by batch size

Closed this issue · 4 comments

  1. In the toy training example, the train timestep is permanently 0.5 and never varies. Suggest making it clear that generative tasks will require a custom time function to be passed.

  2. When batch size > 1, the times per token are being multiplied by the batch size. e.g. in the toy example the padded times show up as 2 (0.5 × 4) because the batch size is 4.

@RefractAI for number one, I don't believe that is true if there is only one modality. if that is the case, it is a bug and I'll fix it tomorrow

for point two, can you link me to the line of code? unsure what you are referring to

times_per_token = einsum(is_modalities.float(), times, 'b t m n, b m -> b t n')

tensor([[[0., 0., 0., 0., 0., 2., 2., 0.]],

        [[0., 0., 0., 0., 0., 2., 2., 0.]],

        [[0., 0., 0., 0., 0., 2., 2., 0.]],

        [[0., 0., 0., 0., 0., 2., 2., 0.]]], device='cuda:0')

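For context on what that einsum is intended to compute: assuming `is_modalities` is a one-hot mask of shape `(batch, modality types, modalities, tokens)` and `times` holds one time per modality of shape `(batch, modality types)`, the reduction over the `m` axis scatters each modality's time onto its token positions. A minimal numpy sketch (shapes and the token range 5:7 are made up to mirror the printed tensor, not taken from the repo) shows the einsum itself leaves each batch row at 0.5, so the ×4 inflation must come from elsewhere, e.g. the mask or times being accumulated across the batch before this line:

```python
import numpy as np

# hypothetical toy shapes: batch b=4, t=1 modality type, m=1 modality, n=8 tokens
b, t, m, n = 4, 1, 1, 8

is_modalities = np.zeros((b, t, m, n))
is_modalities[:, :, 0, 5:7] = 1.0        # tokens 5-6 belong to the single modality

times = np.full((b, m), 0.5)             # the toy example's fixed timestep

# same contraction as the einops call: sum over the modality axis m
times_per_token = np.einsum('btmn,bm->btn', is_modalities, times)

print(times_per_token[0])                # each batch row stays at 0.5, not 2.0
```

If instead the mask held overlapping ones along the reduced `m` axis, or `times` had already been scaled by the batch size upstream, the summed result would show exactly the inflated values reported above.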

ah thank you, will take a look tonight once I get back from the park with doggo

@RefractAI this should be resolved, thank you again for raising this