Default times are multiplied by batch size
- In the toy training example, the training timestep is fixed at 0.5 and never varies. Suggest making it clear that generative tasks will require a custom time function to be passed (a sketch of one follows this list).
- When batch size > 1, the per-token times are multiplied by the batch size, e.g. in the toy example this shows up as padded times = 2 because the batch size is 4.
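For the first point, a custom time sampler along these lines could be what's needed; note the function name and signature here are illustrative assumptions, not the library's confirmed API:

import torch

def random_times(batch_size: int, device=None) -> torch.Tensor:
    # sample a fresh time uniformly in [0, 1) for each batch element,
    # rather than training every step at the fixed default of 0.5
    return torch.rand(batch_size, device=device)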
@RefractAI For number one, I don't believe that is true if there is only one modality. If that is the case, it is a bug and I'll fix it tomorrow.
For point two, can you link me to the line of code? I'm unsure what you are referring to.
times_per_token = einsum(is_modalities.float(), times, 'b t m n, b m -> b t n')
tensor([[[0., 0., 0., 0., 0., 2., 2., 0.]],
[[0., 0., 0., 0., 0., 2., 2., 0.]],
[[0., 0., 0., 0., 0., 2., 2., 0.]],
[[0., 0., 0., 0., 0., 2., 2., 0.]]], device='cuda:0')
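For reference, a minimal standalone sketch of that einsum, with shapes inferred from the output above (batch 4, one modality, sequence length 8; the dimension names are assumptions). With a correctly shaped times tensor of 0.5 per batch element, each modality token should come out as 0.5, not 0.5 multiplied by the batch size:

import torch
from einops import einsum

b, t, m, n = 4, 1, 1, 8  # batch, streams, modalities, sequence length (assumed)

# mark the single modality as active at token positions 5 and 6 for every batch element
is_modalities = torch.zeros(b, t, m, n, dtype=torch.bool)
is_modalities[..., 5:7] = True

# the default train time of 0.5 for the single modality, per batch element
times = torch.full((b, m), 0.5)

times_per_token = einsum(is_modalities.float(), times, 'b t m n, b m -> b t n')
print(times_per_token)  # expected: 0.5 at positions 5 and 6, 0 elsewhere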
Ah, thank you, will take a look tonight once I get back from the park with doggo.
@RefractAI This should be resolved, thank you again for raising this.