Default times are multiplied by batch size
- In the toy training example, the training timestep is fixed at 0.5 and never varies. Suggest making it clear that generative tasks will require a custom time function to be passed (a sketch of one follows this list).
- When batch size > 1, the per-token times are multiplied by the batch size, e.g. in the toy example this shows up as padded times = 2 because the batch size is 4.
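For the first point, a custom time sampler along these lines could be what's needed; note the function name and signature here are illustrative assumptions, not the library's confirmed API:

import torch

def random_times(batch_size: int, device=None) -> torch.Tensor:
    # sample a fresh time uniformly in [0, 1) for each batch element,
    # rather than training every step at the fixed default of 0.5
    return torch.rand(batch_size, device=device)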
@RefractAI For number one, I don't believe that is true if there is only one modality. If that is the case, it is a bug and I'll fix it tomorrow.
For point two, can you link me to the line of code? I'm unsure what you are referring to.
times_per_token = einsum(is_modalities.float(), times, 'b t m n, b m -> b t n')
tensor([[[0., 0., 0., 0., 0., 2., 2., 0.]],
[[0., 0., 0., 0., 0., 2., 2., 0.]],
[[0., 0., 0., 0., 0., 2., 2., 0.]],
[[0., 0., 0., 0., 0., 2., 2., 0.]]], device='cuda:0')
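For reference, a minimal standalone sketch of that einsum, with shapes inferred from the output above (batch 4, one modality, sequence length 8; the dimension names are assumptions). With a correctly shaped times tensor of 0.5 per batch element, each modality token should come out as 0.5, not 0.5 multiplied by the batch size:

import torch
from einops import einsum

b, t, m, n = 4, 1, 1, 8  # batch, streams, modalities, sequence length (assumed)

# mark the single modality as active at token positions 5 and 6 for every batch element
is_modalities = torch.zeros(b, t, m, n, dtype=torch.bool)
is_modalities[..., 5:7] = True

# the default train time of 0.5 for the single modality, per batch element
times = torch.full((b, m), 0.5)

times_per_token = einsum(is_modalities.float(), times, 'b t m n, b m -> b t n')
print(times_per_token)  # expected: 0.5 at positions 5 and 6, 0 elsewhere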
Ah, thank you, will take a look tonight once I get back from the park with doggo.
@RefractAI This should be resolved, thank you again for raising this.