DigitalPhonetics/IMS-Toucan

Questions about LAML procedure implementation

AlexSteveChungAlvarez opened this issue · 4 comments

Hello @Flux9665! I hope you are doing well.
I have been reading your code in order to understand how you implemented the LAML procedure for training.
I have a question about these lines of code:

    while len(batches) < batch_size:
        for index in random.sample(list(range(len(datasets))), len(datasets)):
            if len(batches) < batch_size:
                # we get one batch for each task (i.e. language in this case) in a randomized order
                try:
                    batch = next(train_iters[index])
                    batches.append(batch)
                except StopIteration:
                    train_iters[index] = iter(train_loaders[index])
                    batch = next(train_iters[index])
                    batches.append(batch)

Here you don't ensure that all languages have the same number of samples in a batch. For example, if the batch size were 32 and you had 3 languages, the "batches" array would have 11 samples from each of 2 randomly chosen languages and 10 from the third.
Is this OK according to the procedure? I thought the goal was to have the same number of samples from each language per batch.
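To make this concrete, here is a quick, self-contained sketch with dummy per-language loaders that counts the composition of one meta-batch using the same loop; the names and loader contents are just placeholders, not taken from the repository:

    import random
    from collections import Counter

    batch_size = 32
    datasets = ["lang_a", "lang_b", "lang_c"]  # placeholder language IDs
    # each dummy "loader" just yields (language, index) tuples instead of real batches
    train_loaders = [[(lang, i) for i in range(100)] for lang in datasets]
    train_iters = [iter(loader) for loader in train_loaders]

    batches = []
    while len(batches) < batch_size:
        for index in random.sample(list(range(len(datasets))), len(datasets)):
            if len(batches) < batch_size:
                try:
                    batches.append(next(train_iters[index]))
                except StopIteration:
                    train_iters[index] = iter(train_loaders[index])
                    batches.append(next(train_iters[index]))

    # counts the language composition of this one meta-batch,
    # e.g. Counter({'lang_b': 11, 'lang_a': 11, 'lang_c': 10});
    # which two languages get 11 changes from meta-batch to meta-batch
    print(Counter(lang for lang, _ in batches))
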
My other question is: what is the goal of a random order of the languages in the batch? I think keeping the order of the "datasets" array as it is would achieve the same result, since the languages would be distributed equally. As it is now, this wouldn't happen unless you choose a batch_size that is a multiple of the number of languages.
I will be waiting for your answers to these theoretical questions!

Hi @AlexSteveChungAlvarez.

As far as I understand, the random sampling from datasets is used exactly for the case where batch_size % number_of_languages != 0 to minimize the imbalance of samples from each dataset across iterations.

Given your example with batch size 32 and 3 languages, for each batch you will get 10 samples of all 3 languages, then 2 of the 3 languages will randomly be selected to complete the batch.

While this still means that any single batch doesn't contain an equal amount of samples from all languages, it should even out across all batches.

Without using random, you would indeed always get 11 samples from each of the first 2 datasets in your training pipeline, and only 10 samples from the third dataset.
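As a rough illustration, here is a standalone simulation (made-up counts only, not code from the repository) that tallies the per-language totals over many meta-batches: with the randomized order they stay close to equal, while the fixed order systematically favors the first datasets.

    import random
    from collections import Counter

    def sample_meta_batch(randomize, batch_size=32, num_langs=3):
        # count how many sub-batches each language contributes to one meta-batch
        counts = Counter()
        order_pool = list(range(num_langs))
        picked = 0
        while picked < batch_size:
            order = random.sample(order_pool, num_langs) if randomize else order_pool
            for index in order:
                if picked < batch_size:
                    counts[index] += 1
                    picked += 1
        return counts

    for randomize in (True, False):
        total = Counter()
        for _ in range(3000):  # accumulate over many meta-batches
            total += sample_meta_batch(randomize)
        print("randomized" if randomize else "fixed order", dict(total))
    # randomized: roughly 32000 sub-batches per language
    # fixed order: languages 0 and 1 get 33000 each, language 2 only 30000
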

> Given your example with batch size 32 and 3 languages, for each batch you will get 10 samples of all 3 languages, then 2 of the 3 languages will randomly be selected to complete the batch.

Then should the number of steps be a multiple of the number of languages, so that the number of samples can even out across all batches?

> While this still means that any single batch doesn't contain an equal amount of samples from all languages, it should even out across all batches.

Following the same example, for this to happen there would have to be 3 rounds/steps, so that it is possible to get 32 samples for each of the 3 languages (3 steps × 32 samples = 96, i.e. 32 per language). So the total number of steps should be a multiple of 3 for it to be possible to reach the same number of samples for each language, shouldn't it?

In theory, yes to both points; however, you can't guarantee that random.sample() will give you a perfectly equal distribution after a specified number of steps. You would need a different algorithm for that.

In practice, over thousands or even millions of steps, I doubt it makes a difference.
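For completeness, one possible "different algorithm" would be a deterministic round-robin over the dataset indices that persists across meta-batches; this is only a sketch under the assumption that giving up the random order is acceptable, not how the repository does it:

    from itertools import cycle
    from collections import Counter

    num_languages = 3
    batch_size = 32
    language_cycle = cycle(range(num_languages))  # persists across training steps

    totals = Counter()
    for step in range(3):  # 3 steps, i.e. a multiple of the number of languages
        for _ in range(batch_size):
            index = next(language_cycle)
            totals[index] += 1  # here one would pull the next batch from train_iters[index]
    print(dict(totals))  # {0: 32, 1: 32, 2: 32} -- perfectly balanced after 3 steps

Within a single step the counts can still differ by one, but after every num_languages steps each language has contributed exactly the same number of sub-batches.
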

Great explanation by Lyonel, just one thing I want to add for completeness: originally, the LAML procedure was doing actual model-agnostic meta learning. But over time we found that we could simplify the procedure, and at this point it is just multi-task learning and no longer really close to the original MAML procedure. So there are some discrepancies between what the paper describes and what is in the code now. The version in the code is simpler, faster, and works just as well.