test_tinyllama issue with LitData and `iterate_over_all`

Question


Andrei-Aksionov opened this issue 4 months ago · 2 comments

Hi there 👋

Apparently there is an issue with the tinyllama test and the newest version of LitData (0.2.6).
The release notes show that `iterate_over_all` has just been added:

Add support for iterate_over_all for the CombinedDataset by @tchaton in Lightning-AI/litdata#122

and that's why the issue didn't appear before.

I don't know whether this issue is on the LitGPT side or the LitData side.
Maybe @awaelchli has some thoughts?

Answer 1 · 2024-05-08T20:56:02.000Z

LitData made the decision to enable `iterate_over_all` by default, as a breaking change. LitGPT will now have to set `iterate_over_all=False` explicitly and require `litdata>=0.2.6`. The error message needs to be fixed, though.
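A minimal sketch of what the LitGPT-side change could look like, assuming the call site builds a weighted `CombinedDataset`. The helpers `parse_version` and `combined_dataset_kwargs` are hypothetical and exist only for illustration; the only parts taken from the answer above are the explicit `iterate_over_all=False` and the `litdata>=0.2.6` floor.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Minimum LitData release that accepts `iterate_over_all` (per this thread).
MIN_LITDATA: Tuple[int, int, int] = (0, 2, 6)


def parse_version(version: str) -> Tuple[int, ...]:
    """Hypothetical helper: turn '0.2.6' or '0.2.6.post1' into (0, 2, 6).

    Simplified on purpose; this is not a full PEP 440 parser.
    """
    parts: List[int] = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def combined_dataset_kwargs(
    litdata_version: str,
    seed: int = 42,
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, object]:
    """Hypothetical helper: keyword arguments for a weighted CombinedDataset."""
    if parse_version(litdata_version) < MIN_LITDATA:
        raise RuntimeError(
            f"litdata>={'.'.join(map(str, MIN_LITDATA))} is required, found {litdata_version}"
        )
    # Opt out of the new default explicitly to keep the pre-0.2.6 behaviour.
    return {"seed": seed, "weights": weights, "iterate_over_all": False}
```

The installed version can be read with `importlib.metadata.version("litdata")`, and the returned dict would then be passed on, for example as `CombinedDataset(datasets=train_datasets, **kwargs)`, where `train_datasets` is a placeholder. In practice the version floor would more likely live in the package requirements than in a runtime check; the check is shown here only to make the dependency explicit.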

Answer 2 · 2024-05-09T06:40:47.000Z

Yes, the default behaviour was confusing to some users. It felt more natural that all the samples should be seen, especially when the dataset is used to compute validation metrics.

As @awaelchli shared, let's set `iterate_over_all` explicitly in LitGPT where needed.
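To make the difference between the two behaviours discussed in this thread concrete, here is a toy model in plain Python. It is a simplified sketch, not LitData's implementation: the function `combine` is hypothetical, and the `iterate_over_all=False` branch assumes the older behaviour of stopping as soon as any one source dataset is exhausted.

```python
import random
from typing import Iterable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def combine(
    datasets: Sequence[Iterable[T]],
    weights: Optional[Sequence[float]] = None,
    seed: int = 42,
    iterate_over_all: bool = True,
) -> Iterator[T]:
    """Toy model of weighted interleaving of several datasets."""
    rng = random.Random(seed)
    iterators: List[Iterator[T]] = [iter(d) for d in datasets]
    w = list(weights) if weights is not None else [1.0] * len(iterators)
    while iterators:
        # Pick a source dataset at random, proportionally to its weight.
        idx = rng.choices(range(len(iterators)), weights=w, k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            if not iterate_over_all:
                # Assumed old behaviour: stop as soon as one source runs out.
                return
            # New default: drop the exhausted source and keep drawing from the rest.
            del iterators[idx]
            del w[idx]
```

With `iterate_over_all=True` every sample from every source is eventually yielded, which matches the motivation above for validation metrics. With `iterate_over_all=False`, how many samples come out depends on which source happens to run out first.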