mlcommons/croissant

guidance on large Croissant files, especially in `<head>`

pdurbin opened this issue · 0 comments

Could the spec offer guidance on large Croissant files, especially when they are added to the <head> of a dataset landing page, greatly increasing its size?

This is not a new problem for us (Dataverse). To support Google Dataset Search, we already include Schema.org content, which can be quite large, in the <head> of pages. A dataset with 25,310 files has a Schema.org file that is 4.4 MB, mostly due to the long file listing under "distribution".

Croissant exacerbates the problem. The same dataset yields a Croissant file that is 7.1 MB. This is a lot of extra weight for a dataset landing page.

Can you please suggest some best practices? What is a reasonable upper limit for a Croissant file that will go in the <head> of a page? When we reach the limit, what should we do? Only show a few files under "distribution"?
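To make the "only show a few files" option concrete, here is a minimal sketch of what truncation could look like on our side. It is not based on anything in the spec: `trim_distribution` and the `max_bytes` budget are hypothetical, and the sketch assumes that dropping trailing "distribution" entries is acceptable, which may not hold if record sets refer to the dropped files.

```python
import json


def trim_distribution(metadata: dict, max_bytes: int) -> dict:
    """Return a copy of `metadata` with trailing "distribution" entries
    dropped until the serialized JSON fits within `max_bytes`.

    Hypothetical helper: the budget and the truncation policy are
    assumptions for illustration, not guidance from the Croissant spec.
    """

    def size(doc: dict) -> int:
        # Size in bytes of the JSON as it would be embedded in the page.
        return len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))

    files = list(metadata.get("distribution", []))
    trimmed = dict(metadata)  # shallow copy; the input is left untouched

    # Serialized size grows with the number of entries kept, so a binary
    # search finds the longest prefix that still fits the budget.
    lo, hi = 0, len(files)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        trimmed["distribution"] = files[:mid]
        if size(trimmed) <= max_bytes:
            lo = mid
        else:
            hi = mid - 1

    # If the document exceeds the budget even with no entries, this
    # returns it with an empty "distribution" and it is still too large.
    trimmed["distribution"] = files[:lo]
    return trimmed
```

Something like `trim_distribution(croissant, 500_000)` would produce the version for the <head>, with the 500,000-byte figure being exactly the kind of number I'm hoping the spec can suggest.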

Again, I'm mostly talking about the content that goes into the <head> of a page. A 7.1 MB Croissant file is fine when it is downloaded separately from the dataset landing page, via API.

Thanks!