Remove top-level copied field `files`
Closed this issue · 9 comments
During the index/reindex runtime, the Dataset `ingest_metadata.files` field gets copied to a top-level field `files`. Recently we've come across some datasets that contain a large number of `ingest_metadata.files` entries (the Dataset field `ingest_metadata` gets renamed to `metadata` during index runtime). For instance, `fbf3af732f53b00f20a9ecc1ecc3c52b` has a payload size of 2MB.
Such duplicates have caused:
- bigger response json payload (> 10MB)
- longer search query execution and reindex time
We should remove the original one and only keep the copied version.
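To make the duplication concrete, here is a minimal sketch of the behavior described above and of its effect on payload size. It is not the actual indexer code: the helper names (`build_index_doc`, `drop_top_level_files`, `payload_size`), the sample dataset, and the file entry keys (`rel_path`, `size`) are all made up for illustration. Following the issue title, the sketch drops the top-level copy; dropping the nested one instead would be symmetrical.

```python
import copy
import json


def build_index_doc(dataset: dict) -> dict:
    """Mimic the indexing behavior described in this issue (hypothetical helper)."""
    doc = copy.deepcopy(dataset)
    # `ingest_metadata` is renamed to `metadata` during indexing.
    if "ingest_metadata" in doc:
        doc["metadata"] = doc.pop("ingest_metadata")
    # The nested file list is also copied to a top-level `files` field.
    files = doc.get("metadata", {}).get("files")
    if files is not None:
        doc["files"] = copy.deepcopy(files)
    return doc


def drop_top_level_files(doc: dict) -> dict:
    """Return a shallow copy of the doc without the top-level `files` duplicate."""
    return {key: value for key, value in doc.items() if key != "files"}


def payload_size(doc: dict) -> int:
    """Size in bytes of the JSON-serialized doc."""
    return len(json.dumps(doc).encode("utf-8"))


# Made-up dataset with many file entries, for illustration only.
dataset = {
    "uuid": "example-uuid",
    "ingest_metadata": {
        "files": [{"rel_path": f"file_{i}.tiff", "size": i} for i in range(1000)]
    },
}

doc = build_index_doc(dataset)
slim = drop_top_level_files(doc)
print(payload_size(doc), payload_size(slim))
```

In this toy case the file list dominates the document, so keeping a single copy roughly halves the serialized size.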
@lchoy @john-conroy @NickAkhmetov @bherr2 will this change affect any of your UI handling?
Having the files at the top level of the doc would break our UI and require some work in the `portal-ui`.
We read from metadata.files. Does this affect that?
PS. Here are the fields we query for / use: https://github.com/hubmapconsortium/ccf-ui/blob/main/projects/ccf-database/src/lib/xconsortia/xconsortia-data-import.ts#L17-L38
@john-conroy @bherr2 does this mean the `portal-ui` and `ccf-ui` are not consuming the top-level `files` (copied from `metadata.files`) at all?
On the `ccf-ui` side, that's correct.
@bherr2 @john-conroy if you are sure you don't use the top-level `files` field, we'll plan to remove it. Is that fine with you?
There will be additional upcoming changes to the Dataset `metadata.files` and `metadata.metadata` in the near future. We'll discuss and come up with a plan.
Fine by me
I'll have to look through our repos before I can fully confirm.
Closing this issue, will handle this separately.