IntelLabs/matsciml

[Feature request]: Standardized data structure for datasets

laserkelvin opened this issue

Feature/behavior summary

A consistent, standardized data structure would make new datasets significantly easier to implement and maintain. It would also simplify model and task development by establishing reasonable expectations for attribute names and similar conventions.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?
  • Does this proposal include a new model?
  • Does this proposal include a new dataset?
  • Does this proposal include a new task/workflow?

Related issues

Some of these discussions took place in #89, which originated from #85.

Solution description

There are two possible ways of implementing this:

  • a flat DataSample structure that may hold either a graph or a point cloud, which leaves the sample's representation somewhat ambiguous; or
  • a base AbstractDataSample class, with PointCloudSample and GraphSample structures built on top of it.
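To make the trade-off between the two options concrete, here is a rough sketch of both. The attribute names (atomic_numbers, positions, edges, targets), the representation property, and the use of plain Python lists instead of tensors are all hypothetical choices made for illustration; they are not matsciml's actual API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]


# Option 1 (hypothetical): one flat structure. Graph-specific fields are
# optional, so whether a sample is a graph or a point cloud is only implied.
@dataclass
class DataSample:
    atomic_numbers: List[int]
    positions: List[Coord]
    edges: Optional[List[Tuple[int, int]]] = None  # None -> treat as point cloud
    targets: Dict[str, Any] = field(default_factory=dict)


# Option 2 (hypothetical): an abstract base that fixes the shared attribute
# names, with an explicit subclass per representation.
@dataclass
class AbstractDataSample(ABC):
    atomic_numbers: List[int]
    positions: List[Coord]
    targets: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Shared validation lives in one place for every representation.
        if len(self.atomic_numbers) != len(self.positions):
            raise ValueError("atomic_numbers and positions must have the same length")

    @property
    def num_atoms(self) -> int:
        return len(self.atomic_numbers)

    @property
    @abstractmethod
    def representation(self) -> str:
        """Name of the representation, e.g. 'graph' or 'point_cloud'."""


@dataclass
class PointCloudSample(AbstractDataSample):
    @property
    def representation(self) -> str:
        return "point_cloud"


@dataclass
class GraphSample(AbstractDataSample):
    edges: List[Tuple[int, int]] = field(default_factory=list)

    @property
    def representation(self) -> str:
        return "graph"
```

With the second option, models and tasks can rely on the base-class attributes and use `isinstance` checks (or the `representation` property) to branch on graph versus point cloud. The flat option instead has to infer this from whether `edges` is set, which is the ambiguity mentioned above.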

Not 100% sure how batching will look yet, but perhaps a Batch structure should also be introduced.
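One possible shape for a Batch structure is sketched below. It uses the "concatenate and index" approach common in graph libraries: atoms from all samples are concatenated, a `batch_index` records which sample each atom came from, `ptr` stores per-sample offsets, and edge indices are shifted accordingly. The `Sample` and `Batch` classes, their field names, and the requirement that all samples share the same target keys are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Sample:
    """Hypothetical stand-in for a single standardized data sample."""
    atomic_numbers: List[int]
    positions: List[Coord]
    edges: Optional[List[Tuple[int, int]]] = None  # None -> point cloud
    targets: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Batch:
    atomic_numbers: List[int]
    positions: List[Coord]
    batch_index: List[int]  # sample index that each atom belongs to
    ptr: List[int]  # ptr[i]:ptr[i + 1] slices the atoms of sample i
    edges: Optional[List[Tuple[int, int]]]
    targets: Dict[str, List[Any]]

    @property
    def num_samples(self) -> int:
        return len(self.ptr) - 1

    @classmethod
    def from_samples(cls, samples: List[Sample]) -> "Batch":
        if not samples:
            raise ValueError("cannot batch an empty list of samples")
        has_edges = [s.edges is not None for s in samples]
        if any(has_edges) and not all(has_edges):
            raise ValueError("cannot mix graph and point cloud samples in one batch")
        keys = set(samples[0].targets)
        if any(set(s.targets) != keys for s in samples):
            raise ValueError("all samples must provide the same target keys")

        numbers: List[int] = []
        positions: List[Coord] = []
        batch_index: List[int] = []
        ptr = [0]
        edges: Optional[List[Tuple[int, int]]] = [] if all(has_edges) else None
        targets: Dict[str, List[Any]] = {key: [] for key in samples[0].targets}

        for i, sample in enumerate(samples):
            offset = ptr[-1]
            numbers.extend(sample.atomic_numbers)
            positions.extend(sample.positions)
            batch_index.extend([i] * len(sample.atomic_numbers))
            if edges is not None:
                # Shift node indices so they refer to the concatenated atom list.
                edges.extend((src + offset, dst + offset) for src, dst in sample.edges)
            for key, value in sample.targets.items():
                targets[key].append(value)
            ptr.append(offset + len(sample.atomic_numbers))

        return cls(numbers, positions, batch_index, ptr, edges, targets)
```

Concatenation avoids the wasted memory of padding when system sizes vary widely, and the `batch_index`/`ptr` pair resembles the convention used by PyTorch Geometric. A padded layout would be a reasonable alternative for point-cloud models that expect fixed-size dense inputs.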

Additional notes

No response