pyiron/pyiron_atomistics

How to make SPHInX input writer independent of pyiron while using `DataContainer`

samwaseda opened this issue · 6 comments

I'm going through hell in this PR.

In short, the problem is that SPHInX input is highly nested and currently strongly dependent on DataContainer. More precisely, it sort of needs the ability to have multiple appearance of a single key (which dict doesn't allow) and recursive HDF saving. I'm now wondering whether I should rewrite Group in SPHInX to make it derive from dict, or if there's a way to still use DataContainer. Maybe I'm also totally missing something super simple, so I'd be thankful for any comments.

It is of interest to me/tentatively my plan to extract DataContainer as a stand-alone tool. If that would also solve your problem here, it would make me more likely to pursue that and we could work together to get it done quicker.

Ah yeah that would definitely solve the problem. Is it more difficult than just copy and paste?

I am a bit confused about the recursive HDF part. I thought the idea of separating the input writer into a separate module was for this newly separated module to be independent of the data storage. While currently HDF is our preferred storage format I do not think we want to require the standalone modules to already have a HDF interface.

I am a bit confused about the recursive HDF part. I thought the idea of separating the input writer into a separate module was for this newly separated module to be independent of the data storage. While currently HDF is our preferred storage format I do not think we want to require the standalone modules to already have a HDF interface.

Yeah after I wrote it I also realized that it wasn't really part of what should be done, so you are right about it. But the general point remains: Whether to rewrite parts of DataContainer in the parser or to import them somehow.

In tinybase @pmrv introduced an abstraction layer for storage. He can afterwards use different storage backends to 'store' the information in hdf or in a plain dict if I recall correctly. As such, the DataContainer could be separated in such an abstract version with some serialize method that is able to take some storage interface and populate it.

Ah yeah that would definitely solve the problem. Is it more difficult than just copy and paste?

Only marginally; we want to make sure we preserve the git history. This isn't hard though.

More complex is just (a) thinking about what exactly we want in the new package, e.g. Niklas' comments, and (b) actually removing the extracted stuff from their original locations and replacing them with the new extracted version. I don't think these things are super complicated either, but they're reaching beyond copy and paste.