Tauffer-Consulting/domino

Data management


Since machine learning training requires large amounts of data, our current approach seems a little cumbersome. Have you considered adding a data management module, where users could upload and download data and save training results? Thanks for your answer!

Hey @lanzhixi

What do you mean by data management in this case? Are you talking about piece results and how we share data between pieces?
Indeed, the way we share data between pieces is not so easy to understand, but the approach makes it possible to transmit data as if it were all in the same file system, even when running in Kubernetes. You can check how it works here.
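To illustrate the general idea (this is not Domino's actual implementation), here is a minimal sketch in which a piece writes its output under a shared root directory and passes only the file path downstream, so the next piece opens the file as if both ran on the same machine. The helper names, the folder layout, and the `run_id` value are hypothetical; in a Kubernetes deployment we assume the shared root would be a volume mounted into each piece's container.

```python
import json
import tempfile
from pathlib import Path


def run_upstream_piece(shared_root: Path, run_id: str) -> dict:
    """Hypothetical piece: writes its records to a file under the shared root."""
    # Each piece writes into its own folder for this run (assumed layout).
    out_dir = shared_root / run_id / "upstream_piece"
    out_dir.mkdir(parents=True, exist_ok=True)
    result_file = out_dir / "records.json"
    result_file.write_text(json.dumps([{"id": 1}, {"id": 2}, {"id": 3}]))
    # Only the file path is passed downstream, not the data itself.
    return {"records_path": str(result_file)}


def run_downstream_piece(upstream_output: dict) -> dict:
    """Hypothetical piece: opens the upstream file through the shared path."""
    records = json.loads(Path(upstream_output["records_path"]).read_text())
    return {"n_records": len(records)}


if __name__ == "__main__":
    # A temporary directory stands in for the shared volume (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        upstream_output = run_upstream_piece(Path(tmp), run_id="run_001")
        print(run_downstream_piece(upstream_output))  # {'n_records': 3}
```

Passing paths instead of payloads keeps the messages between pieces small, no matter how large the files are.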
But yes, we have been thinking about data management and new features. I think we can start by dividing it into upload and download, as you said.

  1. Upload: We have thought about having a "Dataset" section where users can upload data that would then be available in their workflows. We are not sure yet about the best way to do that, but it is certainly feasible. What do you think? Do you have any suggestions for an "upload" feature?
  2. Download: Currently we support downloading piece results for each individual piece, or a complete report for the entire workflow. I'm not sure, but I think we do not have an option for downloading the raw result files, which would be good to have. We can certainly add this. In the meantime, if you are running Domino locally using docker-compose, you can access all the results for each workflow run in the domino_data folder inside your Domino project folder.
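Until a raw-file download option exists, the files under `domino_data` can be packaged by hand. The sketch below is a hypothetical helper, not part of Domino: it zips every file under one run folder and keeps the relative paths. It assumes only that a run's results sit in a single folder under `domino_data`; the exact layout is not described here, so the folder is passed in explicitly, and the names in the demo (`example_run`, `piece_a`) are placeholders.

```python
from __future__ import annotations

import tempfile
import zipfile
from pathlib import Path


def zip_run_results(run_dir: Path, archive_path: Path) -> list[str]:
    """Hypothetical helper: zip every file under run_dir, keeping relative paths."""
    archived = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(run_dir.rglob("*")):
            # Skip directories and the archive itself, in case it sits inside run_dir.
            if not file.is_file() or file.resolve() == archive_path.resolve():
                continue
            arcname = file.relative_to(run_dir).as_posix()
            zf.write(file, arcname=arcname)
            archived.append(arcname)
    return archived


if __name__ == "__main__":
    # Placeholder layout; point run_dir at a real run folder under domino_data.
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "domino_data" / "example_run"
        (run_dir / "piece_a").mkdir(parents=True)
        (run_dir / "piece_a" / "output.csv").write_text("x,y\n1,2\n")
        (run_dir / "report.txt").write_text("done\n")
        print(zip_run_results(run_dir, Path(tmp) / "results.zip"))
```

Keeping relative paths preserves the per-piece folder structure inside the archive.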

Anyway, I'm happy to discuss these features with you. I think this kind of feedback is important to help us drive the development toward what will be more useful for those using Domino.

Thanks!

I'm trying to combine Domino and MinIO; we can discuss it in detail on Friday. Thanks!