Comparison with DVC

Question

erdnaavlis opened this issue 6 years ago · 2 comments

Hello!

First of all thank you for your contribution to the community! I’ve just found out about this and it seems to be a nice project that is growing!

You are probably familiar with dvc (https://github.com/iterative/dvc).

I’ve been investigating it in order to include it in my ML pipeline. Can you briefly explain how, or whether, Lazydata differs from dvc, and what the advantages and disadvantages are? I understand that some functionality may not be implemented yet purely because of time constraints or similar. I’m more interested in knowing whether there are any differences in terms of paradigm.

P.S. If you have a different channel for these kinds of questions, please let me know.

Thank you very much!

Answer 1 · 2018-09-03T20:03:04.000Z

Hi Andre!

Sure, so at the moment the main differences are in a couple of design decisions:

1. In DVC there is an extra .dvc file for every data file you have. In lazydata all file metadata is stored in a single file, lazydata.yml.
2. DVC uses the same basic paradigm as git-lfs: data files are tightly coupled to the repository, and after doing git pull you would normally do dvc pull, which pulls all the data files. In lazydata, you normally download files "lazily", i.e. calling track() on a file that is tracked but missing from the local copy will download it. The idea is to let you simply track all of your data files without worrying about someone else having to download them, because they'll only ever download a file if they need it.
3. The main interface of lazydata is programmatic (i.e. from within Python) and I'll continue developing it in that direction. The main interface of dvc and git-lfs is command-line.
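
To make the "lazy" idea in point 2 concrete, here is a minimal sketch of the pattern. It is not lazydata's actual implementation: `lazy_track` and `remote_dir` are hypothetical names, and an ordinary directory stands in for the remote storage. It only shows the fetch-on-first-use behaviour.

```python
import shutil
from pathlib import Path


def lazy_track(path, remote_dir):
    """Return `path` as a string, fetching the file only if it is missing locally.

    Hypothetical stand-in for the lazy behaviour described above.
    `remote_dir` is a plain directory playing the role of remote storage.
    """
    local = Path(path)
    if not local.exists():
        source = Path(remote_dir) / local.name
        if not source.exists():
            raise FileNotFoundError(
                f"{local.name} not found locally or in {remote_dir}"
            )
        # Create the local data folder on demand, then "download" the file.
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(source, local)
    # If the file was already present, nothing is transferred.
    return str(local)
```

Because the function returns a path, it can wrap the filename at the point of use, e.g. `open(lazy_track("data/big.csv", "/mnt/store"))` (both paths are made up for illustration). A collaborator who never runs that line never copies the file, which is the contrast with pulling every data file up front.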

Of course, DVC and git-lfs have more features at the moment, but I expect the differences in these design decisions will remain the same. Hope this answers the question!

Answer 2 · 2018-09-04T14:11:27.000Z

Thank you very much for your reply @rstojnic. It makes sense, and your point 2 is very appealing.

I’ll make sure to give Lazydata a try soon :)