How can we categorize CodaLab
zhimin-z opened this issue · 6 comments
https://github.com/codalab/codalab-worksheets/, the well-known experiment tracking tool powering HuggingFace, SQuAD, etc.
Related to #293 @axsaucedo
Good question. There are platforms that tend to have a lot of functionality, but we usually try to pick the most prominent one and list them there; for this one I would say model and data versioning.
Closing, as we are aligned per the discussion on the section.
Hi @axsaucedo, but the key issue is that CodaLab does nothing about model & data versioning if you take a closer look; it is built exclusively for experiment reproducibility. I think we should reconsider my previous proposal related to #293.
Experiment reproducibility = model & data versioning, which is why I suggested including it in that section.
Honestly, I do not think that "experiment reproducibility = model & data versioning" always holds for contemporary ML projects.
Roughly, we need to keep track of code, data, and models, but this coarse granularity loses most of the non-trivial aspects of managing ML experiments, since it only looks at artifacts from a static perspective. For example, when we conduct an ML experiment we also need to:

- maintain optimizers (optimizer versioning) and dependencies (dependency versioning), and keep track of the ETL pipeline (pipeline versioning);
- record random seeds (seed versioning), documentation (observation versioning), and licenses (IP versioning), miscellaneous items that are hard to lump under data tracking because of their finer granularity;
- handle model orchestration (schedule versioning), which adds another layer of complexity when doing A/B deployment in the real world, even though containerization frees MLOps engineers from versioning hardware/OS-related concerns;
- monitor metrics and logs (log/metrics versioning) so that when drift (concept, data, etc.) happens we can decide whether to retrain the model.

A rough sketch of what a single experiment record would then need to capture is below.
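To make this concrete, here is a minimal sketch in Python (field names and values are purely illustrative assumptions on my side, not any tool's API) of what a single reproducible experiment record would need to capture:

```python
# A minimal, hypothetical sketch of the metadata a single reproducible ML
# experiment has to capture, beyond the classic "model & data versioning" pair.
import json
import platform
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class ExperimentRecord:
    # the classic artifact trio
    code_commit: str                 # e.g. git SHA of the training code
    data_version: str                # e.g. dataset hash or tag
    model_checkpoint: str            # path/ID of the trained weights

    # finer-grained concerns listed above
    optimizer_config: dict = field(default_factory=dict)   # optimizer versioning
    dependencies: dict = field(default_factory=dict)        # dependency versioning
    pipeline_version: str = ""                               # pipeline (ETL) versioning
    random_seed: int = 42                                    # seed versioning
    notes: str = ""                                           # observation versioning
    license: str = ""                                         # IP versioning
    container_image: str = ""                                 # hardware/OS pinned via containerization
    schedule: str = ""                                        # schedule versioning for A/B rollouts
    metrics: dict = field(default_factory=dict)               # log/metrics versioning for drift checks
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


if __name__ == "__main__":
    record = ExperimentRecord(
        code_commit="<git-sha>",
        data_version="<dataset-hash>",
        model_checkpoint="runs/exp-001/model.pt",
        optimizer_config={"name": "AdamW", "lr": 3e-4},
        dependencies={"python": platform.python_version()},
        metrics={"val_accuracy": 0.91},
    )
    # Persisting this whole record, not just the model and data pointers,
    # is what makes the run reproducible end to end.
    print(json.dumps(asdict(record), indent=2))
```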
I would suggest extending the scope a bit and renaming "model & data versioning" to "experiment tracking", which makes more sense since most of the tools listed there (according to the paper in #293) are used for the latter purpose. I can also share my paper preprint on the terminology of ML experiment management in person later. Does that sound good? @axsaucedo