galaxyproject/idc

Workflows instead of a custom YAML language?


I've been working on allowing reference data parameters to consume tool data bundles directly, without creating loc file entries (https://github.com/galaxyproject/galaxy/compare/dev...mvdbeek:galaxy:bundle_as_input?expand=1).

Since the bundles are then just regular datasets, we could create regular Galaxy workflows that deal with interdependencies between data manager steps. For the genomic workflow (fasta -> derived reference data) this is currently hardcoded in ephemeris, but we'd have to make this more general for other types of data. My proposal is that we use workflows to express this instead.

What we would then collect in the idc are workflows for indexing stuff, and job files that describe workflow parameters. That means we don't need custom logic for splitting jobs, ordering stages, etc. I think data manager runs are very easy to cache, the workflows can be tested outside of a complicated Jenkins setup, and we can finally say "gxformat2 is used in production". There are obviously a lot of details to work out, but this seems more appealing to me for several reasons: it's easier for users to contribute new data, it's more portable and testable, it converges nicely with putting workflows front and center (which is my personal goal), it's a cool avenue for providing toy reference data for IWC workflows, and we can get more stuff into WorkflowHub and Dockstore.
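
To illustrate why the custom stage-ordering logic could go away: once each step only declares which bundles it consumes, the run order falls out of the dependency graph, which is what a workflow engine does for us. Below is a rough sketch of that idea in plain Python using the standard library's `graphlib`. The step names and the `WORKFLOW_STEPS` mapping are made up for illustration; this is not Galaxy, ephemeris, or gxformat2 code.

```python
from graphlib import TopologicalSorter

# Hypothetical data manager steps (names invented for this sketch).
# Each step maps to the steps whose output bundle it consumes.
WORKFLOW_STEPS = {
    "fetch_fasta": [],
    "build_bwa_index": ["fetch_fasta"],
    "build_bowtie2_index": ["fetch_fasta"],
    "build_twobit": ["fetch_fasta"],
}


def execution_order(steps):
    """Return the steps in an order where every step follows its inputs."""
    return list(TopologicalSorter(steps).static_order())


def stages(steps):
    """Group steps into stages; steps within one stage are independent."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()
    result = []
    while sorter.is_active():
        # Everything whose inputs are already done can run now.
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


if __name__ == "__main__":
    for number, stage in enumerate(stages(WORKFLOW_STEPS), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

In this toy graph the fasta fetch forms the first stage and the three index builders form the second, without anyone writing that ordering down explicitly. Adding a new kind of reference data would only mean adding steps and their inputs.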