Proposal: Scheduler-Generic Lesson Templates
The snippet library in our lessons is useful, but it does not map well onto the new Carpentries Workbench framework and takes a fair amount of maintenance.
This proposal is to replace our cluster- and site-specific snippet libraries with generic content written per scheduler (e.g. Slurm, PBS/Torque, and SGE), starting with HPC Workflows as a demonstration.
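As a rough illustration of what "generic content per scheduler" could mean, the sketch below renders one sentence of lesson text for each of the three schedulers named above. The command names and directive prefixes are the standard ones for each scheduler; the `Scheduler` class, the `SCHEDULERS` table, and `render` are hypothetical names made up for this example, not anything that exists in the lessons or the Workbench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheduler:
    """Hypothetical record of the few things that differ between schedulers."""
    name: str
    directive: str  # prefix for in-script resource requests
    submit: str
    status: str
    cancel: str


# One entry per scheduler variant instead of one snippet library per site.
SCHEDULERS = {
    "slurm": Scheduler("Slurm", "#SBATCH", "sbatch", "squeue", "scancel"),
    "pbs": Scheduler("PBS/Torque", "#PBS", "qsub", "qstat", "qdel"),
    "sge": Scheduler("SGE", "#$", "qsub", "qstat", "qdel"),
}

# Illustrative lesson sentence; the wording is an assumption, not lesson text.
TEMPLATE = (
    "Submit the job with `{submit} job.sh`, check on it with `{status}`, "
    "and cancel it with `{cancel} JOBID`."
)


def render(key: str) -> str:
    """Fill the generic sentence with one scheduler's commands."""
    s = SCHEDULERS[key]
    return TEMPLATE.format(submit=s.submit, status=s.status, cancel=s.cancel)


if __name__ == "__main__":
    for key in SCHEDULERS:
        print(f"{SCHEDULERS[key].name}: {render(key)}")
```

The point of the sketch is only that the number of variants is fixed by the number of schedulers (2 or 3), not by the number of sites.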
Advantages:
- Less overhead when editing a lesson or adding new material
- Easier conversion to the Workbench
- Simplified lesson files
- Easier to merge changes from downstream due to better alignment of scope (2 or 3 variants)
Disadvantages:
- Removes customization and self-expression capabilities
- Some snippets contain really useful material for specific/niche concerns
- Harder internationalization (i18n and l10n): translations are not yet supported by the Workbench (per community meeting, 17 April 2023)
As mentioned in Slack, there are a couple of options for implementing internationalization in practice; https://reside-ic.github.io/blog/r-markdown-internationalisation/ has a pretty good overview.
The old approach could probably still be used if we employ child documents, but it would involve quite a bit of boilerplate (and if I am honest, I definitely prefer having the content directly in the lesson files!). There was some discussion on this in carpentries/sandpaper#368.
From the discussion at the May 4 co-working meeting, an example of valuable content that's only in the snippets at the moment is the Compute Canada snippets, which expand the lesson substantially with material about how to obtain software for their site.
Information like this presents a challenge in the absence of a mechanism for site-specific annotations or content, but on the other hand, site-specific content does not get captured in the core lesson, and maybe should be.
Edit: A likely solution, at least for software environments, is that this content probably isn't truly site-specific: nobody "owns" lmod modules, conda, or wget/curl. Can the content be expressed in a non-site-specific way?
Todo: summarize specific or detailed lesson content in the Snippet Libraries that is not captured elsewhere.
For new adopters, editing the snippet library is tedious, because the snippets contain no context and many of them appear to be boilerplate output. Without rhyme or reason to guide the edits, the task becomes daunting. Streamlining this would be excellent.
I came across this issue as I discovered the snippets feature of HPC Carpentry recently, and want to add a couple of notes:
- The desire for site-specific or institution-specific extra content is not limited just to HPC Carpentry - it's common across the Carpentries for people to fork lesson repos and add extra lines to the material here and there. If you have an idea of how to support this type of customisation slightly better via the Workbench, please do bring this to the rest of the community! I may host a community discussion on this topic in the next few months.
- tagging @Fehings who forked hpc-intro for CarpentriesOffline to let them know the snippets model may change in future
I've been thinking about how to implement translations recently and at the same time consider how we can support multiple schedulers. I've come up with something for translations (see carpentries/sandpaper#18 (comment) and carpentries/sandpaper#18 (comment) for the details), and I have an idea for how this can also work for schedulers:
- We use child documents when we need to incorporate something that is scheduler specific.
- The default child documents are stored alongside the markdown (as symlinks to contents under a `slurm` subdirectory). The scheduler specific alternates are stored with the same name under a scheduler folder in the same directory (such directories are currently ignored by the Workbench build process).
  - This could perhaps be controlled by a configuration variable instead (and indeed we could maintain different lesson configurations for different schedulers), but it is currently unclear to me if child documents in subfolders are properly considered by the Workbench build process.
- We carry out N build processes, one for each language/scheduler combination and store the results in a subfolder alongside the default build of the lesson (english/slurm)
- Before each build, we update the symlinks for the target scheduler
- If needed, we also replace the language
- After reorganising our N builds into the desired final structure, we inject a div in the header of all pages that allows people to select the language/scheduler they require
This approach has the benefit of completely separating translations and scheduler content but does mean that the complexities of the build process are hidden away within our GitHub Actions. This approach could also be extended for different configurations of the lesson for different sites (but then the list of possible options may be too much).
I believe this approach can be implemented with the Workbench as it stands today.
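The symlink-switching and N-build steps described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual GitHub Actions workflow: `point_symlinks`, `build_all`, and `demo` are hypothetical helpers, the `episodes/<scheduler>/` layout and the `en` language code are assumed for illustration, and the `build` callback stands in for whatever actually invokes the Workbench build. The final step (injecting a language/scheduler selector into every page) is not shown.

```python
import tempfile
from pathlib import Path
from typing import Callable


def point_symlinks(episodes: Path, scheduler: str) -> None:
    """Re-point every default child document at one scheduler's variant."""
    for variant in sorted((episodes / scheduler).glob("*.md")):
        link = episodes / variant.name
        if link.is_symlink() or link.exists():
            link.unlink()
        # Relative target, so the links survive cloning or moving the repo.
        link.symlink_to(Path(scheduler) / variant.name)


def build_all(
    episodes: Path,
    site: Path,
    languages: list[str],
    schedulers: list[str],
    build: Callable[[str, str, Path], None],
) -> list[Path]:
    """Run one build per language/scheduler pair into site/<language>/<scheduler>."""
    outputs = []
    for language in languages:
        for scheduler in schedulers:
            point_symlinks(episodes, scheduler)  # update links before each build
            out = site / language / scheduler
            out.mkdir(parents=True, exist_ok=True)
            build(language, scheduler, out)  # placeholder for the real build
            outputs.append(out)
    return outputs


def demo() -> dict[str, str]:
    """Exercise the sketch in a throwaway directory with a fake build step."""
    with tempfile.TemporaryDirectory() as tmp:
        episodes = Path(tmp) / "episodes"
        for scheduler, command in [("slurm", "sbatch"), ("pbs", "qsub")]:
            (episodes / scheduler).mkdir(parents=True)
            (episodes / scheduler / "submit.md").write_text(
                f"Run `{command} job.sh`.\n"
            )

        def fake_build(language: str, scheduler: str, out: Path) -> None:
            # Copies the child document as resolved through the current symlink.
            (out / "submit.md").write_text((episodes / "submit.md").read_text())

        site = Path(tmp) / "site"
        outs = build_all(episodes, site, ["en"], ["slurm", "pbs"], fake_build)
        return {
            out.relative_to(site).as_posix(): (out / "submit.md").read_text()
            for out in outs
        }


if __name__ == "__main__":
    for where, text in demo().items():
        print(where, "->", text.strip())
```

Because the links are rewritten in place before each build, the builds in this sketch have to run one after another in the same checkout; running them in parallel would need a separate working copy per combination.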