We have decided to migrate all ARGO workflow source code repositories to a dedicated new GitHub organization (https://github.com/icgc-argo-workflows). This makes it easier to manage ARGO workflow source code, coordinate development efforts, and share the workflows with the broader cancer genomics and bioinformatics communities. The timing is also right. We have largely finished processing the ICGC 25K data using ARGO workflows, and a large number of developers from ARGO participating institutes are joining forces to develop more ARGO workflows.
To make the process more manageable, we will do the migration in multiple phases:
- First, a pilot phase covering the repos under the `icgc-argo` organization that are related to the ARGO DNA alignment workflow.
- Second, the remaining workflow repos under the `icgc-argo` organization.
- Third, workflow repos residing in GitHub organizations managed by different working groups.
Migrating the workflow source code repositories should have no impact on running the workflows. This is primarily to fulfill our commitment to workflow reproducibility. Every version of the ARGO workflows that was used to process production data must remain reproducible after the source code repo migration.
- There is a fallback plan in case things do not go as expected: transfer the repo back to the original organization.
- The migration process should be brief, typically under 10 minutes per repo.
- After migration, any released version of a workflow can be run properly using either the original git URL or the new one. In either case, the workflow code and docker images used should be exactly the same.
- Original repositories must be archived and maintained (not deleted) in the original organization.
- Original docker images must be maintained under the original organization (on either ghcr.io or quay.io, depending on where the docker image was originally published).
- WFPM packages released under the original repositories will remain available for import, just as they were before the migration.
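One part of the "same code from either URL" requirement can be spot-checked by confirming that every tag in the original repo still points at the same commit in the migrated one. The sketch below is a hypothetical helper, not part of WFPM or of this SOP. It assumes you have captured the output of `git ls-remote --tags <repo-url>` for both the original and the new URL. It says nothing about docker images, which would need a separate check.

```python
"""Hypothetical check that tags resolve to the same commits in two repos.

Input is the text printed by `git ls-remote --tags <repo-url>`: one
"<sha>\trefs/tags/<name>" line per ref, plus a peeled "<name>^{}" line
for annotated tags that points at the tagged commit.
"""
from __future__ import annotations

PREFIX = "refs/tags/"


def tag_commits(ls_remote_output: str) -> dict[str, str]:
    """Map tag name -> commit SHA, preferring peeled (^{}) entries."""
    tags: dict[str, str] = {}
    peeled: dict[str, str] = {}
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        name = ref[len(PREFIX):] if ref.startswith(PREFIX) else ref
        if name.endswith("^{}"):
            peeled[name[:-3]] = sha
        else:
            tags[name] = sha
    # For annotated tags, compare the commit rather than the tag object.
    tags.update(peeled)
    return tags


def mismatched_tags(original: str, migrated: str) -> list[str]:
    """Tags of the original repo that are missing or differ in the migrated one."""
    old, new = tag_commits(original), tag_commits(migrated)
    return sorted(tag for tag, sha in old.items() if new.get(tag) != sha)
```

An empty result means every original tag resolves to the same commit under both URLs. Tags that exist only in the new repo, such as post-migration versions, are ignored by design.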
Prerequisite: the user executing the SOP must be an owner of both the source and target GitHub organizations in order to initiate the repository transfer.
For each repository to be migrated, please follow these steps:
- Make sure to complete and release all currently in-development packages
- Transfer the ownership of the repository from the original organization to the new organization (https://github.com/icgc-argo-workflows)
- Fork the same repo from the new organization back to the original organization
- Forking may take some time to finish even when it appears to have completed quickly, so pause for ~30 minutes before continuing.
- Since forking does not copy releases, run the `release-duplicator.py` script to recreate all releases in the forked repo under the original organization, so that they match the releases of the repo under the new organization. If creating a release fails, delete the fork, start over from the fork step, and pause longer before continuing.
- Run `git add .` and `git commit` to back up the releases and associated assets.
- In the forked repo under the original organization, add an archive note to `README.md` and commit it. The archive note may look like:

  `NOTE: this repository is archived to support workflow reproducibility. Active development continues at: <url to the new repo>`
- Archive the forked repo under the original organization. The archived repo must be maintained for as long as needed to ensure the reproducibility of workflow versions run in ARGO production.
- Continue to maintain all existing docker images registered under either the original GitHub organization or under quay.io.
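The internals of `release-duplicator.py` are not described here. As a rough illustration only, the core decision such a script has to make is which releases of the repo in the new organization still need to be recreated in the fork. The Python sketch below shows that decision. It assumes release objects shaped like the response of GitHub's REST "List releases" endpoint (`tag_name`, `name`, `body`, `draft`, `prerelease`, `created_at`). It also assumes the fork already carries the git tags, and that draft releases should be skipped. The function name `plan_release_copies` is hypothetical. The sketch makes no network calls and does not handle release assets.

```python
"""Hypothetical planning step of a release duplicator (not release-duplicator.py).

It makes no API calls; it only decides which "Create a release"
payloads would need to be sent to the fork.
"""
from __future__ import annotations

from typing import Any


def plan_release_copies(
    source_releases: list[dict[str, Any]],
    fork_release_tags: set[str],
) -> list[dict[str, Any]]:
    """Return create-release payloads for releases missing from the fork.

    source_releases: release objects shaped like GitHub's "List releases"
        response for the repo in the new organization.
    fork_release_tags: tag names that already have a release in the fork.
    """
    payloads: list[dict[str, Any]] = []
    # Oldest first (a choice of this sketch, not a GitHub requirement).
    for rel in sorted(source_releases, key=lambda r: r["created_at"]):
        if rel.get("draft"):
            continue  # assumption: draft releases are not copied
        if rel["tag_name"] in fork_release_tags:
            continue  # already recreated; avoids duplicates on a re-run
        payloads.append(
            {
                # Assumes the fork already has this git tag.
                "tag_name": rel["tag_name"],
                "name": rel.get("name") or rel["tag_name"],
                "body": rel.get("body") or "",
                "prerelease": bool(rel.get("prerelease", False)),
            }
        )
    return payloads
```

Each returned dict could then be POSTed to the fork's `/repos/{owner}/{repo}/releases` endpoint. The tag assumption matters. If a tag were missing, GitHub would create it from the default branch rather than from the original commit. Skipping tags that already have a release keeps a partial run from creating duplicates. The SOP's remedy for a failed run is still to delete the fork and start over.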
For each repository migrated to the new GitHub organization, follow these steps to complete a one-time update before resuming the normal WFPM package development process:
- Make a fresh clone of the repository
- Create a new tracking branch based off `main`. Then edit the configuration file, package metadata files, and source code scripts of all packages to change the GitHub organization from the old organization to the new one. Here is an example commit that updates the organization for two packages in the repository `icgc-argo-workflows/demo-pkgs1`: https://github.com/icgc-argo-workflows/demo-pkgs1/commit/5d012d691a154f9b55281e178ea4cc29e5ee1b87. Example files to be updated:
  - `.wfpm`: WFPM project config file
  - `nextflow.config`: Nextflow config file
  - `<package-a>/Dockerfile`: Package A Dockerfile (only for tool packages)
  - `<package-a>/pkg.json`: Package A metadata JSON file
  - `<package-a>/main.nf`: Package A entry point script (only for tool packages)
  - `<package-a>/tests/checker.nf`: Package A test launcher script (only for tool packages)
  - `<package-b>/Dockerfile`: Package B Dockerfile (only for tool packages)
  - `<package-b>/pkg.json`: Package B metadata JSON file
  - `<package-b>/main.nf`: Package B entry point script (only for tool packages)
  - `<package-b>/tests/checker.nf`: Package B test launcher script (only for tool packages)
- Once these edits are committed, push the branch and create a PR against `main`. Review and merge the PR as usual.
- For each package, create a new version under the new organization:
  - Assuming the package name is `pkg-a` and its latest version is `1.2.3`, start a new version with `wfpm nextver pkg-a@1.2.3 1.2.3.1`. This creates a new branch `pkg-a@1.2.3.1` and makes it the current branch.
  - Merge the updates from the `main` branch with `git merge main`, resolving merge conflicts in `pkg.json` as needed (take the incoming changes).
  - Continue as usual: `git push`, create a PR, and merge the PR. Do NOT release the package when merging the PR. After the merge to `main`, the GitHub Actions tests are expected to fail, which is why the package cannot be released yet.
- Repeat the previous step until all WFPM packages in the repository are covered.
- Switch to the `main` branch and run `git pull` to sync with the remote.
- Run the WFPM release command to release each new package version. For example, `wfpm release pkg-a@1.2.3.1` releases version `1.2.3.1` of `pkg-a`.
- Repeat the release command until the new versions of all packages are released.
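Much of the one-time update above is mechanical. The Python sketch below illustrates two parts of it under stated assumptions. First, it rewrites organization references in the files named in the example list. Second, it builds the `wfpm`/`git` commands for the per-package version bump and the final release. It assumes every reference to the old organization appears as a `github.com/icgc-argo/` or `ghcr.io/icgc-argo/` path. Other forms, such as a bare organization name, are not matched, and `quay.io` references are deliberately left untouched. All function names are hypothetical. The sketch runs no commands, resolves no merge conflicts, and opens no PRs. Compare its file changes against the example commit before relying on it.

```python
"""Hypothetical helpers for the one-time post-migration update.

Assumption: references to the old organization appear as
github.com/icgc-argo/... or ghcr.io/icgc-argo/... paths.
"""
from __future__ import annotations

import re
from pathlib import Path

OLD_ORG = "icgc-argo"
NEW_ORG = "icgc-argo-workflows"

# Matches "github.com/icgc-argo/" and "ghcr.io/icgc-argo/" but not
# "github.com/icgc-argo-workflows/", so running it twice is harmless.
# quay.io references are deliberately left untouched.
_ORG_REF = re.compile(r"\b(github\.com|ghcr\.io)/" + re.escape(OLD_ORG) + "/")


def rewrite_org_refs(text: str) -> str:
    """Point GitHub/GHCR paths at the new organization."""
    return _ORG_REF.sub(lambda m: f"{m.group(1)}/{NEW_ORG}/", text)


def candidate_files(repo: Path, packages: list[str]) -> list[Path]:
    """Files named in the example list; missing ones are skipped later."""
    files = [repo / ".wfpm", repo / "nextflow.config"]
    for pkg in packages:
        files += [
            repo / pkg / "Dockerfile",
            repo / pkg / "pkg.json",
            repo / pkg / "main.nf",
            repo / pkg / "tests" / "checker.nf",
        ]
    return files


def rewrite_repo(repo: Path, packages: list[str]) -> list[Path]:
    """Rewrite org references in place; return the files that changed."""
    changed = []
    for path in candidate_files(repo, packages):
        if not path.is_file():
            continue  # e.g. a package without a Dockerfile
        old = path.read_text()
        new = rewrite_org_refs(old)
        if new != old:
            path.write_text(new)
            changed.append(path)
    return changed


def migration_version(latest: str) -> str:
    """Follow the 1.2.3 -> 1.2.3.1 pattern used in the steps above."""
    return f"{latest}.1"


def per_package_commands(pkg: str, latest: str) -> list[str]:
    """Commands for one package; conflicts and the PR are handled by hand."""
    new = migration_version(latest)
    return [
        f"wfpm nextver {pkg}@{latest} {new}",  # creates branch {pkg}@{new}
        "git merge main",  # resolve pkg.json conflicts, then commit
        f"git push -u origin {pkg}@{new}",  # then open a PR; do NOT release
    ]


def release_commands(latest_versions: dict[str, str]) -> list[str]:
    """Run only after every per-package PR has been merged into main."""
    cmds = ["git checkout main", "git pull"]
    for pkg, latest in latest_versions.items():
        cmds.append(f"wfpm release {pkg}@{migration_version(latest)}")
    return cmds
```

For the `pkg-a` example, `per_package_commands("pkg-a", "1.2.3")` starts with `wfpm nextver pkg-a@1.2.3 1.2.3.1`. `release_commands` ends with one `wfpm release` line per package. The push uses `-u` only so that it also works when the new branch has no upstream yet.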
Original repo | Original org | Migration status | New repo | New org |
---|---|---|---|---|
dna-seq-processing-wfs | icgc-argo | completed | dna-seq-processing-wfs | icgc-argo-workflows |
nextflow-data-processing-utility-tools | icgc-argo | completed | nextflow-data-processing-utility-tools | icgc-argo-workflows |
data-processing-utility-tools | icgc-argo | completed | data-processing-utility-tools | icgc-argo-workflows |
dna-seq-processing-tools | icgc-argo | completed | dna-seq-processing-tools | icgc-argo-workflows |
data-qc-tools-and-wfs | icgc-argo | completed | data-qc-tools-and-wfs | icgc-argo-workflows |
gatk-tools | icgc-argo | completed | gatk-tools | icgc-argo-workflows |
sanger-wgs-variant-calling | icgc-argo | completed | sanger-wgs-variant-calling | icgc-argo-workflows |
sanger-wxs-variant-calling | icgc-argo | completed | sanger-wxs-variant-calling | icgc-argo-workflows |
gatk-mutect2-variant-calling | icgc-argo | completed | gatk-mutect2-variant-calling | icgc-argo-workflows |
variant-calling-tools | icgc-argo | completed | variant-calling-tools | icgc-argo-workflows |
open-access-variant-filtering | icgc-argo | completed | open-access-variant-filtering | icgc-argo-workflows |
icgc-argo-sv-copy-number | ICGC-ARGO-Structural-Variation-CN-WG | completed | icgc-argo-sv-copy-number | icgc-argo-workflows |
argo-qc-tools | icgc-argo-qc-wg | completed | argo-qc-tools | icgc-argo-workflows |
rna-seq-alignment | icgc-argo-rna-wg | completed | rna-seq-alignment | icgc-argo-workflows |