Checking new tree

  1. Download generated files into nextclade data workflow repo:

    scp -rC roemer0001@login-transfer.scicore.unibas.ch:~/nextclade_data_workflows/sars-cov-2/output output
  2. Plug them into nextclade.org advanced view.

  3. Filter to new nodes and check that:

    • clades are clean
    • no big outliers
  4. Check tag.json is up to date (ideally update in profiles/tag.json for posterity)

  5. Check qc.json does not regress (ideally update in profiles/qc.json for posterity) [beware: codons are 0-indexed]

  6. Potentially run scripts/common_stops.py and scripts/common_frameshifts.py to add new stops/frameshifts that have become more common to qc.json
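
The "does not regress" check in step 5 can be partly automated. The following is a minimal sketch, not part of the workflow: it assumes qc.json stores ignored stop codons under stopCodons.ignoredStopCodons (entries with geneName and codon) and ignored frameshifts under frameShifts.ignoredFrameShifts (entries with geneName and codonRange), and it reports entries that are present in the old file but missing from the new one. The function names and the sample data are hypothetical.

```python
def ignored_stops(qc):
    """Return ignored stop codons as a set of (gene, codon) pairs."""
    # Assumed layout: qc["stopCodons"]["ignoredStopCodons"] is a list of
    # {"geneName": ..., "codon": ...} entries with 0-indexed codons.
    entries = qc.get("stopCodons", {}).get("ignoredStopCodons", [])
    return {(e["geneName"], e["codon"]) for e in entries}


def ignored_frameshifts(qc):
    """Return ignored frameshifts as a set of (gene, begin, end) triples."""
    # Assumed layout: qc["frameShifts"]["ignoredFrameShifts"] is a list of
    # {"geneName": ..., "codonRange": {"begin": ..., "end": ...}} entries.
    entries = qc.get("frameShifts", {}).get("ignoredFrameShifts", [])
    return {
        (e["geneName"], e["codonRange"]["begin"], e["codonRange"]["end"])
        for e in entries
    }


def find_regressions(old_qc, new_qc):
    """List entries that the old qc.json ignored but the new one does not."""
    return {
        "stops": sorted(ignored_stops(old_qc) - ignored_stops(new_qc)),
        "frameshifts": sorted(
            ignored_frameshifts(old_qc) - ignored_frameshifts(new_qc)
        ),
    }


# Made-up sample data, only to show the shape of the comparison.
old_qc = {
    "stopCodons": {"ignoredStopCodons": [
        {"geneName": "ORF8", "codon": 26},
        {"geneName": "ORF8", "codon": 67},
    ]},
    "frameShifts": {"ignoredFrameShifts": [
        {"geneName": "ORF3a", "codonRange": {"begin": 256, "end": 276}},
    ]},
}
new_qc = {
    "stopCodons": {"ignoredStopCodons": [
        {"geneName": "ORF8", "codon": 67},
        {"geneName": "ORF7a", "codon": 61},
    ]},
    "frameShifts": {"ignoredFrameShifts": []},
}

regressions = find_regressions(old_qc, new_qc)
print(regressions)
```

In practice the two dictionaries would be loaded with json.load from the previous and the newly generated qc.json. An empty result for both keys means nothing was dropped.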

Identifying most common frameshifts and stop codons

  1. Download metadata to data/metadata_raw.tsv

  2. Run the snakemake workflow with the following commands/targets:

    snakemake --profile=profiles/clades pre-processed/frameshifts.tsv -R select_frameshifts
    snakemake --profile=profiles/clades pre-processed/stops.tsv -R select_stops
  3. Format the most common stops/frameshifts into the qc.json format using:

    python3 scripts/common_stops.py
    python3 scripts/common_frameshifts.py
  4. Manually check the results for plausibility and add them to qc.json
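
Part of the plausibility check in step 4 can be scripted before the manual look. The sketch below is an illustration under stated assumptions, not the behaviour of scripts/common_stops.py: it assumes each candidate is a dict with geneName and a 0-indexed codon, and it uses a small, deliberately incomplete table of protein lengths. The helper names are hypothetical. It also shows the 1-based to 0-based conversion: the stop usually written as ORF8:Q27* corresponds to codon 26 in qc.json.

```python
# Protein lengths in amino acids for a few SARS-CoV-2 genes.
# Illustrative subset only; extend it before real use.
GENE_LENGTHS_AA = {"ORF3a": 275, "ORF7a": 121, "ORF8": 121, "N": 419}


def to_zero_indexed(codon_1based):
    """Convert a 1-based codon position (as in 'Q27*') to 0-based."""
    if codon_1based < 1:
        raise ValueError("1-based codon positions start at 1")
    return codon_1based - 1


def is_plausible_stop(entry, gene_lengths=GENE_LENGTHS_AA):
    """Check that a candidate stop lies inside a known gene.

    Assumes entry looks like {"geneName": "ORF8", "codon": 26},
    with a 0-indexed codon.
    """
    gene = entry["geneName"]
    if gene not in gene_lengths:
        return False  # unknown gene: leave for manual review
    return 0 <= entry["codon"] < gene_lengths[gene]


candidates = [
    {"geneName": "ORF8", "codon": to_zero_indexed(27)},  # ORF8:Q27*
    {"geneName": "ORF8", "codon": 500},  # beyond the end of ORF8
    {"geneName": "XYZ", "codon": 3},  # gene not in the table
]
plausible = [c for c in candidates if is_plausible_stop(c)]
print(plausible)
```

A range check like this only catches indexing slips and malformed entries; whether a stop or frameshift is common enough to ignore still needs the manual check.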

Committing to data repo

  1. Go to nextclade_data_workflows repo

  2. Checkout branch, open PR to master

  3. Copy output from workflow repo to data repo

    cp -r output/sars-cov-2/references/MN908947/versions/  ../../nextclade_data/data/datasets/sars-cov-2/references/MN908947/versions
  4. Update changelog.md

  5. Get Ivan to review

  6. Merge into master
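
Before opening the PR, it can be worth confirming that the copy in step 3 landed where intended, since cp -r treats an already existing destination directory differently across implementations (GNU cp nests the source directory inside it). The following is a minimal sketch using Python's standard filecmp module; the function name tree_differences is hypothetical.

```python
import filecmp
from pathlib import Path


def tree_differences(src, dst):
    """Recursively list paths that differ between two directory trees."""
    problems = []

    def walk(cmp, prefix):
        for name in cmp.left_only:
            problems.append(f"missing in destination: {prefix / name}")
        for name in cmp.right_only:
            problems.append(f"only in destination: {prefix / name}")
        # dircmp compares file signatures first and file contents only
        # when the signatures differ.
        for name in cmp.diff_files:
            problems.append(f"content differs: {prefix / name}")
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(src, dst), Path("."))
    return sorted(problems)


# Example call, run from the workflow repo with the paths from step 3:
# tree_differences(
#     "output/sars-cov-2/references/MN908947/versions",
#     "../../nextclade_data/data/datasets/sars-cov-2/references/MN908947/versions",
# )
```

Entries reported as "only in destination" are expected for versions that already exist in the data repo; "missing in destination" or an unexpected nested versions directory suggests the copy did not go where intended.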

Release process

Follow release guidelines as outlined here: https://github.com/nextstrain/nextclade_data#dataset-release-process