CDPHE-bioinformatics/CDPHE-SARS-CoV-2

[REQUEST] Update Aggregate Lineages to use CDC grouping URL

Closed this issue · 0 comments

Feature Request

Currently we have to manually update the cdc_lineage_groups.json file each week; it is an input to lineage_calling_and_results.wdl. Sam has written code for the cloud-run-aggregate-lineages repo that downloads the CDC aggregate lineage groupings JSON directly from the CDC COVID variant dashboard website. We want to stop manually updating cdc_lineage_groups.json and incorporate Sam's automated code into lineage_calling_and_results.wdl.

Solution

Pull the code from the cloud-run-aggregate-lineages repo that automates downloading the JSON from CDC's dashboard. Incorporate it into the concat_seq_metrics_and_lineage_results.py script for the results_table task in lineage_calling_and_results.wdl. Update the task and WDL inputs as needed.
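
For discussion, here is a minimal sketch of what the download step could look like inside concat_seq_metrics_and_lineage_results.py. It is not Sam's code from cloud-run-aggregate-lineages. The URL is a placeholder because the real CDC endpoint is not given in this issue, and the function name load_cdc_lineage_groups and the optional fallback to a local copy of cdc_lineage_groups.json are assumptions made for illustration only.

```python
import json
import urllib.request

# Placeholder only: the real CDC dashboard URL is not stated in this issue.
CDC_LINEAGE_GROUPS_URL = "https://example.invalid/cdc_lineage_groups.json"


def load_cdc_lineage_groups(url=CDC_LINEAGE_GROUPS_URL, fallback_path=None, timeout=30):
    """Return the parsed lineage-grouping JSON downloaded from `url`.

    Hypothetical helper. If the download or parsing fails and
    `fallback_path` is given, the local JSON file is read instead;
    otherwise the original error is re-raised.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as err:
        # OSError covers URLError and timeouts; ValueError covers bad JSON.
        if fallback_path is None:
            raise
        print(f"Download failed ({err}); using local file {fallback_path}")
        with open(fallback_path, encoding="utf-8") as handle:
            return json.load(handle)
```

Keeping an optional fallback would let the workflow still run with the last manually updated file if the dashboard is unreachable. Whether we want that or a hard failure is open for discussion.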

Downstream effects

Code duplication - We also aggregate lineages based on CDC grouping in the wastewater heatmap Colab notebook, so the same logic will exist in more than one place.