[REQUEST] Update Aggregate Lineages to use CDC grouping URL
Feature Request
Currently, we have to manually update the cdc_lineage_groups.json file each week; it is an input to lineage_calling_and_results.wdl. Sam has written code for the cloud-run-aggregate-lineages repo that downloads the CDC aggregate lineage groupings JSON directly from the CDC COVID variant dashboard website. We want to stop updating cdc_lineage_groups.json by hand and incorporate Sam's automated code into lineage_calling_and_results.wdl.
Solution
Take the code in the cloud-run-aggregate-lineages repo that automates downloading the JSON from CDC's dashboard and incorporate it into the concat_seq_metrics_and_lineage_results.py script used by the results_table task in lineage_calling_and_results.wdl. Update the task and WDL inputs as needed.
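For discussion, here is a rough sketch of what the download step inside concat_seq_metrics_and_lineage_results.py might look like. This is not Sam's code from cloud-run-aggregate-lineages; the function names (`fetch_json`, `load_lineage_groups`), the optional fallback to a local copy of cdc_lineage_groups.json, and the shape of the returned JSON are all assumptions. The dashboard URL is deliberately left as a parameter, since it would become a new task/WDL input.

```python
import json
import sys
import urllib.request
from typing import Any, Callable, Optional


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """Download a JSON document from `url` and return the parsed object."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def load_lineage_groups(
    url: str,
    fallback_path: Optional[str] = None,
    fetch: Callable[[str], Any] = fetch_json,
) -> Any:
    """Return the CDC lineage groupings, preferring the live download.

    `url` is a placeholder for the CDC grouping URL (hypothetical input).
    If the download or parsing fails and `fallback_path` is given, the
    local JSON file is used instead; otherwise the error is re-raised.
    """
    try:
        groups = fetch(url)
    except (OSError, ValueError) as err:
        # URLError/HTTPError are OSError subclasses; bad JSON is a ValueError.
        if fallback_path is None:
            raise
        print(f"Download failed ({err}); using {fallback_path}", file=sys.stderr)
        with open(fallback_path) as handle:
            return json.load(handle)
    if not groups:
        # An empty payload would silently drop every grouping downstream.
        raise ValueError(f"No lineage groupings returned from {url}")
    return groups
```

Passing `fetch` as an argument keeps the function testable without network access. Whether to keep the local-file fallback at all is a design choice: it guards against dashboard outages but could mask a stale grouping file.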
Downstream effects
Code duplication - We also aggregate lineages based on CDC grouping in the wastewater heatmap Colab notebook.