Add QC for clade sequence counts
joverlee521 commented
Context
During the forecasts-ncov meeting on 2024-06-03, @marlinfiggins flagged an issue in the all-time analysis: it includes JN.1 sequences dated 2020.
Description
Sequence counts for clades/lineages that are dated earlier than the clade's/lineage's first appearance date should be excluded.
Two potential solutions
- Use ncov's exclude.txt to exclude known outliers from the counts. However, this only captures a subset of outliers because exclude.txt is only updated based on the results of small subsampled trees.
- Similar to covariants, add a list of first dates for clades/lineages and automatically exclude counts earlier than the first date.
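
A rough sketch of what the second option could look like, assuming the counts are rows with `clade`, `date` (ISO `YYYY-MM-DD`) and `sequences` fields. The `FIRST_DATES` mapping, the `exclude_early_counts` name, and the JN.1 date below are placeholders for illustration, not curated values or existing code in this repo:

```python
from datetime import date

# Hypothetical first-appearance dates per clade/lineage.
# The JN.1 value is a placeholder, not a curated date.
FIRST_DATES = {
    "JN.1": date(2023, 1, 1),
}


def exclude_early_counts(rows, first_dates):
    """Drop count rows dated before the clade's first-appearance date.

    `rows` is an iterable of dicts with (assumed) keys `clade` and `date`.
    Clades that are missing from `first_dates` are kept unchanged.
    """
    kept = []
    for row in rows:
        first = first_dates.get(row["clade"])
        if first is not None and date.fromisoformat(row["date"]) < first:
            # Earlier than the first appearance: treat as an outlier.
            continue
        kept.append(row)
    return kept


counts = [
    {"clade": "JN.1", "date": "2020-03-01", "sequences": 2},
    {"clade": "JN.1", "date": "2024-01-15", "sequences": 500},
    {"clade": "other", "date": "2020-03-01", "sequences": 10},
]
filtered = exclude_early_counts(counts, FIRST_DATES)
```

With the placeholder date above, only the 2020 JN.1 row is dropped. Clades without an entry pass through untouched, so the list of first dates could be built up incrementally.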