nextstrain/forecasts-ncov

Add QC for clade sequence counts


Context

During the forecasts-ncov meeting on 2024-06-03, @marlinfiggins flagged an issue in the all-time analysis: there are JN.1 sequences dated to 2020.

Description

Sequence counts for clades/lineages that are dated earlier than the clade's/lineage's first appearance date should be excluded.

Two potential solutions

  1. Use ncov's exclude.txt to exclude known outliers from the counts. However, this only captures a subset of outliers, because exclude.txt is only updated based on the results of small subsampled trees.
  2. Similar to covariants, add a list of first dates for clades/lineages and automatically exclude counts earlier than the first date.
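
A rough sketch of what option 2 could look like, to make the idea concrete. It assumes count rows with `location`, `clade`, `date`, and `sequences` fields, and a hypothetical `FIRST_DATES` mapping. The JN.1 date below is an illustrative placeholder, not a curated value, and `filter_counts` is a made-up name rather than an existing function in this repo.

```python
from datetime import date

# Hypothetical first-appearance dates per clade/lineage.
# The value here is a placeholder for illustration only.
FIRST_DATES = {
    "JN.1": date(2023, 8, 1),
}


def filter_counts(rows, first_dates):
    """Split count rows into (kept, dropped).

    A row is dropped when its clade has a known first date and the
    row's date is earlier than that. Clades without an entry in
    first_dates are kept unchanged.
    """
    kept, dropped = [], []
    for row in rows:
        first = first_dates.get(row["clade"])
        if first is not None and date.fromisoformat(row["date"]) < first:
            dropped.append(row)
        else:
            kept.append(row)
    return kept, dropped


# Example rows; field names are assumed, values are made up.
rows = [
    {"location": "USA", "clade": "JN.1", "date": "2020-05-01", "sequences": 2},
    {"location": "USA", "clade": "JN.1", "date": "2024-01-15", "sequences": 340},
    {"location": "USA", "clade": "unlisted", "date": "2020-05-01", "sequences": 5},
]
kept, dropped = filter_counts(rows, FIRST_DATES)
```

Returning the dropped rows alongside the kept ones would make it easy to log how many counts were excluded per clade, which could double as a QC report.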