nextstrain/seasonal-flu

Use FRA data instead of HI data for live H3N2 builds

huddlej opened this issue · 1 comments

Context

HI assays do not perform as well as FRA/HINT assays for recent H3N2 viruses (3c3.A is the exception), and they produce different titer model parameters than FRA/HINT assays do. We know these assays measure different aspects of viral infection and that FRA/HINT assays were historically performed when HI assays didn't work, which biases direct comparisons between these data.

Although I had previously thought that FRA/HINT data were more numerous than HI data, the number of measurements submitted since Jan 1, 2019 is nearly the same for both assay types. It turns out that the majority of our FRA/HINT data come from the CDC and the majority of the HI data come from VIDRL. This is an important distinction because the live builds only include data from the CDC and are thus relatively depleted of HI data. Here is the breakdown of recent data based on the inclusion date:

>>> hi.query("inclusion_date > '2019-01-01'")["center"].value_counts()
vidrl    21415
crick     6621
cdc       3108
>>> fra.query("inclusion_date > '2019-01-01'")["center"].value_counts()
cdc      18649
vidrl     3987
niid      3851
crick     1097
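
To make the CDC-only point concrete, here is a minimal sketch that takes the counts printed above and computes each assay's total and the share contributed by the CDC. The dictionaries just restate the numbers in this issue; the helper name `center_share` is made up for illustration and is not part of the seasonal-flu workflow.

```python
# Measurement counts copied from the value_counts() output above
# (inclusion_date > 2019-01-01).
hi_counts = {"vidrl": 21415, "crick": 6621, "cdc": 3108}
fra_counts = {"cdc": 18649, "vidrl": 3987, "niid": 3851, "crick": 1097}


def center_share(counts, center):
    """Return the fraction of all measurements contributed by one center."""
    total = sum(counts.values())
    return counts.get(center, 0) / total


hi_total = sum(hi_counts.values())
fra_total = sum(fra_counts.values())
hi_cdc_share = center_share(hi_counts, "cdc")
fra_cdc_share = center_share(fra_counts, "cdc")

print(f"HI total: {hi_total}, CDC share: {hi_cdc_share:.1%}")
print(f"FRA total: {fra_total}, CDC share: {fra_cdc_share:.1%}")
```

The totals are 31,144 HI and 27,584 FRA measurements, but the CDC contributes only about a tenth of the HI data versus roughly two thirds of the FRA data.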

Description

We should try building a 2y H3N2 HA tree with HI data and another with FRA data, keeping all other parameters of the builds the same (including the forecasting model). Then, we should compare the trees and the corresponding titer model results head-to-head. We should look for:

  1. major changes in overall tree topology based on strains selected in an assay-specific manner
  2. changes in the titer model range and in the specific branches/substitutions assigned antigenic advance by these models
  3. changes in forecasts
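
One possible way to start the head-to-head comparison is sketched below. It assumes we have already extracted, for each build, the list of selected strains and a mapping from substitution to titer model weight. The function names and the placeholder inputs are hypothetical; nothing here reflects the actual build outputs or file formats.

```python
def jaccard(a, b):
    """Overlap between two strain sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def weight_changes(hi_weights, fra_weights):
    """Per-substitution weight difference (FRA minus HI).

    A substitution missing from one model is treated as weight 0.
    """
    subs = set(hi_weights) | set(fra_weights)
    return {s: fra_weights.get(s, 0.0) - hi_weights.get(s, 0.0) for s in sorted(subs)}


# Hypothetical placeholder inputs, not real build results.
hi_strains = ["strain_a", "strain_b", "strain_c"]
fra_strains = ["strain_b", "strain_c", "strain_d"]
hi_weights = {"sub_1": 1.0, "sub_2": 0.5}
fra_weights = {"sub_1": 1.5, "sub_3": 0.25}

overlap = jaccard(hi_strains, fra_strains)
changes = weight_changes(hi_weights, fra_weights)

print(f"strain overlap (Jaccard): {overlap:.2f}")
for sub, delta in changes.items():
    print(f"{sub}: {delta:+.2f}")
```

A low strain overlap would point at item 1 (assay-specific strain selection changing the tree), while large per-substitution differences would point at item 2. Forecast differences would still need to be compared separately.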

Closed by bc75df5