nextstrain/seasonal-flu

Automate monthly reports

huddlej opened this issue · 0 comments

Context

We produce monthly reports describing seasonal influenza circulation patterns. A substantial portion of the time spent on these reports goes to tedious copying and pasting or to running commands manually, all of which could be automated.

Description

The following specific tasks will minimize the manual work associated with reports and allow us to focus on the science.

  • Schedule nextflu-private builds to run and deploy after data upload (like the public builds). This should be easy to do now. We need to add rules that produce dated Auspice JSONs and deploy these JSONs to the desired group.
  • #162
  • Use existing public 2y builds as “representative” trees in our reports. The nextflu-private builds include all strains with available titer data, which makes them less useful for metrics like LBI or for looking at regional frequencies. We currently produce a separate representative H3N2 build, but we should just use the public builds since they already use representative sampling. This task depends on the newly proposed functionality that James has been working on to allow access to builds as they existed on a given date (e.g., @2023-11-17), so narratives from a specific date don’t reference builds that will change in the future.
  • Automatically generate Markdown tables of new sequences per clade and clade coverage by titer references. The workflow logic to do this was added in #129, but we still need to add explicit dependencies for tabulate to our Docker and Conda environments.
    • Summarize all new sequences per clade using Nextclade annotations from the full dataset instead of using subsampled data associated with representative or titers builds. Depends on the addition of an earlier step in the workflow that runs Nextclade on all data as mentioned in #144.
    • Summarize haplotype coverage by titer references using frequencies per haplotype from all available data (i.e., flu_frequencies data) and include distinct references per haplotype across all titer collections (e.g., cell FRA, cell HI, egg FRA, egg HI, etc.).
  • Use a narrative template file to stub out a date-specific narrative for a given set of dated builds. Avoid the need to update all URLs and the YAML front matter in the narrative to the date of the latest builds. We could define our own simple narrative template or use a standard template language like Jinja2. We could generate the stubbed narrative as part of the build and deploy process above, such that the date of the builds is available to plug into the template, and deploy the dated draft of the narrative along with the builds.
  • Automatically add links for the latest builds to the nextflu-private group landing page after builds have been deployed.
  • Run TreeKnit and/or pathogen-embed on HA/NA trees/alignments and report a list of reassorted clades. Richard has already implemented workflow logic for TreeKnit. We can’t run TreeKnit from Conda, but we could add it to the Docker image and run it on AWS Batch. We don’t have a Conda package for pathogen-embed yet, but that would be easy to add.
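
As a rough illustration of the narrative template task above, here is a minimal sketch of the "simple template" option using Python's standard-library `string.Template` (Jinja2 would work similarly). Everything specific in it is an assumption for illustration: the front matter values, the URL layout, the `h3n2/ha` path, and the `stub_narrative` helper are hypothetical and not taken from the existing workflow.

```python
from datetime import date
from string import Template

# Hypothetical template. The front matter keys mimic the general shape of a
# narrative file, but the title, URL layout, and slide content are placeholders.
NARRATIVE_TEMPLATE = Template("""\
---
title: "Seasonal influenza report for $build_date"
date: "$build_date"
dataset: "$base_url/$build_date/h3n2/ha"
---

# [H3N2 HA]($base_url/$build_date/h3n2/ha)

TODO: describe recent H3N2 circulation.
""")


def stub_narrative(build_date: date, base_url: str) -> str:
    """Return narrative text with the build date filled into every placeholder."""
    return NARRATIVE_TEMPLATE.substitute(
        build_date=build_date.isoformat(),
        # Drop any trailing slash so the joined URLs have no doubled slashes.
        base_url=base_url.rstrip("/"),
    )


if __name__ == "__main__":
    # The base URL here is a stand-in, not a real deployment target.
    print(stub_narrative(date(2023, 11, 17), "https://example.org/flu"))
```

Because the build date is the only input that changes from month to month, a deploy rule could call a helper like this with the same date it uses to name the dated Auspice JSONs, so the stubbed narrative and the builds stay in sync.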