Estimating flu clade and mutation frequencies

This README is for the data analysis pipeline. For the web interface, see web/README.md.

Development

Setup

Using Nextstrain CLI

# Linux
curl -fsSL --proto '=https' https://nextstrain.org/cli/installer/linux | bash
# Mac
curl -fsSL --proto '=https' https://nextstrain.org/cli/installer/mac | bash

Set up the CLI to use either Docker or a Nextstrain-managed conda environment (which is completely independent of any other conda environments you may have).

# Managed conda
nextstrain setup --set-default conda
# Docker
nextstrain setup --set-default docker

Run the analysis:

nextstrain build . --profile profiles/flu

Using custom conda or Python environment

You need at least the following packages/binaries installed:

- Python
  - snakemake
  - augur
  - polars
- nextclade
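
Before running the workflow in a custom environment, it can help to confirm that the tools listed above are actually available. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and it assumes that snakemake, augur and nextclade are exposed on the PATH under exactly those executable names and that polars is importable as `polars`.

```python
import importlib.util
import shutil

# Names taken from the requirements list above. That the command-line tools
# are installed under exactly these executable names is an assumption.
REQUIRED_EXECUTABLES = ["snakemake", "augur", "nextclade"]
REQUIRED_MODULES = ["polars"]


def find_missing(executables, modules):
    """Return the names of executables and Python modules that cannot be found."""
    # shutil.which returns None when no executable of that name is on the PATH.
    missing = [name for name in executables if shutil.which(name) is None]
    # find_spec returns None when a top-level module is not importable.
    missing += [name for name in modules if importlib.util.find_spec(name) is None]
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_EXECUTABLES, REQUIRED_MODULES)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

This only checks that the tools are present. It does not check their versions.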

Then run using:

snakemake --profile profiles/flu

Viewing results in web app

Copy the Snakemake workflow results to data_web/inputs, making sure to use the correct filenames, e.g.:

cp results/h3n2/region-country-frequencies.csv data_web/inputs/flu-h3n2.csv
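
If several lineages need to be copied, the same step can be scripted. The sketch below is not part of the repository: the helper `copy_results` is hypothetical, and it assumes that every lineage follows the naming pattern of the h3n2 example above (results/<lineage>/region-country-frequencies.csv copied to data_web/inputs/flu-<lineage>.csv). Only h3n2 is confirmed by this README.

```python
import shutil
from pathlib import Path


def copy_results(lineages, results_dir="results", inputs_dir="data_web/inputs"):
    """Copy each lineage's frequency CSV to the filename used in the example above.

    Lineages whose result file does not exist are skipped with a message.
    Returns the list of destination paths that were written.
    """
    inputs = Path(inputs_dir)
    copied = []
    for lineage in lineages:
        source = Path(results_dir) / lineage / "region-country-frequencies.csv"
        if not source.is_file():
            print(f"Skipping {lineage}: {source} not found")
            continue
        # Create the destination directory only once there is something to copy.
        inputs.mkdir(parents=True, exist_ok=True)
        target = inputs / f"flu-{lineage}.csv"
        shutil.copyfile(source, target)
        copied.append(target)
    return copied


# Example call, run from the repository root:
# copy_results(["h3n2"])
```

Check the resulting filenames against what the web conversion step expects before relying on this pattern for other lineages.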

Then convert the CSV files to JSON:

python scripts/web_convert.py --input-pathogens-json data_web/inputs/pathogens.json --output-dir web/public/data

TODO

- Provide a mamba environment file for simpler setup
- Agree on which formatters to use (snakefmt and black?)