Influenza virus genomic surveillance in the Canary Islands

The COVID-19 pandemic has shown the impact of genomic surveillance of emergent and re-emergent pathogens based on Next Generation Sequencing (NGS), as recognized by the World Health Organization [1,2]. The widespread adoption of NGS has accelerated the public health response, allowing the routine identification and monitoring of emerging SARS-CoV-2 variants across the world.

Here we present a public repository of resources related to influenza viruses (Inf), maintained by the ITER-FIISC-HUNSC-ULL task force.

This is the result of a continuous collaborative effort of the following Institutions and Laboratories:

Servicio de Microbiología, Hospital Universitario Ntra. Sra. de Candelaria, 38010 Santa Cruz de Tenerife, Spain.
Fundación Canaria Instituto de Investigación Sanitaria de Canarias at the Research Unit, Hospital Universitario Ntra. Sra. de Candelaria, 38010 Santa Cruz de Tenerife, Spain.
Laboratorio de Inmunología Celular y Viral, Unidad de Farmacología, Facultad de Medicina, Universidad de La Laguna, 38200 San Cristóbal de La Laguna, Spain.
Genomics Division, Instituto Tecnológico y de Energías Renovables, 38600 Santa Cruz de Tenerife, Spain.

A draft of the first influenza genomes from the Canary Islands, Spain, 2022-2023
Protocols for library preparation and sequencing of influenza virus genomes

Illumina-based protocol
Oxford Nanopore Technologies-based protocol
PCR universal primers

Bioinformatic pipelines

Code for Illumina short-reads processing
List of bioinformatic software used in our pipelines
Useful files for the pipelines (FASTA references, BED files, etc.)

Sequences and Classification Results
Other useful repositories with resources to study influenza
References
Acknowledgements
License and Attribution
Participating
How to cite this work
Update logs

A draft of the first influenza genomes from the Canary Islands, Spain, 2022-2023

The first genome sequences of influenza virus A/H1N1, A/H3N2, and B (Victoria) described by us are phylogenetically related to the multiple virus genomes deposited in GISAID that correspond to the past 2022-2023 seasonal flu wave in the Northern hemisphere, as shown in Figures 1 and 2.

Figure 1. A phylogenetic tree depicting the position of the genome draft of influenza A/H1N1 sampled in the period November-December 2022, from patients from the Canary Islands along with NCBI GenBank publicly available sequences as computed by Nextstrain using the HA gene and influenza A H1N1pdm HA [A/California/07/2009 (CY121680)] as reference.

Figure 2. A phylogenetic tree depicting the position of the genome draft of influenza A/H3N2 sampled in the period October-December 2022, from patients from the Canary Islands along with NCBI GenBank publicly available sequences as computed by Nextstrain using the HA gene and influenza A H3N2 HA [A/Wisconsin/67/2005 (CY163680)] as reference.

Protocols for library preparation and sequencing of influenza virus genomes

Illumina-based protocol

One of the sequencing strategies followed for SARS-CoV-2 surveillance is the use of amplicons derived from primer pools designed by the ARTIC community following a tiling approach [3,4,5]. However, this approach is not suitable for influenza viruses because of their mutational burden and higher variability. Instead, it is possible to use the so-called universal primers, which take advantage of the conserved promoter regions at the 5' and 3' ends of the influenza genome segments to amplify the entire genome using larger amplicons [6,7] (see the PCR-primers section).

Lin Y. et al. [8] have adapted the Illumina COVIDSeq™ Assay (RUO) kit to obtain the genomic sequence of influenza A and B viruses. Their protocol uses a combination of two nested primer sets, followed by the Illumina COVIDSeq™ Assay protocol with minor modifications, taking advantage of the same reagents included in the kit. The protocol is available at protocols.io: A sequencing and subtyping protocol for Influenza A and B viruses using Illumina® COVIDSeq™ Assay Kit.

According to Lin Y. et al. [8], this protocol provides accurate information for subtyping, lineage tracing, and antiviral resistance detection of influenza viruses.

Oxford Nanopore Technologies-based protocol

Work in progress. Come back by the end of December 2023 to find new stuff in this section.

PCR universal primers

Universal PCR primers from Zhou et al. (2012, 2014) [6,7].

Bioinformatic pipelines

The following diagram (Figure 3) represents a full pipeline used to derive the consensus FASTA sequence of influenza viruses using short-read Illumina sequencing.

The pipeline processes short reads, from basecalling to the final consensus FASTA sequence, and ends with downstream analyses such as phylogenetic inference.

Several consensus influenza A/H1N1 and A/H3N2 sequences derived from the pipeline based on the mapping of Illumina short reads against an influenza virus reference genome have been obtained so far. They have been deposited in GISAID EpiFlu (see 'Sequences' section below).

Figure 3. Schematic bioinformatic pipeline to obtain the influenza sequences and to infer phylogenetic relationships with other influenza virus genomes available from public repositories, as provided by Nextstrain.

A heatmap of amplicon median coverage (x) for influenza A/H1N1 and A/H3N2 sequences is shown in Figure 4.

Figure 4. Heatmap of amplicon median coverage for influenza A/H1N1 (top) and A/H3N2 (bottom) sequences obtained with an Illumina NextSeq 550 sequencer from nasopharyngeal swabs collected from seven patients (Ct<30).

Code for Illumina short-reads processing

See a detailed pipeline with examples of command usage for Illumina short reads.
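For orientation only, the following Python sketch assembles the kind of command sequence into which the tools listed in this repository are commonly chained for one paired-end sample: adapter trimming with fastp, mapping with BWA, and primer trimming and consensus calling with iVar. It is not the authors' pipeline. The function `build_consensus_commands`, the sample and file names, and the thread count are hypothetical; the human-read removal step with Kraken2 is omitted; and every flag should be checked against the installed version of each tool.

```python
import shlex
from typing import List


def build_consensus_commands(
    sample: str,
    reference_fasta: str,
    primer_bed: str,
    reads_r1: str,
    reads_r2: str,
    threads: int = 4,  # assumption: adjust to the available cores
) -> List[str]:
    """Return shell commands, in execution order, for one paired-end sample."""
    q = shlex.quote  # protect paths that contain spaces or shell metacharacters
    ref, bed = q(reference_fasta), q(primer_bed)
    clean_r1 = q(f"{sample}.clean_R1.fastq.gz")
    clean_r2 = q(f"{sample}.clean_R2.fastq.gz")
    sorted_bam = q(f"{sample}.sorted.bam")
    trimmed_prefix = q(f"{sample}.primertrimmed")
    trimmed_bam = q(f"{sample}.primertrimmed.bam")
    trimmed_sorted = q(f"{sample}.primertrimmed.sorted.bam")
    consensus_prefix = q(f"{sample}.consensus")
    return [
        # 1. Adapter and quality trimming of the raw reads.
        f"fastp -i {q(reads_r1)} -I {q(reads_r2)} -o {clean_r1} -O {clean_r2}",
        # 2. Index the reference segment(s), then map and coordinate-sort.
        f"bwa index {ref}",
        f"bwa mem -t {threads} {ref} {clean_r1} {clean_r2}"
        f" | samtools sort -o {sorted_bam} -",
        f"samtools index {sorted_bam}",
        # 3. Soft-clip primer sequences using the primer scheme in BED format.
        f"ivar trim -i {sorted_bam} -b {bed} -p {trimmed_prefix}",
        # 4. iVar writes an unsorted BAM, so sort and index it again.
        f"samtools sort -o {trimmed_sorted} {trimmed_bam}",
        f"samtools index {trimmed_sorted}",
        # 5. Pileup piped into iVar to write the consensus FASTA.
        f"samtools mpileup -aa -A -d 0 -Q 0 {trimmed_sorted}"
        f" | ivar consensus -p {consensus_prefix}",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for command in build_consensus_commands(
        "sampleA", "reference.fasta", "segment1.bed",
        "sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz",
    ):
        print(command)
```

Printing the commands instead of running them keeps the sketch free of side effects; each line can be pasted into a shell, or passed to a workflow manager, once the paths and parameters have been reviewed.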

List of bioinformatic software used in our pipelines

Bioinformatic software:

Conda manual for installation of numerous open-source tools used in these pipelines: Conda documentation
General-purpose programming environment: R v.4.1.3
Quality Control of Illumina reads: FastQC v0.11.9
Adapter trimming: fastp v0.23.2
Remove human-mapping reads from your FASTQ files: Kraken2 v.2.1.2. If you have issues when downloading the database indexes, try this alternative site from BenLangmead.
Visualization of Kraken2 reports: Pavian v.1.0
Assembly of Illumina short-reads: Unicycler v0.5.0
Benchmarking and quality control of assemblies: QUAST v.5.0.2
CLI tool to search in nucleotide databases using a nucleotide query: BLAST+ v.2.12.0
Mapping of short-reads: BWA v.0.7.17-r1188
Get mapping statistics, manipulate BAM files, and generate mpileups for FASTA consensus: SAMtools v.1.6
Compute the depth of coverage and other statistics: Mosdepth v.0.3.3
Perform the variant calling and consensus: iVar v.1.3.1
Multiple sequence alignment: MAFFT v.7.505
Phylogenomic inference and tree computing: IQ-TREE v.2.2.0.3
Framework for analyses and visualization of pathogen genome data (Nextstrain-Influenza in this case): Nextstrain
Visualization of phylogenetic trees: Figtree
Visualization of phylogenetic trees: ggtree 3.15
Annotation of genomes: SnpEff v.5.1d

Useful files for the pipelines

Reference sequences

A/H1N1	A/H3N2	B Victoria	B Yamagata
Inf. A virus A/California/07/2009(H1N1)	Inf. A virus A/Wisconsin/67/2005(H3N2)	Inf. B virus Victoria B/Brisbane/60/2008	Inf. B virus Yamagata B/Wisconsin/01/2010

BED files

Primer schemes in BED format are required for the PCR-primer trimming step.

Example of a BED file for segment 1 (FJ984387.1) of influenza A virus (A/California/07/2009(H1N1)) using the primer-scheme:

FJ984387.1	1	18	Seg1_Uni12/Inf-1_LEFT	1	-	GGGGGGAGCAAAAGCAGG
FJ984387.1	1	18	Seg1_Uni12/Inf-3_LEFT	1	-	GGGGGGAGCGAAAGCAGG
FJ984387.1	2258	2280	Seg1_Uni13/Inf-1_RIGHT	1	+	CGGGTTATTAGTAGAAACAAGG
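To illustrate how the columns of such a file can be read, here is a minimal Python sketch that parses the three lines above into named records. The names `Primer` and `parse_primer_bed` are hypothetical, and treating the fifth column as a score or primer-pool number is an assumption about its meaning; the remaining columns follow the usual BED order, with the primer sequence as an extra seventh column.

```python
from typing import List, NamedTuple


class Primer(NamedTuple):
    chrom: str          # reference accession, e.g. FJ984387.1
    start: int          # start coordinate as written in the file
    end: int            # end coordinate as written in the file
    name: str           # primer name
    score_or_pool: str  # assumption: BED score or primer-pool number
    strand: str         # "+" or "-"
    sequence: str       # primer sequence (empty when the column is absent)


def parse_primer_bed(text: str) -> List[Primer]:
    """Parse the content of a primer-scheme BED file into Primer records."""
    primers = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and optional BED header or comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got: {line!r}")
        chrom, start, end, name, score_or_pool, strand = fields[:6]
        sequence = fields[6] if len(fields) > 6 else ""
        primers.append(
            Primer(chrom, int(start), int(end), name, score_or_pool, strand, sequence)
        )
    return primers


# The three example lines shown above for segment 1 of A/California/07/2009(H1N1).
EXAMPLE_BED = (
    "FJ984387.1\t1\t18\tSeg1_Uni12/Inf-1_LEFT\t1\t-\tGGGGGGAGCAAAAGCAGG\n"
    "FJ984387.1\t1\t18\tSeg1_Uni12/Inf-3_LEFT\t1\t-\tGGGGGGAGCGAAAGCAGG\n"
    "FJ984387.1\t2258\t2280\tSeg1_Uni13/Inf-1_RIGHT\t1\t+\tCGGGTTATTAGTAGAAACAAGG\n"
)

if __name__ == "__main__":
    for primer in parse_primer_bed(EXAMPLE_BED):
        print(primer.name, primer.start, primer.end, primer.strand)
```

A parser like this is mainly useful as a sanity check before primer trimming, for instance to confirm that every line refers to the same accession as the reference FASTA used for mapping.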

Please download the BED files separately (one file per influenza segment).

Virus strain	Seg-1 (PB2)	Seg-2 (PB1)	Seg-3 (PA)	Seg-4 (HA)	Seg-5 (NP)	Seg-6 (NA)	Seg-7 (MP)	Seg-8 (NS)
Inf. A virus A/California/07/2009(H1N1)
Inf. A virus A/Wisconsin/67/2005(H3N2)
Inf. B virus Victoria B/Brisbane/60/2008

Collapsed-per-region BED files for mosdepth

The following BED files can be fed to mosdepth to compute the mean coverage per segment for each virus strain:
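As a rough illustration of this step, the Python sketch below builds a mosdepth call that uses a per-region BED file and then reads the mean depth per region from the resulting `<prefix>.regions.bed.gz` file. The function names, file names, and depth values are hypothetical, and the sketch assumes that the last column of the regions output holds the mean depth, with the region name in the fourth column when the input BED provides one.

```python
import gzip
from typing import Dict, Iterable, List


def mosdepth_command(bam: str, regions_bed: str, prefix: str, threads: int = 2) -> List[str]:
    """Build the argument list for mosdepth with per-region summaries only."""
    return [
        "mosdepth",
        "--by", regions_bed,   # one interval per region of interest
        "--no-per-base",       # skip the large per-base output
        "--threads", str(threads),
        prefix,
        bam,
    ]


def parse_regions(lines: Iterable[str]) -> Dict[str, float]:
    """Map each region (by name, or by coordinates if unnamed) to its mean depth."""
    means: Dict[str, float] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a regions line
        chrom, start, end = fields[:3]
        # With a 4-column input BED the name is kept before the mean depth.
        label = fields[3] if len(fields) >= 5 else f"{chrom}:{start}-{end}"
        means[label] = float(fields[-1])
    return means


def read_regions_file(path: str) -> Dict[str, float]:
    """Read a gzip-compressed regions file written by mosdepth."""
    with gzip.open(path, "rt") as handle:
        return parse_regions(handle)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(mosdepth_command("sampleA.sorted.bam", "H1N1.regions.bed", "sampleA")))
    # Hypothetical output line: segment 1 (PB2) with a mean depth of 1523.4x.
    print(parse_regions(["FJ984387.1\t0\t2280\tPB2\t1523.40\n"]))
```

Returning the command as a list makes it suitable for `subprocess.run` without shell quoting concerns.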

Sequences and Classification Results

Deposited sequences

Sequences are being deposited at GISAID. You may search GISAID using the accession codes provided, or download our influenza sequences directly using the links provided below.

Sequences of influenza A/H1N1

Accession 1: EPI_ISL_18128205
Accession 2: EPI_ISL_18308442
Accession 3: EPI_ISL_18308501

Sequences of influenza A/H3N2

Accession 4: EPI_ISL_18313569
Accession 5: EPI_ISL_18313571
Accession 6: EPI_ISL_18313572

Sequences of influenza B Victoria

Accession 7: EPI_ISL_18313574

(*) NOTE: Some segment sequences may be incomplete.

Classification Results

GISAID accession	Isolate name	Subtype	Clade	Location
EPI_ISL_18128205	A/Spain/CN-HUNSC_ITER	A/H1N1	6B.1A.5a.2a	Europe/Spain/Canary Islands
EPI_ISL_18308442	A/Spain/CN-HUNSC_ITER	A/H1N1	6B.1A.5a.2a.1	Europe/Spain/Canary Islands
EPI_ISL_18308501	A/Spain/CN-HUNSC_ITER	A/H1N1	6B.1A.5a.2a.1	Europe/Spain/Canary Islands
EPI_ISL_18313569	A/Spain/CN-HUNSC_ITER	A/H3N2	3C.2a1b.2a.2b	Europe/Spain/Canary Islands
EPI_ISL_18313571	A/Spain/CN-HUNSC_ITER	A/H3N2	3C.2a1b.2a.2b	Europe/Spain/Canary Islands
EPI_ISL_18313572	A/Spain/CN-HUNSC_ITER	A/H3N2	3C.2a1b.2a.2b	Europe/Spain/Canary Islands
EPI_ISL_18313574	B/Spain/CN-HUNSC_ITER	B	V1A.3a.2	Europe/Spain/Canary Islands

(*) NOTE: Other metadata for these samples are available in GISAID and from the authors upon reasonable request.

Other useful repositories with resources to study influenza

Kudos to all research teams behind the scenes in all these repositories and web platforms:

References

[1] Genomic sequencing of SARS-CoV-2. A guide to implementation for maximum impact on public health. WHO, January 8, 2021.
[2] Report "Global genomic surveillance strategy for pathogens with pandemic and epidemic potential, 2022-2032". Geneva, WHO, 2022.
[3] Gohl DM, Garbe J, Grady P, et al. A rapid, cost-effective tailed amplicon method for sequencing SARS-CoV-2. BMC Genomics. 2020;21(1):863. Published 2020 Dec 4. doi:10.1186/s12864-020-07283-6.
[4] Itokawa K, Sekizuka T, Hashino M, Tanaka R, Kuroda M. Disentangling primer interactions improves SARS-CoV-2 genome sequencing by multiplex tiling PCR. PLoS One. 2020;15(9):e0239403. Published 2020 Sep 18. doi:10.1371/journal.pone.0239403.
[5] Koskela von Sydow A, Lindqvist CM, Asghar N, et al. Comparison of SARS-CoV-2 whole genome sequencing using tiled amplicon enrichment and bait hybridization. Sci Rep. 2023;13(1):6461. Published 2023 Apr 20. doi:10.1038/s41598-023-33168-1.
[6] Zhou B, Wentworth DE. Influenza A virus molecular virology techniques. Methods Mol Biol. 2012;865:175-192. doi:10.1007/978-1-61779-621-0_11.
[7] Zhou B, Lin X, Wang W, et al. Universal influenza B virus genomic amplification facilitates sequencing, diagnostics, and reverse genetics. J Clin Microbiol. 2014;52(5):1330-1337. doi:10.1128/JCM.03265-13.
[8] Lin Y, Koble J, Prashar P, Pottekat A, Middle C, Kuersten S, Oberholzer M, Brazas R, Whitlock D, Schlaberg R, Schroth GP. A sequencing and subtyping protocol for Influenza A and B viruses using Illumina® COVIDSeq™ Assay Kit. protocols.io. doi:10.17504/protocols.io.n2bvj8mrxgk5/v1.

Acknowledgements

This study has been funded by Cabildo Insular de Tenerife (CGIEU0000219140 and "Apuestas científicas del ITER para colaborar en la lucha contra la COVID-19"); by the agreement with Instituto Tecnológico y de Energías Renovables (ITER) to strengthen scientific and technological education, training, research, development and innovation in Genomics, epidemiological surveillance based on massive sequencing, Personalized Medicine and Biotechnology (OA17/008 and OA23/043); and by the agreement between Consejería de Educación, Universidades, Cultura y Deportes del Gobierno de Canarias and Cabildo Insular de Tenerife, 2022-2025 (AC0000014697).

This study is also an activity within the project Consolidation of WGS and RT-PCR activities for SARS-CoV-2 in Spain towards sustainable use and integration of enhanced infrastructure and capacities in the RELECOV network (101113109 - RELECOV 2.0) of the EU4Health Programme (EU4H) by the European Health and Digital Executive Agency (HaDEA), under the coordination of Instituto de Salud Carlos III (ISCIII).

We acknowledge the researchers and their institutions who released influenza sequences through NCBI GenBank, GISAID, and ENA that are being used in our studies.

We also thank the authors, the laboratories that originated and submitted the genetic sequences and the metadata for sharing their work, as shown on Nextstrain, and:

Hadfield et al, Nextstrain: real-time tracking of pathogen evolution, Bioinformatics (2018).
Sagulenko et al, TreeTime: Maximum-likelihood phylodynamic analysis, Virus Evolution (2017).

License and Attribution

This repository and data exports are released under the CC BY 4.0 license. Please acknowledge the authors, the originating and submitting laboratories for the genetic sequences and metadata, and the open source software used in this work (third-party copyrights and licenses may apply).

Please cite this repository as: "Influenza repository of the Reference Laboratory for Epidemiological Surveillance of Pathogens in the Canary Islands (accessed on YYYY-MM-DD)". And do not forget to cite the paper (see the section "How to cite" below) when it becomes available.

Participating

Want to share your relevant links? Send a direct message to @labcflores on X (see below).

How to cite this work

This work has not been published yet. See the 'License and Attribution' section to cite this repository.

To use the sequences deposited at GISAID, please acknowledge this work as recommended by GISAID. Find the 'GISAID acknowledge tables' here.

Update logs

December 29, 2023. Several updates follow: Figure 3 is updated; the bioinformatic workflow is enriched with an additional step of de novo assembly with Unicycler (SPAdes) in the case of multiple sequence alignment failure; BED files for Influenza A (H1N1 and H3N2) and B have been updated (strands and coordinates for the oligos were updated from a previous version of these files); useful BED files for mosdepth are provided to compute mean coverage per region in each strain.

September 29, 2023. This repository became fully public. Enjoy the reading! ;=)

September 26, 2023. Updated many sections: bioinformatic pipeline, primer-schemes (required BED files for the pipelines), deposited sequences, Influenza virus A and B reference sequences, and other useful repositories with resources to study Influenza.

July 26, 2023. Created the private version of this repository.

genomicsITER/influenza