The COVID-19 pandemic has shown the impact of genomic surveillance of emergent and re-emergent pathogens based on Next Generation Sequencing (NGS), as recognized by the World Health Organization [1,2]. The widespread adoption of NGS has accelerated the public health response by allowing emerging SARS-CoV-2 variants to be identified and monitored on a routine basis worldwide.
Here we present a public repository of resources related to influenza viruses (Inf), maintained by the ITER-FIISC-HUNSC-ULL task force.
This is the result of a continuous collaborative effort of the following institutions and laboratories:
- Servicio de Microbiología, Hospital Universitario Ntra. Sra. de Candelaria, 38010 Santa Cruz de Tenerife, Spain.
- Fundación Canaria Instituto de Investigación Sanitaria de Canarias at the Research Unit, Hospital Universitario Ntra. Sra. de Candelaria, 38010 Santa Cruz de Tenerife, Spain.
- Laboratorio de Inmunología Celular y Viral, Unidad de Farmacología, Facultad de Medicina, Universidad de La Laguna, 38200 San Cristóbal de La Laguna, Spain.
- Genomics Division, Instituto Tecnológico y de Energías Renovables, 38600 Santa Cruz de Tenerife, Spain.
- A draft of the first influenza genomes from the Canary Islands, Spain, 2022-2023
- Protocols for library preparation and sequencing of influenza virus genomes
- Bioinformatic pipelines
- Code for Illumina short-reads processing
- List of bioinformatic software used in our pipelines
- Useful files for the pipelines (FASTA references, BED files, etc.)
- Sequences and Classification Results
- Other useful repositories with resources to study influenza
- References
- Acknowledgements
- License and Attribution
- Participating
- How to cite this work
- Update logs
The first genome sequences of influenza virus A/H1N1, A/H3N2, and B (Victoria) described by us are phylogenetically related to the multiple virus genomes deposited in GISAID that correspond to the past 2022-2023 seasonal flu wave in the Northern hemisphere, as shown in Figures 1 and 2.
Figure 1. A phylogenetic tree depicting the position of the genome draft of influenza A/H1N1 sampled in the period November-December 2022, from patients from the Canary Islands along with NCBI GenBank publicly available sequences as computed by Nextstrain using the HA gene and influenza A H1N1pdm HA [A/California/07/2009 (CY121680)] as reference.
Figure 2. A phylogenetic tree depicting the position of the genome draft of influenza A/H3N2 sampled in the period October-December 2022, from patients from the Canary Islands along with NCBI GenBank publicly available sequences as computed by Nextstrain using the HA gene and influenza A H3N2 HA [A/Wisconsin/67/2005 (CY163680)] as reference.
One of the sequencing strategies followed for SARS-CoV-2 surveillance is the use of amplicons derived from primer pools designed by the ARTIC community following a tiling approach [3,4,5]. However, this approach is not well suited to influenza viruses because of their higher mutational burden and variability. Instead, it is possible to use so-called universal primers, which take advantage of the conserved promoter regions at the 5' and 3' ends of the influenza genome segments, to amplify the entire genome using larger amplicons [6,7] (see the PCR-primers section).
Lin Y. et al. [8] have adapted the Illumina COVIDSeq™ Assay (RUO) kit to obtain the genomic sequence of influenza A and B viruses. Their protocol uses a combination of two nested primer sets, followed by the Illumina COVIDSeq™ Assay protocol with minor modifications, taking advantage of the same reagents included in the kit. The protocol is available at protocols.io: A sequencing and subtyping protocol for Influenza A and B viruses using Illumina® COVIDSeq™ Assay Kit.
According to Lin Y. et al. [8], this protocol provides accurate information for subtyping, lineage tracing, and antiviral resistance detection of influenza viruses.
Oxford Nanopore Technologies-based protocol
Work in progress. Come back by the end of December 2023 to find new stuff in this section.
PCR Universal Primers from Zhou et al. (2012, 2014).
The following diagram (Figure 3) represents a full pipeline used to derive the consensus FASTA sequence of influenza viruses using short-read Illumina sequencing.
The pipeline processes short reads, from basecalling to the final consensus FASTA sequence, and ends with downstream analyses such as phylogenetic inference.
Several consensus influenza A/H1N1 and A/H3N2 sequences derived from the pipeline based on the mapping of Illumina short reads against an influenza virus reference genome have been obtained so far. They have been deposited in GISAID EpiFlu (see 'Sequences' section below).
Figure 3. Schematic bioinformatic pipeline to obtain the influenza sequences and to infer phylogenetic relationships with other influenza virus genomes available obtained from public repositories as provided by Nextstrain.
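As a rough illustration of the mapping-based part of the workflow described above, the following Python sketch only assembles the command lines for one sample and prints them; it does not run anything. It is a minimal sketch under stated assumptions, not the commands used in this repository: the function name `build_commands`, the sample name, and all file names are hypothetical, and steps such as FastQC, human-read removal with Kraken2, and coverage reporting with mosdepth are left out for brevity. Please refer to the detailed pipeline linked in the next section for the actual commands.

```python
from pathlib import Path


def build_commands(sample, reads_r1, reads_r2, reference, primer_bed, outdir="results"):
    """Return shell commands for one sample, in execution order (nothing is run)."""
    out = Path(outdir)
    # Hypothetical intermediate file names, chosen only for this illustration.
    clean_r1 = out / f"{sample}_R1.clean.fastq.gz"
    clean_r2 = out / f"{sample}_R2.clean.fastq.gz"
    sorted_bam = out / f"{sample}.sorted.bam"
    trim_prefix = out / f"{sample}.primertrim"
    trim_sorted = out / f"{sample}.primertrim.sorted.bam"
    consensus_prefix = out / f"{sample}.consensus"
    return [
        # Adapter and quality trimming of the paired-end reads.
        f"fastp -i {reads_r1} -I {reads_r2} -o {clean_r1} -O {clean_r2}",
        # Index the reference and map the cleaned reads against it.
        f"bwa index {reference}",
        f"bwa mem {reference} {clean_r1} {clean_r2} | samtools sort -o {sorted_bam} -",
        f"samtools index {sorted_bam}",
        # Soft-clip PCR primers using the primer scheme in BED format.
        f"ivar trim -i {sorted_bam} -b {primer_bed} -p {trim_prefix}",
        f"samtools sort -o {trim_sorted} {trim_prefix}.bam",
        # Build the consensus FASTA from the pileup.
        f"samtools mpileup -aa -A -d 0 -Q 0 {trim_sorted} | ivar consensus -p {consensus_prefix}",
    ]


commands = build_commands(
    "sample01",
    "sample01_R1.fastq.gz",
    "sample01_R2.fastq.gz",
    "reference.fasta",
    "primers.bed",
)
for command in commands:
    print(command)
```

Keeping the commands as plain strings makes it easy to review them before execution; thresholds such as minimum depth or consensus frequency are deliberately left at the tools' defaults here and would need to be set for real analyses.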
A heatmap of amplicon median coverage (x) for influenza A/H1N1 and A/H3N2 sequences is shown in Figure 4.
Figure 4. Heatmap of amplicon median coverage for influenza A/H1N1 (top) and A/H3N2 (bottom) sequences obtained with an Illumina NextSeq550 sequencer from nasopharyngeal swabs collected from seven patients (Ct<30).
Code for Illumina short-reads processing
See a detailed pipeline with examples of command usage for Illumina short reads.
List of bioinformatic software used in our pipelines
Bioinformatic software:
- Conda manual for installation of numerous open-source tools used in these pipelines: Conda documentation
- Programming environment of general purpose: R v.4.1.3
- Quality Control of Illumina reads: FastQC v0.11.9
- Adapter trimming: fastp v0.23.2
- Remove human-mapping reads from your FASTQ files: Kraken2 v.2.1.2. If you have issues when downloading the database indexes, try this alternative site from BenLangmead.
- Visualization of Kraken2 reports: Pavian v.1.0
- Assembly of Illumina short-reads: Unicycler v0.5.0
- Benchmarking and quality control of assemblies: QUAST v.5.0.2
- CLI tool to search in nucleotide databases using a nucleotide query: BLAST+ v.2.12.0
- Mapping of short-reads: BWA v.0.7.17-r1188
- Get mapping statistics, manipulate BAM files, and generate mpileups for FASTA consensus: SAMtools v.1.6
- Compute the depth of coverage and other statistics: Mosdepth v.0.3.3
- Perform the variant calling and consensus: iVar v.1.3.1
- Multiple sequence alignment: MAFFT v.7.505
- Phylogenomic inference and tree computing: IQ-TREE v.2.2.0.3
- Framework for analyses and visualization of pathogen genome data (Nextstrain-Influenza in this case): Nextstrain
- Visualization of phylogenetic trees: Figtree
- Visualization of phylogenetic trees: ggtree 3.15
- Annotation of genomes: SnpEff v.5.1d
A/H1N1 | A/H3N2 | B Victoria | B Yamagata |
---|---|---|---|
Inf. A virus A/California/07/2009(H1N1) | Inf. A virus A/Wisconsin/67/2005(H3N2) | Inf. B virus Victoria B/Brisbane/60/2008 | Inf. B virus Yamagata B/Wisconsin/01/2010 |
Primer schemes in BED format are required for the PCR-primer trimming step.
Example of a BED file
for segment 1 (FJ984387.1) of influenza A virus (A/California/07/2009(H1N1)) using the primer-scheme:
FJ984387.1 1 18 Seg1_Uni12/Inf-1_LEFT 1 - GGGGGGAGCAAAAGCAGG
FJ984387.1 1 18 Seg1_Uni12/Inf-3_LEFT 1 - GGGGGGAGCGAAAGCAGG
FJ984387.1 2258 2280 Seg1_Uni13/Inf-1_RIGHT 1 + CGGGTTATTAGTAGAAACAAGG
Please, download the BED files separately (one file per influenza segment).
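To make the layout of the example above easier to follow, here is a minimal Python sketch that parses such primer-scheme lines into records. The column interpretation (reference accession, start, end, primer name, pool, strand, primer sequence) is an assumption based on the example lines, and the names `Primer` and `parse_primer_bed` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primer:
    chrom: str     # reference accession of the segment (assumed meaning)
    start: int     # start coordinate as written in the file
    end: int       # end coordinate as written in the file
    name: str      # primer name
    pool: int      # primer pool (assumed meaning of the fifth column)
    strand: str    # "+" or "-"
    sequence: str  # primer sequence


def parse_primer_bed(text):
    """Parse whitespace-separated primer-scheme lines into Primer records."""
    primers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
        chrom, start, end, name, pool, strand, sequence = fields[:7]
        primers.append(
            Primer(chrom, int(start), int(end), name, int(pool), strand, sequence)
        )
    return primers


# The three example lines for segment 1 (FJ984387.1) shown above.
EXAMPLE = """\
FJ984387.1 1 18 Seg1_Uni12/Inf-1_LEFT 1 - GGGGGGAGCAAAAGCAGG
FJ984387.1 1 18 Seg1_Uni12/Inf-3_LEFT 1 - GGGGGGAGCGAAAGCAGG
FJ984387.1 2258 2280 Seg1_Uni13/Inf-1_RIGHT 1 + CGGGTTATTAGTAGAAACAAGG
"""

primers = parse_primer_bed(EXAMPLE)
for primer in primers:
    print(primer.name, primer.start, primer.end, primer.strand, len(primer.sequence))
```

A quick check like this can help catch malformed lines (for example, a missing column) before a BED file is passed to the trimming step.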
The following BED files can be fed to mosdepth to compute the mean coverage per segment in each virus strain:
- mosdepth BED file for Inf. A virus [A/California/07/2009(H1N1)]
- mosdepth BED file for Inf. A virus [A/Wisconsin/67/2005(H3N2)]
- mosdepth BED file for Inf. B virus Victoria [B/Brisbane/60/2008]
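To illustrate the quantity that these region files are meant to yield, the Python sketch below computes the mean depth of coverage per region from per-base depths. It is only a toy illustration of the calculation, not a replacement for mosdepth, which works directly on BAM files: the function `mean_coverage_per_region`, the depth values, and the region names are all hypothetical, and regions are assumed to use 0-based, half-open BED coordinates.

```python
def mean_coverage_per_region(depths, regions):
    """Mean depth per region.

    depths:  dict mapping a reference name to a list of per-base depths
    regions: list of (reference, start, end, name) tuples, 0-based half-open
    """
    result = {}
    for chrom, start, end, name in regions:
        window = depths[chrom][start:end]
        # Guard against empty regions to avoid a division by zero.
        result[name] = sum(window) / len(window) if window else 0.0
    return result


# Toy per-base depths for two made-up references (not real data).
depths = {
    "ref_a": [10, 10, 20, 20, 30, 30],
    "ref_b": [0, 5, 10, 15],
}
regions = [
    ("ref_a", 0, 6, "segment_1"),
    ("ref_b", 0, 4, "segment_2"),
]

coverage = mean_coverage_per_region(depths, regions)
for name, mean_depth in coverage.items():
    print(name, mean_depth)
```

With one region per genome segment, as in the BED files listed above, the result is one mean coverage value per segment and sample.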
Sequences are being deposited at GISAID. You may search GISAID using the accession codes provided, or download our influenza sequences directly using the links provided below.
Sequences of influenza A/H1N1
- Accession 1: EPI_ISL_18128205
- Accession 2: EPI_ISL_18308442
- Accession 3: EPI_ISL_18308501
Sequences of influenza A/H3N2
- Accession 4: EPI_ISL_18313569
- Accession 5: EPI_ISL_18313571
- Accession 6: EPI_ISL_18313572
Sequences of influenza B Victoria
- Accession 7: EPI_ISL_18313574
(*) NOTE: Some segment sequences may be incomplete.
GISAID accession | Isolate name | Subtype | Clade | Location |
---|---|---|---|---|
EPI_ISL_18128205 | A/Spain/CN-HUNSC_ITER | A/H1N1 | 6B.1A.5a.2a | Europe/Spain/Canary Islands |
EPI_ISL_18308442 | A/Spain/CN-HUNSC_ITER | A/H1N1 | 6B.1A.5a.2a.1 | Europe/Spain/Canary Islands |
EPI_ISL_18308501 | A/Spain/CN-HUNSC_ITER | A/H1N1 | 6B.1A.5a.2a.1 | Europe/Spain/Canary Islands |
EPI_ISL_18313569 | A/Spain/CN-HUNSC_ITER | A/H3N2 | 3C.2a1b.2a.2b | Europe/Spain/Canary Islands |
EPI_ISL_18313571 | A/Spain/CN-HUNSC_ITER | A/H3N2 | 3C.2a1b.2a.2b | Europe/Spain/Canary Islands |
EPI_ISL_18313572 | A/Spain/CN-HUNSC_ITER | A/H3N2 | 3C.2a1b.2a.2b | Europe/Spain/Canary Islands |
EPI_ISL_18313574 | B/Spain/CN-HUNSC_ITER | B | V1A.3a.2 | Europe/Spain/Canary Islands |
(*) NOTE: other metadata are available for these samples in GISAID and from the authors upon a reasonable request.
Kudos to all research teams behind the scenes in all these repositories and web platforms.
- Genomic sequencing of SARS-CoV-2. A guide to implementation for maximum impact on public health, WHO, January 8, 2021.
- Report “Global genomic surveillance strategy for pathogens with pandemic and epidemic potential, 2022-2032”. Geneva, WHO, 2022.
- Gohl DM, Garbe J, Grady P, et al. A rapid, cost-effective tailed amplicon method for sequencing SARS-CoV-2. BMC Genomics. 2020;21(1):863. Published 2020 Dec 4. doi:10.1186/s12864-020-07283-6.
- Itokawa K, Sekizuka T, Hashino M, Tanaka R, Kuroda M. Disentangling primer interactions improves SARS-CoV-2 genome sequencing by multiplex tiling PCR. PLoS One. 2020;15(9):e0239403. Published 2020 Sep 18. doi:10.1371/journal.pone.0239403.
- Koskela von Sydow A, Lindqvist CM, Asghar N, et al. Comparison of SARS-CoV-2 whole genome sequencing using tiled amplicon enrichment and bait hybridization. Sci Rep. 2023;13(1):6461. Published 2023 Apr 20. doi:10.1038/s41598-023-33168-1.
- Zhou B, Wentworth DE. Influenza A virus molecular virology techniques. Methods Mol Biol. 2012;865:175-192. doi:10.1007/978-1-61779-621-0_11.
- Zhou B, Lin X, Wang W, et al. Universal influenza B virus genomic amplification facilitates sequencing, diagnostics, and reverse genetics. J Clin Microbiol. 2014;52(5):1330-1337. doi:10.1128/JCM.03265-13.
- Ying Lin, Jeffrey Koble, Priyanka Prashar, Anita Pottekat, Christina Middle, Scott Kuersten, Michael Oberholzer, Robert Brazas, Darcy Whitlock, Robert Schlaberg, Gary P. Schroth. A sequencing and subtyping protocol for Influenza A and B viruses using Illumina® COVIDSeq™ Assay Kit. Protocols.io. doi:10.17504/protocols.io.n2bvj8mrxgk5/v1.
This study has been funded by Cabildo Insular de Tenerife (CGIEU0000219140 and "Apuestas científicas del ITER para colaborar en la lucha contra la COVID-19"); by the agreement with Instituto Tecnológico y de Energías Renovables (ITER) to strengthen scientific and technological education, training, research, development and innovation in Genomics, epidemiological surveillance based on massive sequencing, Personalized Medicine and Biotechnology (OA17/008 and OA23/043); and by the agreement between Consejería de Educación, Universidades, Cultura y Deportes del Gobierno de Canarias y Cabildo Insular de Tenerife, 2022-2025 (AC0000014697).
This study is also an activity within the project Consolidation of WGS and RT-PCR activities for SARS-CoV-2 in Spain towards sustainable use and integration of enhanced infrastructure and capacities in the RELECOV network (101113109 - RELECOV 2.0) of the EU4Health Programme (EU4H) by the European Health and Digital Executive Agency (HaDEA), under the coordination of Instituto de Salud Carlos III (ISCIII).
We acknowledge the researchers and their institutions who released influenza sequences through NCBI GenBank, GISAID, and ENA that are being used in our studies.
We also thank the authors, the laboratories that originated and submitted the genetic sequences and the metadata for sharing their work, as shown on Nextstrain, and:
- Hadfield et al, Nextstrain: real-time tracking of pathogen evolution, Bioinformatics (2018).
- Sagulenko et al, TreeTime: Maximum-likelihood phylodynamic analysis, Virus Evolution (2017).
This repository and data exports are released under the CC BY 4.0 license. Please acknowledge the authors, the originating and submitting laboratories for the genetic sequences and metadata, and the open source software used in this work (third-party copyrights and licenses may apply).
Please cite this repository as: "Influenza repository of the Reference Laboratory for Epidemiological Surveillance of Pathogens in the Canary Islands (accessed on YYYY-MM-DD)". And do not forget to cite the paper (see the section "How to cite" below) when it becomes available.
Want to share your relevant links? Send a direct message to @labcflores on X (see below).
Follow us on @labcflores
This work has not been published yet. See the 'License and Attribution' section to cite this repository.
To use the sequences deposited at GISAID, please acknowledge this work as recommended by GISAID. Find the 'GISAID acknowledge tables' here.
December 29, 2023. Several updates follow: Figure 3 is updated; the bioinformatic workflow is enriched with an additional step of de novo assembly with Unicycler (SPAdes) in the case of multiple sequence alignment failure; BED files for Influenza A (H1N1 and H3N2) and B have been updated (strands and coordinates for the oligos were updated from a previous version of these files); useful BED files for mosdepth are provided to compute mean coverage per region in each strain.
September 29, 2023. This repository became fully public. Enjoy the reading! ;=)
September 26, 2023. Updated many sections: bioinformatic pipeline, primer-schemes (required BED files for the pipelines), deposited sequences, Influenza virus A and B reference sequences, and other useful repositories with resources to study Influenza.
July 26, 2023. Created the private version of this repository.