/sars-cov-2-transcriptome

Supplemental code for the paper "The architecture of SARS-CoV-2 transcriptome"

The architecture of SARS-CoV-2 transcriptome

Dongwan Kim1,2, Joo-Yeon Lee3, Jeong-Sun Yang3, Jun Won Kim3, V. Narry Kim1,2,*, and Hyeshik Chang1,2,*

1 Center for RNA Research, Institute for Basic Science (IBS), Seoul 08826, Republic of Korea
2 School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
3 Korea National Institute of Health, Korea Centers for Disease Control & Prevention, Osong 28159, Republic of Korea
* Correspondence: narrykim@snu.ac.kr and hyeshik@snu.ac.kr

Summary

SARS-CoV-2 is a betacoronavirus responsible for the COVID-19 pandemic. Although the SARS-CoV-2 genome was reported recently, its transcriptomic architecture is unknown. Utilizing two complementary sequencing techniques, we here present a high-resolution map of the SARS-CoV-2 transcriptome and epitranscriptome. DNA nanoball sequencing shows that the transcriptome is highly complex owing to numerous discontinuous transcription events. In addition to the canonical genomic and subgenomic RNAs, SARS-CoV-2 produces transcripts encoding unknown ORFs with fusion, deletion, and/or frameshift. Using nanopore direct RNA sequencing, we further find at least 41 RNA modification sites on viral transcripts, with the most frequent motif, AAGAA. Modified RNAs have shorter poly(A) tails than unmodified RNAs, suggesting a link between the modification and the 3′ tail. Functional investigation of the unknown transcripts and RNA modifications discovered in this study will open new directions to our understanding of the life cycle and pathogenicity of SARS-CoV-2.
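The discontinuous transcription events mentioned in the summary would appear in a spliced read alignment as long skips on the reference genome. The following is a minimal sketch of how such junctions could be pulled out of a CIGAR string. It is an illustration only, not the pipeline used in the paper. The function name `find_junctions`, the `min_skip` threshold of 100 nt, and the example coordinates are assumptions made for this sketch, and it further assumes that the aligner reports the skipped region as an `N` operation.

```python
import re

# One CIGAR element: a length followed by an operation code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference genome.
REF_CONSUMING = set("MDN=X")


def find_junctions(ref_start, cigar, min_skip=100):
    """Return the reference intervals skipped by a read.

    ref_start -- 0-based reference position of the first aligned base
    cigar     -- CIGAR string of the alignment
    min_skip  -- hypothetical threshold; shorter skips are ignored

    Each junction is a 0-based, half-open (start, end) interval covering
    the reference bases that the read jumps over.
    """
    junctions = []
    pos = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N" and length >= min_skip:
            junctions.append((pos, pos + length))
        if op in REF_CONSUMING:
            pos += length
    return junctions


# Made-up read: 70 aligned bases, a 28,000 nt skip, then 100 aligned bases.
example = find_junctions(0, "70M28000N100M")
print(example)
```

Counting identical `(start, end)` pairs across all reads would then give a per-junction read count, which is one simple way to summarize how often each discontinuous event occurs.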

See the bioRxiv preprint: https://www.biorxiv.org/content/10.1101/2020.03.12.988865v2
Data can be downloaded from https://osf.io/8f6n9/
Data can also be browsed using the UCSC Genome Browser.