Background: A newly emerging novel coronavirus appeared and rapidly spread worldwide and World Health Organization declared a pandemic on March 11, 2020. The roles and characteristics of coronavirus have captured much attention due to its power of causing a wide variety of infectious diseases, from mild to severe on humans. The detection of the lethality of human coronavirus is key to estimate the viral toxicity and provide perspective for treatment.
Results: We developed alignment-free machine learning approaches for an ultra-fast and highly accurate prediction of the lethality of potential human-adapted coronavirus using genomic nucleotide. We performed extensive experiments through six different feature transformation and machine learning algorithms in combination with digital signal processing to infer the lethality of possible future novel coronaviruses using previous existing strains. The results tested on SARS-CoV, MERS-Cov and SARS-CoV-2 datasets show an average 96.7% prediction accuracy. We also provide preliminary analysis validating the effectiveness of our models through other human coronaviruses. Conclusion: Our study achieves high levels of prediction performance based on raw RNA sequences alone without genome annotations and specialized biological knowledge. The results demonstrate that, for any novel human coronavirus strains, this alignment-free machine learning-based approach can offer a reliable real-time estimation for its viral lethality.