GraphV - An RNA virus strain-level identification tool using long reads and genome graph.

Version: V1.0


Dependencies:

Make sure these programs are installed and available on your PATH.

Install (Linux only, e.g. Ubuntu)

git clone https://github.com/liaoherui/GraphV.git

Then download the genome graph database for the 8 supported RNA viruses. Run:
cd GraphV
sh download.sh

If download.sh fails to download the database, try the alternative script. Run:
cd GraphV
sh download_2.sh

If that also fails, please email the author to request the database.

Usage

Run python GraphV.py -h to see the usage information.

A real SARS-CoV-2 demo dataset is included in the "Data" folder and can be used for testing.

Demo run (results are written to a folder named "GraphV" by default):

python GraphV.py -i Data/SRR10948550_801.fastq -v SCOV2
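
To launch the same run from a script, for example when looping over several FASTQ files, the command can be wrapped with Python's subprocess module. This is only a minimal sketch and not part of GraphV: the helper names build_graphv_command and run_graphv are hypothetical, and only the -i and -v options shown in the demo are used.

```python
import subprocess
import sys


def build_graphv_command(fastq_path, virus_type, script="GraphV.py"):
    """Build the argument list for one GraphV run (hypothetical helper)."""
    # Only the -i (input reads) and -v (virus_type) options from the demo are used.
    return [sys.executable, script, "-i", fastq_path, "-v", virus_type]


def run_graphv(fastq_path, virus_type):
    """Run GraphV from the repository root; raises CalledProcessError on failure."""
    subprocess.run(build_graphv_command(fastq_path, virus_type), check=True)
```

Calling run_graphv("Data/SRR10948550_801.fastq", "SCOV2") from inside the GraphV directory should be equivalent to the demo command, using the same Python interpreter that runs the script.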

The table below shows the relationship between each virus name and its virus_type parameter:

Virus Name        virus_type parameter
SARS-CoV-2        SCOV2
HIV               HIV
HCV               HCV
Ebolavirus        EBV
Zika virus        ZKV
Dengue virus      DGV
Lassa virus       LSV
Enterovirus       ETVA
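
When the virus_type value has to be chosen programmatically, the table can be expressed as a small lookup. The sketch below is illustrative only: the names VIRUS_TYPES and virus_type_for are hypothetical and not part of GraphV, and the lookup ignores letter case as a convenience.

```python
# Virus names and virus_type values taken from the table above.
VIRUS_TYPES = {
    "SARS-CoV-2": "SCOV2",
    "HIV": "HIV",
    "HCV": "HCV",
    "Ebolavirus": "EBV",
    "Zika virus": "ZKV",
    "Dengue virus": "DGV",
    "Lassa virus": "LSV",
    "Enterovirus": "ETVA",
}


def virus_type_for(name):
    """Return the -v value for a virus name, ignoring case (hypothetical helper)."""
    lookup = {key.lower(): value for key, value in VIRUS_TYPES.items()}
    try:
        return lookup[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported virus: {name!r}") from None
```

An unknown name raises ValueError instead of passing an unsupported value on to GraphV.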

Output file

GraphV produces 5 output files:

  1. *.json file --- The alignment result file from GraphAligner.

  2. *_Most_possible_Strain_report.txt --- The final report generated by GraphV.

  3. *_All_Cov.txt --- The GraphV result file, sorted in descending order of alignment coverage.

  4. *_All_Cov_by_length.txt --- The GraphV result file, sorted in descending order of alignment length.

  5. *_Unique_Cov.txt --- The GraphV result file, sorted in descending order of unique coverage.

Note:

For files 3 and 4, the columns are: strain name, alignment length, genome length, alignment coverage.
For file 5, the columns are: strain name, unique alignment length, genome length, unique coverage, strain name in database.
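
The four-column layout of files 3 and 4 can be loaded into named records for further filtering. The sketch below rests on assumptions that this README does not confirm: columns are tab-separated, columns 2 to 4 are plain numbers, and there is no header line. The names CoverageRow and read_all_cov are hypothetical; adjust the delimiter and has_header arguments after inspecting a real output file.

```python
import csv
from typing import NamedTuple


class CoverageRow(NamedTuple):
    strain: str
    alignment_length: float
    genome_length: float
    coverage: float


def read_all_cov(path, delimiter="\t", has_header=False):
    """Read a *_All_Cov.txt style file into CoverageRow records.

    Assumed (not confirmed by the README): tab-separated columns,
    numeric values in columns 2-4, and no header line.
    """
    rows = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        if has_header:
            next(reader, None)  # drop the header line if one is present
        for fields in reader:
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            rows.append(
                CoverageRow(
                    fields[0],
                    float(fields[1]),
                    float(fields[2]),
                    float(fields[3]),
                )
            )
    return rows
```

Because the files are already sorted, the first record returned is the strain with the highest alignment coverage (file 3) or the longest alignment (file 4).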