yyoshiaki/VIRTUS

Difficulty with using wrapper for viral gene counts

KevinMaroney opened this issue · 1 comments

Hi again,

So I followed the links you suggested to create the single virus indices. I made 2 fasta files, one with the complete genome (HPV16_genome.fasta) containing a single entry >HPV16_Complete_Genome, and the other with the annotated genes (HPV16_transcripts.fasta) with >E1^E4 (sequence) >E1 (sequence) etc. as fasta files go. Your tutorial said to just run ./createindex_singlevirus.cwl createindex_singlevirus.job.yaml, so I made that yaml file as suggested which has a slot for both the genome and transcripts:

dir_name_STAR: STAR_index_NC_001526.4_HPV16
genomeFastaFiles:
  class: File
  path: "./HPV16_transcripts.fasta"
genomeSAindexNbases: 3
index_salmon: salmon_index_NC_001526.4_HPV16
runThreadN: 40
transcripts:
  class: File
  path: "./HPV16_genome.fasta"

This ran successfully and generated both a STAR index, which I assume is for the genome fasta file, and a salmon index, which I assume is for the transcripts (STAR_index_NC... and salmon_index_NC...).

The issue I'm running into is with the inputs for the VIRTUS_wrapper.py

I do not see an argument specifying salmon_index_virus, but do see it for salmon_index_human.
Do I need to just put the genome and transcripts in the same fasta as opposed to what is suggested in the example .yaml file to be able to generate a figure/comparisons as you did?

I tried to use the following command:

DIR_INDEX_ROOT=~/programs/VIRTUS/workflow
~/programs/VIRTUS/wrapper/VIRTUS_wrapper.py input.fastq.csv \
    --fastq \
    --VIRTUSDir ~/programs/VIRTUS2/workflow/ \
    -s1 _R1_1.fastq.gz \
    -s2 _R2_2.fastq.gz \
    --genomeDir_human ~/programs/VIRTUS/workflow/STAR_index_human \
    --genomeDir_virus ~/programs/VIRTUS/workflow/STAR_index_NC_001526.4_HPV16 \
    --salmon_index_virus ~/programs/VIRTUS/workflow/salmon_index_NC_001526.4_HPV16 \
    --salmon_quantdir_virus salmon_quantdir_NC_001526.4_HPV16 \
    --nthreads=40

But got the following error:

~/programs/VIRTUS/wrapper/VIRTUS_wrapper.py input.fastq.csv \
>     --fastq \
>     --VIRTUSDir ~/programs/VIRTUS2/workflow/ \
>     -s1 _R1_1.fastq.gz \
>     -s2 _R2_2.fastq.gz \
>     --genomeDir_human ~/programs/VIRTUS/workflow/STAR_index_human \
>     --genomeDir_virus ~/programs/VIRTUS/workflow/STAR_index_NC_001526.4_HPV16 \
>     --salmon_index_virus ~/programs/VIRTUS/workflow/salmon_index_NC_001526.4_HPV16 \
>     --salmon_quantdir_virus salmon_quantdir_NC_001526.4_HPV16
usage: VIRTUS_wrapper.py [-h] [--VIRTUSDir VIRTUSDIR] --genomeDir_human GENOMEDIR_HUMAN --genomeDir_virus
                         GENOMEDIR_VIRUS --salmon_index_human SALMON_INDEX_HUMAN
                         [--salmon_quantdir_human SALMON_QUANTDIR_HUMAN]
                         [--outFileNamePrefix_human OUTFILENAMEPREFIX_HUMAN] [--nthreads NTHREADS]
                         [--hit_cutoff HIT_CUTOFF] [-s SUFFIX_SE] [-s1 SUFFIX_PE_1] [-s2 SUFFIX_PE_2] [--fastq]
                         input_path
VIRTUS_wrapper.py: error: the following arguments are required: --salmon_index_human

I haven't even generated a salmon_index_human, as I was using the human STAR_index I previously made for VIRTUS2. Can you point me in the right direction? It also seemed like the wrapper didn't raise an error for --salmon_index_virus or --salmon_quantdir_virus, despite those not being listed as supported arguments. Thank you for your help.

Hi, unfortunately, the wrapper does not support single-virus mode.
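
As for why --salmon_index_virus and --salmon_quantdir_virus did not seem to trigger an error: a likely explanation, assuming the wrapper parses its options with Python's argparse (the format of the usage message suggests this, but it is an assumption), is that argparse reports missing required arguments before it reports unrecognized ones. The sketch below uses a hypothetical stand-in parser that mirrors only a few of the options from the usage message; it is not the wrapper's actual code.

```python
import argparse
import contextlib
import io


def build_parser():
    # Hypothetical stand-in for the wrapper's parser (assumption, not the real code).
    parser = argparse.ArgumentParser(prog="VIRTUS_wrapper.py")
    parser.add_argument("input_path")
    parser.add_argument("--genomeDir_human", required=True)
    parser.add_argument("--genomeDir_virus", required=True)
    parser.add_argument("--salmon_index_human", required=True)
    parser.add_argument("--fastq", action="store_true")
    return parser


def parse_error(argv):
    """Return whatever argparse writes to stderr when parsing argv fails."""
    stderr = io.StringIO()
    with contextlib.redirect_stderr(stderr):
        try:
            build_parser().parse_args(argv)
        except SystemExit:
            # argparse exits after printing its usage and error message.
            pass
    return stderr.getvalue()


base = [
    "input.fastq.csv", "--fastq",
    "--genomeDir_human", "STAR_index_human",
    "--genomeDir_virus", "STAR_index_virus",
]
# An option the stand-in parser does not define, as in the issue.
unknown = ["--salmon_index_virus", "salmon_index_virus"]

# Case 1: required option missing AND unknown option present.
missing_required = parse_error(base + unknown)

# Case 2: required option supplied, unknown option still present.
with_required = parse_error(
    base + ["--salmon_index_human", "salmon_index_human"] + unknown
)

print(missing_required)
print(with_required)
```

In the first case only the missing --salmon_index_human is reported; the complaint about the unknown option appears only once every required argument has been supplied. So the absence of an error for the *_virus options in the output above does not mean they were accepted.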