skandlab/SMuRF

error at "extracting meta data from VRanges"


Hi,

I am testing SMuRF on a set of files I generated by running the individual callers, and I am getting "Error in normalizeDoubleBracketSubscript". It seems that the expected data type is not there. Are there specific requirements for the input VCFs?

Thanks
Gianfilippo

Below are my command line and the output:
myresults = smurf(directory = "Variants_hg38_BWA_ensemble/Sample_G1700T_012",mode="combined",nthreads=20,output.dir="Variants_hg38_BWA_ensemble/Sample_G1700T_012",build="hg38",check.packages=T)
[1] "SMuRFv1.6 (3rd Oct 2019)"
[1] "Saving output files to: Variants_hg38_BWA_ensemble/Sample_G1700T_012"
Connection successful!

R is connected to the H2O cluster:
H2O cluster version: 3.26.0.2
H2O cluster version age: 2 months and 25 days
H2O cluster total nodes: 1
H2O cluster total memory: 26.63 GB
H2O cluster total cores: 20
H2O cluster allowed cores: 1
H2O cluster healthy: TRUE
H2O API Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4
R Version: R version 3.5.0 (2018-04-23)

Accessing files:
Variants_hg38_BWA_ensemble/Sample_G1700T_012/mutect2.vcf.gz
Variants_hg38_BWA_ensemble/Sample_G1700T_012/freebayes.vcf.gz
Variants_hg38_BWA_ensemble/Sample_G1700T_012/varscan.vcf.gz
Variants_hg38_BWA_ensemble/Sample_G1700T_012/vardict.vcf.gz
[1] "Parsing step"
[1] "reading vcfs"
[1] "reading mutect2"
[1] "reading freebayes"
[1] "reading varscan"
[1] "reading vardict"
Time difference of 16.48991 secs
[1] "extracting calls passed by at least 1 caller"
Time difference of 0.82076 secs
[1] "extracting meta data from VRanges"
Error in normalizeDoubleBracketSubscript(i, x, exact = exact, allow.NA = TRUE, :
invalid [[ subscript type: NULL

Hi,

thanks.

I can see the sample names (tumor and normal) in each of the 4 files (I have Mutect2, VarScan2, VarDict, and FreeBayes). They are all from the same sample, but each file uses a different name. I guess I have to fix that.
Also, VarScan used the whole file path as the sample names.
And in my FreeBayes VCF I can see an extra column that you do not have in your sample FreeBayes VCF. How do I get rid of it?

Thanks

Hi,

I just edited the FreeBayes VCF and made sure all sample names in the various VCFs are consistent (see below). I am still getting the exact same error.

Do you have any other thoughts?

Thanks

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample_G1700T_012 Sample_G1700N_006
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample_G1700N_006 Sample_G1700T_012
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample_G1700T_012 Sample_G1700N_006
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample_G1700N_006 Sample_G1700T_012

Resolving error:
Error in normalizeDoubleBracketSubscript(i, x, exact = exact, allow.NA = TRUE, :
invalid [[ subscript type: NULL

Cause:
The VCF sample names for the tumour and normal samples were not detected automatically.
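To see which sample names each caller actually wrote, it can help to print the sample columns from the `#CHROM` header line of every VCF before running SMuRF. The snippet below is a minimal Python sketch of that check, not part of SMuRF. The function names `sample_names_from_header` and `vcf_sample_names` are hypothetical helpers; the directory and file names are the ones shown in the log above.

```python
import gzip
from pathlib import Path


def sample_names_from_header(header_line):
    """Return the sample columns of a tab-separated #CHROM header line."""
    if not header_line.startswith("#CHROM"):
        raise ValueError("not a #CHROM header line")
    # The first 9 columns (CHROM .. FORMAT) are fixed; the rest are sample names.
    return header_line.rstrip("\r\n").split("\t")[9:]


def vcf_sample_names(path):
    """Read a plain or gzip/bgzip-compressed VCF and return its sample names."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return sample_names_from_header(line)
            if not line.startswith("#"):
                break  # reached the records without seeing a header line
    raise ValueError(f"no #CHROM header line found in {path}")


if __name__ == "__main__":
    # Directory and caller file names as listed under "Accessing files" above.
    directory = Path("Variants_hg38_BWA_ensemble/Sample_G1700T_012")
    for caller in ("mutect2", "freebayes", "varscan", "vardict"):
        vcf = directory / f"{caller}.vcf.gz"
        if vcf.exists():
            print(caller, vcf_sample_names(vcf))
```

If the callers disagree on the names, or one of them reports more than two sample columns, that is the place to look first.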

Solution:
Manually state the tag that identifies your tumor sample with t.label.
Example:
t.label='-T'
t.label='tumor'
t.label='T_001'
t.label='T' #also works for you

Error message:
't.label for tumor sample is not unique, duplicated or missing'
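A candidate label can be sanity-checked against the sample names before calling smurf(). The sketch below assumes the label is matched as a literal, case-sensitive substring of the sample name; this thread does not state SMuRF's actual matching rule, so treat that as an assumption. `matching_samples` and `check_t_label` are hypothetical helper names, and the sample names are the ones from the headers posted above.

```python
def matching_samples(sample_names, t_label):
    """Return the sample names containing t_label as a literal substring.

    Assumption: substring matching, case-sensitive. SMuRF's own rule may differ.
    """
    return [name for name in sample_names if t_label in name]


def check_t_label(sample_names, t_label):
    """Classify a label as 'ok', 'missing', or 'not unique' for one VCF."""
    matches = matching_samples(sample_names, t_label)
    if len(matches) == 1:
        return "ok"
    return "missing" if not matches else "not unique"


if __name__ == "__main__":
    # Sample names taken from the #CHROM lines posted in this thread.
    samples = ["Sample_G1700T_012", "Sample_G1700N_006"]
    for label in ("T_012", "Sample", "tumor"):
        print(label, "->", check_t_label(samples, label))
```

Under this assumption a usable label is one that matches the tumor column and nothing else in every VCF, which is why a tag shared by both sample names (such as "Sample") would not work.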

myresults = smurf(directory = "Variants_hg38_BWA_ensemble/Sample_G1700T_012",
mode="combined",
t.label='T_012',
nthreads=20,
output.dir="Variants_hg38_BWA_ensemble/Sample_G1700T_012",
build="hg38",
check.packages=T)

Please download the latest patch SMuRF-v1.6.2. Thanks!

thanks!!
I will try this