Error with vcfMissingness.pl
btmartin721 opened this issue · 7 comments
Hello,
I am having an issue with vcfMissingness.pl. Specifically, after running for almost exactly 2 hours 40 minutes, I get the below error with one of the R scripts that is called from the perl script. vcfMissingness.pl generates the *.012, *.012.indv, *.012.pos, *.missingness files, and minimum/max pairwise missingness values fine, but then gives the below error when it gets to: Parsing "c88.vcf" ...
It says that for the first vcf file then it crashes.
At first, I thought there was an issue with the R script recognizing the path to my VCF files. I was specifying the correct paths to them in my vcflist.txt file (e.g., /home/USER/clustOpt/vcf/c88.vcf) with one file per line. I then tried copying the VCF files to both the clustOpt and clustOpt/resources directories and tried running it again but got the same error. For some reason it's not reading the VCF files correctly. The VCF files were all made with ipyrad v. 0.7.30.
The ipyrad VCF file is a little different than the original pyRAD's. clustOpt is compatible with ipyrad's VCF files, right?
If so, do you have any insight as to what might be wrong here? All the dependencies that are listed in your README.md are installed as far as I know.
Thanks for your time.
Here's the error:
Read 8 items
Error in snpgdsVCF2GDS(vcfFiles[i], gdsOut) :
LENGTH or similar applied to NULL object
Execution halted
Error in snpgdsVCF2GDS(vcfFiles[i], gdsOut, verbose = F) :
LENGTH or similar applied to NULL object
Execution halted
Error in snpgdsVCF2GDS(vcfFiles[i], gdsOut, verbose = F) :
LENGTH or similar applied to NULL object
Execution halted
sending incremental file list
Hi Bradley,
That is an error with SNPRelate (not clustOpt). What version of SNPRelate are you using?
Hi. Thank you for your reply. I am using SNPRelate_1.6.6 in R 3.5.1 on CentOS 6.
I installed most of the dependencies, including SNPRelate, throught conda. I am working on a HPC cluster with CentOS 6 and don't have root privileges. I have found that using conda to install R packages is much easier than doing so through R directly. Using install.packages() often leads to dependency errors. However, it's certainly possible that conda may have installed a different version of SNPRelate than what clustOpt was tested with and/or that it created issues.
-Bradley
This appears to be a known issue of SNPRelate v1.6. The author says it was fixed with the Dec 2017 release (v1.12.1) (see the release notes here: https://github.com/zhengxwen/SNPRelate/releases/tag/v1.12.1).
I recommend upgrading SNPRelate and seeing if the problem persists.
Hi, I tried using another version of SNPRelate: SNPRelate_1.16.0
This time I installed it and all dependencies through R directly, and not through conda. After running clustOpt again, it got farther but I received a different error:
Error in is.matrix(dist) | inherits(dist, "snpgdsDissClass") | inherits(dist, :
There is no SNP!
Just above the error, this line was printed to STDOUT: Excluding 43,829 SNPs on non-autosomes
That's how many SNPs are in the whole alignment, so for some reason it filtered out all of them as non-autosomal. Do you have any idea why it would have done that? I'm trying to determine if it's an issue with my dataset, if it's dependency-related, or something else.
Thanks for your time.
-Bradley
Detailed STDOUT:
VCF Format ==> SNP GDS Format
Method: exacting biallelic SNPs
Number of samples: 250
Parsing "/home/btm002/clustOpt/original_vcf/c70.vcf" ...
import 43829 variants.
- genotype { Bit2 250x43829, 2.6M } *
Optimize the access efficiency ...
Clean up the fragments of GDS file:
open the file '/home/btm002/clustOpt/original_vcf/c70_missHM012.gds' (2.7M)of fragments: 50
save to '/home/btm002/clustOpt/original_vcf/c70_missHM012.gds.tmp'
rename '/home/btm002/clustOpt/original_vcf/c70_missHM012.gds.tmp' (2.7M, reduced: 360B)of fragments: 20
Identity-By-State (IBS) analysis on genotypes:
Excluding 43,829 SNPs on non-autosomes
Error in is.matrix(dist) | inherits(dist, "snpgdsDissClass") | inherits(dist, :
There is no SNP!
Calls: snpgdsHCluster ... tryCatch -> tryCatchList -> tryCatchOne ->
Execution halted
Ah yes, this is another little peculiarity of SNPRelate. Some versions assume you are working with human chromosomes and will by default ignore scaffolds not named 1-22. You can change this by adding autosome.only=F to the offending SNPRelate calls.
Specifically, line 30 in ibdSlope.R should be changed to read:
ibsMat <- snpgdsIBS(genofile, num.thread=2, verbose=F, autosome.only=F)
And line 141 in resources/missingnessHeatMaps.R should be changed to read:
ibsInter <- snpgdsHCluster(snpgdsIBS(gdsInter, num.thread=2, autosome.only=F))
Thank you. I will run it again with the updated arguments.
Got it to work. Thanks for all your help. A few notes:
-
After your last comment, I added autosome.only=F and it got further. Another error occurred, but it was a data-related issue. I realized that a few of my individuals had almost all missing data (N=3 to N=15 good SNPs) and it didn't like that. So I removed individuals with >90% missing data, re-ran it, and it worked fine.
-
In vcfToPCAvarExplained.R I also had to add autosome.only=F to the snpgdsPCA method call.
And voila! I really like your program. Thanks for writing/publishing it.
Best,
-Bradley