broadinstitute/Drop-seq

Two problems and their solutions

dconrad opened this issue · 1 comment

Hi, I've been using CensusSeq today and spent 30-60 minutes working out a couple of issues before I could get it to run correctly. It ran well in the end and worked great! I just wanted to share my experience here in case it helps someone else.

The background is that I am using CensusSeq to estimate the mixture proportions from a mixture of two genomes, so the BAM file contains reads from only two different cell lines/donors. First, I believe this is a bit off from the intended use case, which expects more than 2 genomes in the mixture, and the default site filters cause the run to fail. Specifically, the error is:
ERROR 2021-09-08 22:06:30 CensusSeq 0 SNPs found. Something is very wrong!

This can be fixed by setting the option MIN_NUM_VARIANT_SAMPLES=1.
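
For anyone who wants to copy this, here is a minimal sketch of the invocation with the filter relaxed. It assumes a `CensusSeq` launcher script is on your PATH (adjust to however you start the Drop-seq tools), and the function name `census_two_donor_cmd` is made up for illustration. It only prints the command so you can check it before running. The argument names and file names are the ones from the log pasted in this issue.

```shell
# Hypothetical helper: print a CensusSeq command line for a two-donor mixture.
# $1 = BAM, $2 = VCF, $3 = sample list, $4 = output file.
# MIN_NUM_VARIANT_SAMPLES=1 replaces the default of 2 reported in the log.
census_two_donor_cmd() {
    printf '%s\n' "CensusSeq INPUT_BAM=$1 INPUT_VCF=$2 SAMPLE_FILE=$3 OUTPUT=$4 MIN_NUM_VARIANT_SAMPLES=1"
}

# Assumption: CensusSeq is a launcher on PATH; this call only prints the command.
census_two_donor_cmd marm007.mkdup.sort.bam mcc.full.rehead.filt.id.vcf.gz census_seq.samps my.census.out
```

Piping the printed line to `sh` would run it. This naive version does not handle paths that contain spaces.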

The second issue is a little trickier to sort out. Here I received this error:

java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:2063)
at org.broadinstitute.dropseqrna.censusseq.CensusSeqUtils.getTempVCFFile(CensusSeqUtils.java:130)
at org.broadinstitute.dropseqrna.censusseq.CensusSeq.doWork(CensusSeq.java:171)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)

From reading some of the other closed issues for the Drop-seq repo, I found that there is an option called TMP_DIR that is not mentioned by CensusSeq --help. Defining this option, e.g. TMP_DIR=./, resolves the problem. However, this does create a temporary VCF file and a corresponding .tbi index. Just FYI for the developers: the VCF is deleted after the job completes, but the .tbi is not.
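
One way to avoid the leftover .tbi is to point TMP_DIR at a throwaway directory and delete it afterwards. This is only a sketch: `run_with_scratch_tmp` is a hypothetical helper name, it assumes `mktemp -d` is available (as on typical Linux systems), and it assumes the tool accepts TMP_DIR=... as a trailing argument, as in the run described above.

```shell
# Run a command with a private scratch directory appended as TMP_DIR=<dir>,
# then remove that directory and anything left in it (e.g. a stray .tbi).
run_with_scratch_tmp() {
    scratch=$(mktemp -d) || return 1
    "$@" "TMP_DIR=$scratch"
    status=$?
    rm -rf "$scratch"
    return "$status"
}

# Example, commented out (assumes a CensusSeq launcher on PATH):
# run_with_scratch_tmp CensusSeq INPUT_BAM=marm007.mkdup.sort.bam \
#     INPUT_VCF=mcc.full.rehead.filt.id.vcf.gz SAMPLE_FILE=census_seq.samps \
#     OUTPUT=my.census.out MIN_NUM_VARIANT_SAMPLES=1
```

The helper returns the exit status of the wrapped command, so a failed run is still visible to the caller after the cleanup.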

I’m running this on a local Linux machine at my school. The full output for each issue is pasted below.

Cheers

Don

ERROR MESSAGE #1 (MIN_NUM_VARIANT_SAMPLES ISSUE)

[Wed Sep 08 22:05:46 PDT 2021] CensusSeq INPUT_BAM=[../../../alex/marmosets/output/marm007/marm007.mkdup.sort.bam] INPUT_VCF=mcc.full.rehead.filt.id.vcf.gz SAMPLE_FILE=census_seq.samps OUTPUT=my.census.out GQ_THRESHOLD=30 FRACTION_SAMPLES_PASSING=0.9 MIN_NUM_VARIANT_SAMPLES=2 IGNORED_CHROMOSOMES=[X, Y, MT] READ_MQ=10 MIN_BASE_QUALITY=10 REPORT_ALLELE_COUNTS=false NUM_THREADS=1 RANDOM_SEED=1 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Sep 08 22:05:46 PDT 2021] Executing as conradon@monkeydo on Linux 3.10.0-1160.25.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Jdk; Inflater: Jdk; Provider GCS is not available; Picard version: 2.4.1(92c1eb2_1628103202)
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
INFO 2021-09-08 22:05:46 CensusSeq Number of contigs in common: 964.

INFO 2021-09-08 22:05:46 CensusSeq Genotype Quality [GQ] not found in header. Disabling GQ_THRESHOLD parameter
INFO 2021-09-08 22:05:46 CensusSeqUtils Found 2 samples in VCF and requested sample list out of 2 requested
java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:2063)
at org.broadinstitute.dropseqrna.censusseq.CensusSeqUtils.getTempVCFFile(CensusSeqUtils.java:130)
at org.broadinstitute.dropseqrna.censusseq.CensusSeq.doWork(CensusSeq.java:171)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)
INFO 2021-09-08 22:05:46 CensusSeq Looking through VCF for SNPs that fit criteria. Will search for these in BAM.
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
INFO 2021-09-08 22:05:46 CensusSeq Searching for variants with at least [2] samples with the non-ref genotype
22:05:46 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
INFO 2021-09-08 22:05:46 CensusSeq Found 2 samples in VCF and requested sample list out of 2 requested
INFO 2021-09-08 22:05:46 CensusSeq Genotype Quality Filter disabled. Enabling A/T, C/G SNP Filter to eliminate potential allele flipping variants
INFO 2021-09-08 22:05:57 CensusSeq Processed 1,000,000 records. Elapsed time: 00:00:11s. Time for last 1,000,000: 11s. Last read position: chr2:18,508,036
INFO 2021-09-08 22:06:07 CensusSeq Processed 2,000,000 records. Elapsed time: 00:00:21s. Time for last 1,000,000: 10s. Last read position: chr3:51,809,119
INFO 2021-09-08 22:06:17 CensusSeq Processed 3,000,000 records. Elapsed time: 00:00:31s. Time for last 1,000,000: 10s. Last read position: chr18:3,558,080
INFO 2021-09-08 22:06:28 CensusSeq Processed 4,000,000 records. Elapsed time: 00:00:41s. Time for last 1,000,000: 10s. Last read position: chr22:4,197,148
INFO 2021-09-08 22:06:30 CensusSeq Scanning VCF to find potential SNP sites
INFO 2021-09-08 22:06:30 CensusSeq Found [0] potential SNP sites to query.
ERROR 2021-09-08 22:06:30 CensusSeq 0 SNPs found. Something is very wrong!
[Wed Sep 08 22:06:30 PDT 2021] org.broadinstitute.dropseqrna.censusseq.CensusSeq done. Elapsed time: 0.74 minutes.
Runtime.totalMemory()=1773666304

ERROR MESSAGE #2 (TMP_DIR ISSUE)
[Wed Sep 08 22:16:21 PDT 2021] CensusSeq INPUT_BAM=[../../../alex/marmosets/output/marm007/marm007.mkdup.sort.bam] INPUT_VCF=mcc.full.rehead.filt.id.vcf.gz SAMPLE_FILE=census_seq.samps OUTPUT=my.census.out MIN_NUM_VARIANT_SAMPLES=1 GQ_THRESHOLD=30 FRACTION_SAMPLES_PASSING=0.9 IGNORED_CHROMOSOMES=[X, Y, MT] READ_MQ=10 MIN_BASE_QUALITY=10 REPORT_ALLELE_COUNTS=false NUM_THREADS=1 RANDOM_SEED=1 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Sep 08 22:16:21 PDT 2021] Executing as conradon@monkeydo on Linux 3.10.0-1160.25.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Jdk; Inflater: Jdk; Provider GCS is not available; Picard version: 2.4.1(92c1eb2_1628103202)
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
INFO 2021-09-08 22:16:21 CensusSeq Number of contigs in common: 964.

INFO 2021-09-08 22:16:21 CensusSeq Genotype Quality [GQ] not found in header. Disabling GQ_THRESHOLD parameter
INFO 2021-09-08 22:16:21 CensusSeqUtils Found 2 samples in VCF and requested sample list out of 2 requested
java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:2063)
at org.broadinstitute.dropseqrna.censusseq.CensusSeqUtils.getTempVCFFile(CensusSeqUtils.java:130)
at org.broadinstitute.dropseqrna.censusseq.CensusSeq.doWork(CensusSeq.java:171)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)
INFO 2021-09-08 22:16:21 CensusSeq Looking through VCF for SNPs that fit criteria. Will search for these in BAM.
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
INFO 2021-09-08 22:16:21 CensusSeq Searching for variants with at least [1] samples with the non-ref genotype
22:16:21 [main] WARN com.intel.gkl.compression.IntelDeflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
INFO 2021-09-08 22:16:21 CensusSeq Found 2 samples in VCF and requested sample list out of 2 requested
INFO 2021-09-08 22:16:21 CensusSeq Genotype Quality Filter disabled. Enabling A/T, C/G SNP Filter to eliminate potential allele flipping variants
INFO 2021-09-08 22:16:21 CensusSeq Scanning VCF to find potential SNP sites
[Wed Sep 08 22:16:21 PDT 2021] org.broadinstitute.dropseqrna.censusseq.CensusSeq done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2058354688
Exception in thread "main" java.lang.NullPointerException
at org.broadinstitute.dropseqrna.vcftools.SampleAssignmentVCFUtils.getSNPIntervals(SampleAssignmentVCFUtils.java:187)
at org.broadinstitute.dropseqrna.censusseq.CensusSeq.doWork(CensusSeq.java:182)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:42)

alecw commented

Hi Don,

Thanks for this. Note that the documentation of TMP_DIR can be viewed with the --stdhelp command-line option (but not with the -H option, despite what the help message says [sigh]).

Regards, Alec