arq5x/bedtools2

non-deterministic error, VCF file line contains an unexpected number of fields

williambrandler opened this issue · 1 comment

I am running bedtools on a VCF stored in Amazon Web Services (AWS) S3,
accessing it as if it were on the local filesystem through the Databricks File System (DBFS) mount ("/dbfs/mnt/").

I have never had issues doing this with bedtools before, but now I am hitting the following error:

Error: line number 98 of file /dbfs/mnt/test.vcf has 14 fields, but 11 were expected.

If I run it again, I get the same error, but on a different line.

Have you seen this issue before? Any idea what could cause it?
There is nothing wrong with the reported lines in the VCF itself.
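To check independently of bedtools that the reported lines are well formed, a sketch like the following could be run on a copy of the VCF that is not also being used as an output. It assumes a tab-delimited file in which every data line should have the same number of fields as the first data line. `check_field_counts` is a hypothetical helper, and its line numbers count header lines, so they may not match the numbering in the bedtools error exactly.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of bedtools): report data lines whose
# tab-separated field count differs from that of the first data line.
# Header lines starting with '#' are skipped; NR counts every physical line.
check_field_counts() {
  awk -F'\t' '
    /^#/ { next }                        # skip ## meta lines and the #CHROM header
    expected == "" { expected = NF }     # first data line sets the expected count
    NF != expected {
      printf "line %d has %d fields, expected %d\n", NR, NF, expected
      bad = 1
    }
    END { exit (bad ? 1 : 0) }           # non-zero exit if any line was inconsistent
  ' "$1"
}
```

A clean file produces no output and exits with status 0, so it can also be used as a pre-flight check before calling bedtools.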

code:

%sh
input_vcf_local_path=/dbfs/mnt/test.vcf
bedtools intersect -seed 24 -a $input_vcf_local_path -b $bed_local_path -header -wa > $bedtools_filter_vcf_local_path

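The issue turned out to be that $input_vcf_local_path was the same path as $bedtools_filter_vcf_local_path, so bedtools was writing its output to the same file it was reading as input. Because that file was being overwritten while bedtools was still reading it, the malformed line (and the reported line number) would be expected to change from run to run.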
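A minimal sketch of a guard against this mistake, assuming GNU coreutils `realpath` is available on the cluster (its `-m` option lets the output path be resolved even if the file does not exist yet). The function names `same_file` and `safe_intersect` are hypothetical, and only the `-a`, `-b`, `-header` and `-wa` options from the original command are kept.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of bedtools): refuse to run when the output
# path resolves to the same file as the input path.

# Returns 0 if both paths resolve to the same absolute path.
# Assumes GNU coreutils: `realpath -m` normalises ".", ".." and symlinks
# without requiring the output file to exist. Hard links are NOT detected.
same_file() {
  [ "$(realpath -m -- "$1")" = "$(realpath -m -- "$2")" ]
}

# Wraps the intersect call from the issue and refuses to clobber its own input.
safe_intersect() {
  local input_vcf=$1 bed=$2 output_vcf=$3
  if same_file "$input_vcf" "$output_vcf"; then
    echo "Error: output $output_vcf is the same file as input $input_vcf" >&2
    return 1
  fi
  bedtools intersect -a "$input_vcf" -b "$bed" -header -wa > "$output_vcf"
}
```

In the original cell this would be called as `safe_intersect "$input_vcf_local_path" "$bed_local_path" "$bedtools_filter_vcf_local_path"`, which fails immediately instead of reading a file that is being overwritten. The check runs before the `>` redirection, so the input file is never truncated.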