Problem with vcfallelicprimitives during preprocess
Closed this issue · 17 comments
Hi all,
I have tried to run gimble after installing with conda but I encountered a problem during gimbleprep:
0%|█████████████████████████| 20/20 [2:24:56<00:00, 438.31s/it]^M[%] Inferring callable regions...: 100%|█████████████████████████| 20/20 [2:24:56<00:00, 434.81s/it]
[X] The command following command failed with error code 255:
[X] => bcftools norm --threads 1 -Ov -f /share/tycho_poolz1/pagagnaire/COGEDIV/pbarry/Cjuli/Reference_Genome/referencegenome_Cjuli.fa /share/tycho_poolz1/pagagnaire/COGEDIV/pbarry/VCF_FILES/Cjuli_filtered_pass.vcf.gz | vcfallelicprimitives --keep-info --keep-geno -t decomposed | bcftools plugin fill-AN-AC --threads 1 -Oz | bcftools filter --threads 1 -Oz -s Qual -m+ -e 'QUAL<1' | bcftools filter --threads 1 -Oz -m+ -s+ --SnpGap 2 | bcftools filter --threads 1 -Oz -e 'TYPE!="snp"' -s NonSnp -m+ > tmp_gimble_icf8fbdl/vcf.filtered.vcf.gz
[X] (STDERR): 'Failed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type'
After few tries, I found that the problem of this command line is with vcfallelicprimitives., because when I run separately, I have the "Illegal instruction" response with my VCF. I guess that it is because I am using a VCF generated by GATK that is not a freebayes VCF, but I am not sure. Do you have any ideas how to solve this ?
Thanks for the help,
Pierre
Hi @pierrebarry,
Thanks for reaching out. Is it possible to run freebayes instead and verify whether this is the issue? I have no idea what the differences are between the vcfs generate by these two packages. @DRL might have more valuable insights on this topic.
On a more general note, this seems to be yet another reason to move away from using vcf as input and have people provide sgkit arrays instead.
Hi @pierrebarry
what happens if you execute the following
bcftools norm --threads 1 -Ov -f /share/tycho_poolz1/pagagnaire/COGEDIV/pbarry/Cjuli/Reference_Genome/referencegenome_Cjuli.fa /share/tycho_poolz1/pagagnaire/COGEDIV/pbarry/VCF_FILES/Cjuli_filtered_pass.vcf.gz | head -300 | vcfallelicprimitives --keep-info --keep-geno -t decomposed > test.vcf
What is the error that occurs?
I have tested gimbleprep
in the past using GATK VCF files and they used to work.
Hi,
Thanks for the quick replies. I could try to run a quick freebayes run to see if it changes the outcome. I ran the previous line, but I had the same result : Illegal instruction (core dumped) while running vcfallelicprimitives.
If you can partition the first 10K lines of your VCF and send it I can check whether the same happens on my machine.
Another question... you have bgziped and indexed the VCF, right?
Yes compressed with bgzip -c >
(result of command-line file
: gzip compressed data, extra field, original size 0
and indexed with tabix -p vcf
.
I have the vcf with the header and the first 10K lines, what is the best way for you to send it ?
Can you attach it to a message?
There is also some test data here which you can check to make sure that the error is not due to problems with your setup...
test_gimble.vcf.gz
test_gimble.vcf.gz.tbi.gz
Here are the vcf and the index both compressed. I will test with the example data.
OK, I ran the test command and I got the same error :
[X] The command following command failed with error code 255:
[X] => bcftools norm --threads 1 -Ov -f /home/pbarry/gimble_conda_run/test/gIMble/example_input/reference.fa /home/pbarry/gimble_conda_run/test/gIMble/example_input/heliconius.test_data.vcf.gz | vcfallelicprimitives --keep-info --keep-geno -t decomposed | bcftools plugin fill-AN-AC --threads 1 -Oz | bcftools filter --threads 1 -Oz -s Qual -m+ -e 'QUAL<1' | bcftools filter --threads 1 -Oz -s Balance -m+ -e 'RPL<1 | RPR<1 | SAF<1 | SAR<1' | bcftools filter --threads 1 -Oz -m+ -s+ --SnpGap 2 | bcftools filter --threads 1 -Oz -e 'TYPE!="snp"' -s NonSnp -m+ > tmp_gimble_yp0631wc/vcf.filtered.vcf.gz
[X] (STDERR): 'Failed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type'
So it might come from the installation. I have created 2 separate conda environments one for gimble and one for gimble_prep on the distant server and I installed gimble_prep as you said in the README :
conda create -n gimble_prep conda install -c bioconda gimbleprep
.
Ok, so it's most likely the setup ... which version of bcftools
and vcflib
do you have in that conda env?
You can check that with the command conda list
.
bcftools 1.18 h8b25389_0 bioconda
vcflib 1.0.9 h146fbdb_3 bioconda
I put you all the results of the conda list
command if you need it:
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
bcftools 1.18 h8b25389_0 bioconda
bedtools 2.31.1 hf5e1c6e_0 bioconda
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.21.0 hd590300_0 conda-forge
ca-certificates 2023.7.22 hbcca054_0 conda-forge
cmake 3.27.6 hcfe8598_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
docopt 0.6.2 py_1 conda-forge
eigen 3.2 3 bioconda
gimbleprep 0.0.2b5 pyhdfd78af_0 bioconda
gsl 2.7 he838d99_0 conda-forge
htslib 1.18 h81da01d_0 bioconda
jsoncpp 1.9.5 h4bd325d_1 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.2 h659d440_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libblas 3.9.0 19_linux64_openblas conda-forge
libcblas 3.9.0 19_linux64_openblas conda-forge
libcurl 8.4.0 hca28451_0 conda-forge
libdeflate 1.18 h0b41bf4_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libexpat 2.5.0 hcb278e6_1 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.2.0 h807b86a_3 conda-forge
libgfortran-ng 13.2.0 h69a702a_3 conda-forge
libgfortran5 13.2.0 ha4646dd_3 conda-forge
libgomp 13.2.0 h807b86a_3 conda-forge
liblapack 3.9.0 19_linux64_openblas conda-forge
libnghttp2 1.58.0 h47da74e_0 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.24 pthreads_h413a1c8_0 conda-forge
libsqlite 3.44.0 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 13.2.0 h7e041cc_3 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libuv 1.46.0 hd590300_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
mosdepth 0.3.2 h03814e9_1 bioconda
ncurses 6.4 h59595ed_2 conda-forge
numpy 1.26.0 py310hb13e2d6_0 conda-forge
openssl 3.1.4 hd590300_0 conda-forge
pandas 2.1.3 py310hcc13569_0 conda-forge
parallel 20160622 1 bioconda
pcre 8.45 h9c3ff4c_0 conda-forge
perl 5.32.1 4_hd590300_perl5 conda-forge
perl-threaded 5.32.1 hdfd78af_1 bioconda
pip 23.3.1 pyhd8ed1ab_0 conda-forge
pysam 0.22.0 py310h41dec4a_0 bioconda
python 3.10.13 hd12c33a_0_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-tzdata 2023.3 pyhd8ed1ab_0 conda-forge
python_abi 3.10 4_cp310 conda-forge
pytz 2023.3.post1 pyhd8ed1ab_0 conda-forge
readline 8.2 h8228510_1 conda-forge
rhash 1.4.4 hd590300_0 conda-forge
samtools 1.18 h50ea8bc_1 bioconda
setuptools 68.2.2 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
tabixpp 1.1.2 hd68fcf3_1 bioconda
tk 8.6.13 noxft_h4845f30_101 conda-forge
tqdm 4.66.1 pyhd8ed1ab_0 conda-forge
tzdata 2023c h71feb2d_0 conda-forge
vcflib 1.0.9 h146fbdb_3 bioconda
wfa2-lib 2.3.4 h4ac6f70_0 bioconda
wheel 0.41.3 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
zstd 1.5.5 hfc55251_0 conda-forge
Ok, so the issue seems to arise due to vcflib
being unhappy with tabixpp
if the versions are not adequate ...
A solution which seems to work on my machine with your VCF is the following (which will overwrite your existing gimbleprep
conda env):
conda create -n gimbleprep -y
conda activate gimbleprep
conda install -c bioconda vcflib=1.0.3 tabixpp=1.1.0 gimbleprep
If you have mamba
installed it might be quicker to this with mamba
instead of conda
(by replacing conda
with mamba
in the commands above) ...
Can you confirm that this works for you?
I installed the conda environment for gimbleprep the way you proposed and it worked for the heliconius example data and for my data, and, switching to the main conda environment of gimble, gimble parse
works perfectly well on my data so the problem is solved ! Many thanks for the help !
Awesome! Does this require modification of setup.py
to avoid this happening to other people as well? Do we have to pin down versions of both tabixpp
and vcflib
?
Yes, it seems that this solves the issue.
vcflib=1.0.3
tabixpp=1.1.0
vcflib=1.0.3
alone did not work on my machine.
The vcflib
conda recipe does not specify a tabixpp
version. In fact, tabixpp
is just a c++ wrapper they've developed themselves. I would find it therefore highly surprising that the latest version of vcflib
would not be compatible with tabixpp
. @pierrebarry mentioned that he indexed the vcf file himself after compressing using tabix
. Is it possible that the there is a version mismatch between the output of the version of tabix
you both have on your machine and the requirements of tabixpp
?