LohseLab/gimbleprep

Problem with vcfallelicprimitives during preprocess

Closed this issue · 17 comments

Hi all,

I have tried to run gimble after installing with conda but I encountered a problem during gimbleprep:

0%|█████████████████████████| 20/20 [2:24:56<00:00, 438.31s/it]^M[%] Inferring callable regions...: 100%|█████████████████████████| 20/20 [2:24:56<00:00, 434.81s/it]
[X] The command following command failed with error code 255:
[X] => bcftools norm --threads 1 -Ov -f /share/tycho_poolz1/pagagnaire/COGEDIV/pbarry/Cjuli/Reference_Genome/referencegenome_Cjuli.fa /share/tycho_poolz1/pagagnaire/COGEDIV/pbarry/VCF_FILES/Cjuli_filtered_pass.vcf.gz |                 vcfallelicprimitives --keep-info --keep-geno -t decomposed |                 bcftools plugin fill-AN-AC --threads 1 -Oz |                 bcftools filter --threads 1 -Oz -s Qual -m+ -e 'QUAL<1' |                 bcftools filter --threads 1 -Oz -m+ -s+ --SnpGap 2 |                 bcftools filter --threads 1 -Oz -e 'TYPE!="snp"' -s NonSnp -m+ > tmp_gimble_icf8fbdl/vcf.filtered.vcf.gz
[X] (STDERR): 'Failed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type'

After few tries, I found that the problem of this command line is with vcfallelicprimitives., because when I run separately, I have the "Illegal instruction" response with my VCF. I guess that it is because I am using a VCF generated by GATK that is not a freebayes VCF, but I am not sure. Do you have any ideas how to solve this ?

Thanks for the help,

Pierre

Hi @pierrebarry,
Thanks for reaching out. Is it possible to run freebayes instead and verify whether this is the issue? I have no idea what the differences are between the vcfs generate by these two packages. @DRL might have more valuable insights on this topic.
On a more general note, this seems to be yet another reason to move away from using vcf as input and have people provide sgkit arrays instead.

DRL commented

Hi @pierrebarry

what happens if you execute the following

bcftools norm --threads 1 -Ov -f /share/tycho_poolz1/pagagnaire/COGEDIV/pbarry/Cjuli/Reference_Genome/referencegenome_Cjuli.fa /share/tycho_poolz1/pagagnaire/COGEDIV/pbarry/VCF_FILES/Cjuli_filtered_pass.vcf.gz |  head -300 | vcfallelicprimitives --keep-info --keep-geno -t decomposed > test.vcf

What is the error that occurs?

I have tested gimbleprepin the past using GATK VCF files and they used to work.

Hi,

Thanks for the quick replies. I could try to run a quick freebayes run to see if it changes the outcome. I ran the previous line, but I had the same result : Illegal instruction (core dumped) while running vcfallelicprimitives.

DRL commented

If you can partition the first 10K lines of your VCF and send it I can check whether the same happens on my machine.

DRL commented

Another question... you have bgziped and indexed the VCF, right?

Yes compressed with bgzip -c > (result of command-line file : gzip compressed data, extra field, original size 0 and indexed with tabix -p vcf .

I have the vcf with the header and the first 10K lines, what is the best way for you to send it ?

DRL commented

Can you attach it to a message?

DRL commented

There is also some test data here which you can check to make sure that the error is not due to problems with your setup...

test_gimble.vcf.gz
test_gimble.vcf.gz.tbi.gz

Here are the vcf and the index both compressed. I will test with the example data.

OK, I ran the test command and I got the same error :

[X] The command following command failed with error code 255:
[X] => bcftools norm --threads 1 -Ov -f /home/pbarry/gimble_conda_run/test/gIMble/example_input/reference.fa /home/pbarry/gimble_conda_run/test/gIMble/example_input/heliconius.test_data.vcf.gz |                 vcfallelicprimitives --keep-info --keep-geno -t decomposed |                 bcftools plugin fill-AN-AC --threads 1 -Oz |                 bcftools filter --threads 1 -Oz -s Qual -m+ -e 'QUAL<1' |                 bcftools filter --threads 1 -Oz -s Balance -m+ -e 'RPL<1 | RPR<1 | SAF<1 | SAR<1' |                 bcftools filter --threads 1 -Oz -m+ -s+ --SnpGap 2 |                 bcftools filter --threads 1 -Oz -e 'TYPE!="snp"' -s NonSnp -m+ > tmp_gimble_yp0631wc/vcf.filtered.vcf.gz
[X] (STDERR): 'Failed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type\nFailed to read from standard input: unknown file type'

So it might come from the installation. I have created 2 separate conda environments one for gimble and one for gimble_prep on the distant server and I installed gimble_prep as you said in the README :

conda create -n gimble_prep conda install -c bioconda gimbleprep.

DRL commented

Ok, so it's most likely the setup ... which version of bcftools and vcflib do you have in that conda env?

You can check that with the command conda list.

bcftools                  1.18                 h8b25389_0    bioconda
vcflib                    1.0.9                h146fbdb_3    bioconda

I put you all the results of the conda list command if you need it:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
bcftools                  1.18                 h8b25389_0    bioconda
bedtools                  2.31.1               hf5e1c6e_0    bioconda
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.21.0               hd590300_0    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
cmake                     3.27.6               hcfe8598_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
docopt                    0.6.2                      py_1    conda-forge
eigen                     3.2                           3    bioconda
gimbleprep                0.0.2b5            pyhdfd78af_0    bioconda
gsl                       2.7                  he838d99_0    conda-forge
htslib                    1.18                 h81da01d_0    bioconda
jsoncpp                   1.9.5                h4bd325d_1    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libblas                   3.9.0           19_linux64_openblas    conda-forge
libcblas                  3.9.0           19_linux64_openblas    conda-forge
libcurl                   8.4.0                hca28451_0    conda-forge
libdeflate                1.18                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_3    conda-forge
libgfortran-ng            13.2.0               h69a702a_3    conda-forge
libgfortran5              13.2.0               ha4646dd_3    conda-forge
libgomp                   13.2.0               h807b86a_3    conda-forge
liblapack                 3.9.0           19_linux64_openblas    conda-forge
libnghttp2                1.58.0               h47da74e_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.24          pthreads_h413a1c8_0    conda-forge
libsqlite                 3.44.0               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_3    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libuv                     1.46.0               hd590300_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
mosdepth                  0.3.2                h03814e9_1    bioconda
ncurses                   6.4                  h59595ed_2    conda-forge
numpy                     1.26.0          py310hb13e2d6_0    conda-forge
openssl                   3.1.4                hd590300_0    conda-forge
pandas                    2.1.3           py310hcc13569_0    conda-forge
parallel                  20160622                      1    bioconda
pcre                      8.45                 h9c3ff4c_0    conda-forge
perl                      5.32.1          4_hd590300_perl5    conda-forge
perl-threaded             5.32.1               hdfd78af_1    bioconda
pip                       23.3.1             pyhd8ed1ab_0    conda-forge
pysam                     0.22.0          py310h41dec4a_0    bioconda
python                    3.10.13         hd12c33a_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-tzdata             2023.3             pyhd8ed1ab_0    conda-forge
python_abi                3.10                    4_cp310    conda-forge
pytz                      2023.3.post1       pyhd8ed1ab_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
rhash                     1.4.4                hd590300_0    conda-forge
samtools                  1.18                 h50ea8bc_1    bioconda
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
tabixpp                   1.1.2                hd68fcf3_1    bioconda
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tqdm                      4.66.1             pyhd8ed1ab_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
vcflib                    1.0.9                h146fbdb_3    bioconda
wfa2-lib                  2.3.4                h4ac6f70_0    bioconda
wheel                     0.41.3             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

DRL commented

Ok, so the issue seems to arise due to vcflib being unhappy with tabixpp if the versions are not adequate ...

A solution which seems to work on my machine with your VCF is the following (which will overwrite your existing gimbleprep conda env):

conda create -n gimbleprep -y
conda activate gimbleprep 
conda install -c bioconda vcflib=1.0.3 tabixpp=1.1.0 gimbleprep

If you have mamba installed it might be quicker to this with mamba instead of conda (by replacing conda with mamba in the commands above) ...

Can you confirm that this works for you?

I installed the conda environment for gimbleprep the way you proposed and it worked for the heliconius example data and for my data, and, switching to the main conda environment of gimble, gimble parse works perfectly well on my data so the problem is solved ! Many thanks for the help !

Awesome! Does this require modification of setup.py to avoid this happening to other people as well? Do we have to pin down versions of both tabixpp and vcflib?

DRL commented

Yes, it seems that this solves the issue.

vcflib=1.0.3 
tabixpp=1.1.0

vcflib=1.0.3 alone did not work on my machine.

The vcflib conda recipe does not specify a tabixpp version. In fact, tabixpp is just a c++ wrapper they've developed themselves. I would find it therefore highly surprising that the latest version of vcflib would not be compatible with tabixpp. @pierrebarry mentioned that he indexed the vcf file himself after compressing using tabix. Is it possible that the there is a version mismatch between the output of the version of tabix you both have on your machine and the requirements of tabixpp?