BIMSBbioinfo/janggu

BEDTools Installation

Closed this issue · 7 comments

I am new to Janggu and was trying to read in .bed files. As mentioned, I installed BEDTools in the same virtual environment as Janggu, but I am still getting the error below. Please help.

Traceback (most recent call last):
  File "test_bed.py", line 15, in <module>
    store_whole_genome=True)
  File "/gpfs/data/ahsan-lab/Sameep/janggu-gpu/lib/python3.7/site-packages/janggu/data/coverage.py", line 1163, in create_from_bed
    verbose=verbose)
  File "/gpfs/data/ahsan-lab/Sameep/janggu-gpu/lib/python3.7/site-packages/janggu/data/genomicarray.py", line 1191, in create_genomic_array
    verbose=verbose)
  File "/gpfs/data/ahsan-lab/Sameep/janggu-gpu/lib/python3.7/site-packages/janggu/data/genomicarray.py", line 640, in __init__
    gsize_ = gsize() if callable(gsize) else gsize
  File "/gpfs/data/ahsan-lab/Sameep/janggu-gpu/lib/python3.7/site-packages/janggu/data/coverage.py", line 105, in __call__
    return self.gsize
  File "/gpfs/data/ahsan-lab/Sameep/janggu-gpu/lib/python3.7/site-packages/janggu/data/coverage.py", line 94, in gsize
    self.load_gsize()
  File "/gpfs/data/ahsan-lab/Sameep/janggu-gpu/lib/python3.7/site-packages/janggu/data/coverage.py", line 72, in load_gsize
    bed = BedTool(bedfile).sort().merge()
  File "/gpfs/data/ahsan-lab/Sameep/janggu-gpu/lib/python3.7/site-packages/pybedtools/bedtool.py", line 917, in decorated
    result = method(self, *args, **kwargs)
  File "/gpfs/data/ahsan-lab/Sameep/janggu-gpu/lib/python3.7/site-packages/pybedtools/bedtool.py", line 240, in not_implemented_func
    raise NotImplementedError(help_str)
NotImplementedError: "sortBed" does not appear to be installed or on the path, so this method is disabled. Please install a more recent version of BEDTools and re-import to use this method.

wkopp commented

Hi @Liquidten ,

It seems that bedtools can't be found. bedtools isn't a Python package, so it needs to be installed separately from pybedtools.
You can install bedtools with conda, e.g. conda install -c bioconda bedtools, or with apt install bedtools on Ubuntu.
After bedtools is installed, you should be able to locate the bedtools program by running which bedtools on the command line.

Also, after bedtools is installed, you should be able to do:

from pybedtools import BedTool

bed=BedTool("test.bed")

within Python, using an arbitrary test.bed file of your choice.
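
The command-line check above can also be run from inside the same Python environment, which helps when a virtual environment sees a different PATH than your login shell. This is only a minimal sketch: find_bedtools is a hypothetical helper name, not part of Janggu or pybedtools, and it relies solely on the standard library's shutil.which.

```python
import shutil


def find_bedtools(executable="bedtools"):
    """Return the full path of the executable if it is on PATH, else None.

    Hypothetical helper for illustration; it only wraps shutil.which.
    """
    return shutil.which(executable)


path = find_bedtools()
if path is None:
    # Same situation as the NotImplementedError above: the binary is not visible.
    print("bedtools was not found on PATH; install it and re-import pybedtools.")
else:
    print(f"bedtools found at {path}")
```

If this reports that bedtools is missing even though which bedtools succeeds in your shell, the Python process is probably running with a different PATH.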

Best,
Wolfgang

Thanks for the help. Quick question: can I use Janggu to read in binary .bed files extracted from plink, so that I can apply machine learning to them? If so, could you show me how? I went through the documentation at https://janggu.readthedocs.io/en/latest/ but couldn't figure it out. Thank you for the help.

wkopp commented

No, I don't think binary files are supported by bedtools.

But you could check with:

from pybedtools import BedTool

bed=BedTool("test.bed")
bed[0]

If this works, it should also work within Janggu.

Hi,

I might be able to clear some things up here. Plink .bed files are a format used to store genotype data. Janggu supports UCSC bed files (https://genome.ucsc.edu/FAQ/FAQformat.html#format1), which store genomic regions. Every plink bed file has a corresponding bim and fam file, so if you want to look at the coordinates of your genotypes, you could convert the plink bim file to UCSC bed. If you want to use plink bed files for variant effect prediction, you first need to convert them to vcf. The plink program has a command to do this.
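
The bim-to-UCSC-bed conversion could be sketched roughly as follows. It assumes the usual .bim column order (chromosome, variant ID, genetic distance, 1-based base-pair position, allele 1, allele 2) and writes each variant as a one-base, 0-based half-open interval. bim_to_bed_lines and its chrom_prefix parameter are hypothetical names introduced for illustration, and the two sample records are invented.

```python
def bim_to_bed_lines(bim_lines, chrom_prefix=""):
    """Yield UCSC BED lines (chrom, start, end, name) from PLINK .bim lines.

    chrom_prefix is an assumption: pass "chr" if your reference genome
    uses names like "chr1" while the .bim file uses "1".
    """
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, variant_id, _genetic_distance, position = fields[:4]
        end = int(position)  # .bim positions are 1-based
        start = end - 1      # BED starts are 0-based, ends are exclusive
        yield f"{chrom_prefix}{chrom}\t{start}\t{end}\t{variant_id}"


# Invented example records, not real variants.
example_bim = [
    "1\trs123\t0\t1000\tA\tG",
    "2\trs456\t0\t25000\tC\tT",
]
for bed_line in bim_to_bed_lines(example_bim, chrom_prefix="chr"):
    print(bed_line)
```

To convert a real file, pass an open file handle for the .bim file in place of example_bim and write the yielded lines to a new .bed text file.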

Hope this is helpful.

Cheers,

Remo

Thanks Remo. I am trying to use this GWAS data, possibly together with demographic data, to predict a binary phenotype. Do you have any suggestions on how to approach this, and on whether Janggu is useful for it? I really appreciate the feedback.

Thanks

Sam

If you simply want to load genotype data to make predictions, you may also want to look at packages like pysnptools or hail.

I'll check them out. Thanks for the help.