wgbstools segment error

Question

wgbstools segment error

Closed this issue 2 years ago · 11 comments

I'm attempting to segment the (hg19) beta files available from here:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE186458

I'm getting the following error:

[ add_loci ] Failed! exception: [wt add_loci] line 0: endCpG < startCpG [wt segment] found 74,349 blocks (dropped 106,621 short blocks) Traceback (most recent call last): File "./wgbs_tools/wgbstools", line 96, in <module> main() File "./wgbs_tools/wgbstools", line 63, in main importlib.import_module(args.command).main() File "./wgbs_tools/src/python/segment.py", line 310, in main SegmentByChunks(args, betas).run() File "./wgbs_tools/src/python/segment.py", line 147, in run self.dump_result(df.reset_index(drop=True)) File "./wgbs_tools/src/python/segment.py", line 183, in dump_result add_bed_to_cpgs(temp_path, self.genome.genome, self.args.out_path) File "./wgbs_tools/src/python/convert.py", line 245, in add_bed_to_cpgs subprocess.check_call(cmd, shell=True) File "/exports/applications/apps/SL7/anaconda/5.0.1/lib/python3.6/subprocess.py", line 291, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command 'cat qfp0iepq | ./wgbs_tools/src/cpg2bed/add_loci ./wgbs_tools/references/hg19/CpG.bed.gz ./wgbs_tools/references/hg19/CpG.chrome.size > ./GSE186458/GSE186458.chr20.blocks.small.bed' returned non-zero exit status 1. Segmenting

Any thoughts on what the problem might be and how to fix it?

Thanks

D

Answer 1 · 2022-10-29T07:01:04.000Z

If I attempt to use pre-generated blocks download from the same GSE repository then I get the following error:

Traceback (most recent call last): File "./wgbs_tools/wgbstools", line 96, in <module> main() File "./wgbs_tools/wgbstools", line 63, in main importlib.import_module(args.command).main() File "./wgbs_tools/src/python/beta_to_table.py", line 150, in main for chunk in chunks: File "./wgbs_tools/src/python/beta_to_table.py", line 133, in beta2table_generator yield get_table(subset_blocks, gf, min_cov, threads, verbose) File "./wgbs_tools/src/python/beta_to_table.py", line 68, in get_table is_nice, _ = is_block_file_nice(blocks_df) File "./wgbs_tools/src/python/beta_to_blocks.py", line 28, in is_block_file_nice if df[['startCpG', 'endCpG']].isna().values.sum() > 0: File "/exports/applications/apps/SL7/anaconda/5.0.1/lib/python3.6/site-packages/pandas/core/generic.py", line 3081, in __getattr__ return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'isna'

So I'm not having much luck, unfortunately.

Answer 2 · 2022-10-29T07:21:07.000Z

Hi, can you post the exact command you used? And some example head command output of your input files?
A

Answer 3 · 2022-10-29T09:57:00.000Z

Here is the command I've used for segmenting:

${WGBSTOOLS} segment --threads ${NSLOTS} --genome hg19 --betas ${IN}/*.beta --min_cpg 4 --max_bp 5000 -r chr${SGE_TASK_ID} -o ${PREFIX}.chr${SGE_TASK_ID}.blocks.small.bed

And here is an example of some chr20 data from from one of the files:

${WGBSTOOLS} view GSM5652211_Lung-Bronchus-Smooth-Muscle-Z00000421.beta | grep "^chr20" | head chr20 60178 60180 3 33 chr20 60425 60427 28 34 chr20 60431 60433 26 35 chr20 60550 60552 7 24 chr20 60577 60579 3 25 chr20 60721 60723 7 30 chr20 60806 60808 24 31 chr20 60962 60964 28 49 chr20 61069 61071 23 44 chr20 61181 61183 27 35

Edit:

For some reason github isn't liking pasting from linux. The output should be showing as a standard beta file format, but appears to be lacking carriage returns once pasted.

${WGBSTOOLS} view GSM5652211_Lung-Bronchus-Smooth-Muscle-Z00000421.beta | grep "^chr20" | head
chr20 60178 60180 3 33
chr20 60425 60427 28 34
chr20 60431 60433 26 35
chr20 60550 60552 7 24
chr20 60577 60579 3 25
chr20 60721 60723 7 30
chr20 60806 60808 24 31
chr20 60962 60964 28 49
chr20 61069 61071 23 44
chr20 61181 61183 27 35

Answer 4 · 2022-10-29T12:39:34.000Z

Just to add to that, I've tried this with just one input beta file and the error is the same. If I use awk to output any lines where field 3 is less than field 2, then no lines are reported. This indicates that none of the lines in the beta file has an end position less than the start position.

${WGBSTOOLS} view GSM5652176_Adipocytes-Z000000T7.beta | awk '{if($3<$2) print $0;}' -

Answer 5 · 2022-10-30T08:52:33.000Z

I'm not reproducing the error on my side with some random chromosomes i tried. Which chromosome? chromosome 20? all of them? I downloaded the data from the link you posted and set up a clean wgbstools and ran and did not reproduce. My best guess is that our reference/annotation data is different but I will try to see if I can find other causes.

Answer 6 · 2022-10-30T13:04:51.000Z

I've just tried following the tutorial and run into the same problem at the point of segmenting:

${WGBSTOOLS} segment --betas *beta --min_cpg 3 --max_bp 2000 -r $region -o blocks.small.bed
[wt segment] found 9 blocks
(dropped 9 short blocks)
[ add_loci ] Failed! exception:
[wt add_loci] line 0: endCpG < startCpG
Traceback (most recent call last):
File "./wgbs_tools/wgbstools", line 96, in <module>
main()
File "./wgbs_tools/wgbstools", line 63, in main
importlib.import_module(args.command).main()
File "./wgbs_tools/src/python/segment.py", line 310, in main
SegmentByChunks(args, betas).run()
File "./wgbs_tools/src/python/segment.py", line 147, in run
self.dump_result(df.reset_index(drop=True))
File "./wgbs_tools/src/python/segment.py", line 183, in dump_result
add_bed_to_cpgs(temp_path, self.genome.genome, self.args.out_path)
File "./wgbs_tools/src/python/convert.py", line 245, in add_bed_to_cpgs
subprocess.check_call(cmd, shell=True)
File "/exports/applications/apps/SL7/anaconda/5.0.1/lib/python3.6/subprocess.py", line 291, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cat sp68njbk | ./wgbs_tools/src/cpg2bed/add_loci ./wgbs_tools/references/hg19/CpG.bed.gz ./wgbs_tools/references/hg19/CpG.chrome.size > blocks.small.bed' returned non-zero exit status 1.

So I'm assuming its a problem with my software environment. Python version 3.6.3 is loaded, along with samtools 1.13 and BEDtools2.30.0. There were no warnings when installing wgbstools:

$ python setup.py
Compiling stdin2beta...
SUCCESS
Compiling stdin2pairs...
SUCCESS
Compiling pat_sampler...
SUCCESS
Compiling patter...
SUCCESS
Compiling bp_patter...
SUCCESS
Compiling snp_patter...
SUCCESS
Compiling match_maker...
SUCCESS
Compiling segmentor...
SUCCESS
Compiling cview...
SUCCESS
Compiling homog...
SUCCESS
Compiling add_cpg_counts...
SUCCESS
Compiling add_loci...
SUCCESS

Answer 7 · 2022-10-30T13:13:30.000Z

Which pandas version are you using?

Answer 8 · 2022-10-30T14:12:08.000Z

I'm using the default init_genome command:

$ ./wgbstools init_genome hg19
[wt init] Setting up genome reference files in ./wgbs_tools/references/hg19
[wt init] No reference FASTA provided. Attempting to download from
https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 904M 100 904M 0 0 23.8M 0 0:00:38 0:00:38 --:--:-- 25.6M
[wt init] successfully downloaded FASTA. Now gunzip and bgzip it...
[wt init] Indexing ./wgbs_tools/references/hg19/hg19.fa.gz
[wt init] Generated index file: ./wgbs_tools/references/hg19/hg19.fa.gz.fai
[wt init] Processing chromosomes...
[wt init] chromosome: chr1
[wt init] chromosome: chr2
[wt init] chromosome: chr3
[wt init] chromosome: chr4
[wt init] chromosome: chr5
[wt init] chromosome: chr7
[wt init] chromosome: chr6
[wt init] chromosome: chr8
[wt init] chromosome: chr9
[wt init] chromosome: chr10
[wt init] chromosome: chr11
[wt init] chromosome: chr12
[wt init] chromosome: chr13
[wt init] chromosome: chr14
[wt init] chromosome: chr15
[wt init] chromosome: chr16
[wt init] chromosome: chr17
[wt init] chromosome: chr18
[wt init] chromosome: chr19
[wt init] chromosome: chr20
[wt init] chromosome: chr21
[wt init] chromosome: chr22
[wt init] chromosome: chrX
[wt init] chromosome: chrY
[wt init] chromosome: chrM
[wt init] Building CpG-Index dictionary...
[wt init] bgzip and index...
[wt init] Finished initialization of genome hg19

According to pip list I'm using pandas (0.20.3).

Answer 9 · 2022-10-30T15:19:06.000Z

I suspect the pandas version is the issue. I have version '1.3.5'. We will try to check what is the minimum version required, but I suspect this is the issue.

Answer 10 · 2022-10-30T18:36:51.000Z

I've just installed pandas 1.5.1 and the tutorial segment worked fine, so it looks like that was the problem.

Thanks!

Answer 11 · 2022-10-31T10:34:49.000Z

Excellent. Thank you for raising this issue.
I updated the README dependencies section to state pandas version >1 is required