toCooler error because of input txt file
clementmo opened this issue · 0 comments
Hello Xiaotao,
I am a postdoc at HZAU and am currently learning Hi-C data analysis. I have recently been using HiCPeaks to convert the raw matrix generated by HiC-Pro into a cool file, and I have run into a problem I cannot solve.
Following your guidelines, I tried to extract the interaction information for chr01 from the raw matrix HPC9_150000.matrix. According to HPC9_150000_abs.bed, chr01 is binned into 754 windows, so I generated the input file with:
awk '$1<=754&&$2<=754{print}' HPC9_150000.matrix >1_1.txt
head -5 HPC9_150000.matrix
1 1 1599
1 2 577
1 3 117
1 4 103
1 5 68
head -5 HPC9_150000_abs.bed
Chr01 0 150000 1
Chr01 150000 300000 2
Chr01 300000 450000 3
Chr01 450000 600000 4
Chr01 600000 750000 5
Then I ran toCooler with:
toCooler -O HPC9_1.cool -d datasets --nproc 1 --chromsizes-file Ga_1.chromsizes &
It fails with the error "IndexError: index 754 is out of bounds for axis 0 with size 754":
File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/hicpeaks-0.3.4-py3.6.egg/EGG-INFO/scripts/toCooler", line 128, in run
balance(cooler_uri, nproc=args.nproc)
File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/hicpeaks-0.3.4-py3.6.egg/hicpeaks/utilities.py", line 417, in balance
map=map_)
File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/balance.py", line 332, in balance_cooler
.reduce(add, np.zeros(n_bins))
File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/tools.py", line 244, in reduce
return reduce(binop, iter(self.run()), init)
File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/tools.py", line 54, in apply_pipeline
data = func(chunk, data)
File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/cooler/balance.py", line 46, in _zero_trans
mask = chrom_ids[pixels['bin1_id']] != chrom_ids[pixels['bin2_id']]
File "/public/home/software/opt/bio/software/HiCPeaks/0.3.4/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2149, in __getitem__
values=self._codes[key], dtype=self.dtype, fastpath=True
IndexError: index 754 is out of bounds for axis 0 with size 754
From this I conclude that the values in the first two columns of the input file 1_1.txt have to be smaller than the number of bins on the chromosome (754), not equal to or larger than it.
I also tried chr02, using:
awk '$1>=755&&$1<=1415&&$2>=755&&$2<=1415{print}' HPC9_150000.matrix >2_2.txt
I replaced 1_1.txt with 2_2.txt under the directory ./150K/, and it produced a similar error: "IndexError: index 755 is out of bounds for axis 0 with size 661" (661 is the number of bins on chr02).
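Both errors suggest that the bin indices may need to be 0-based and counted from the start of each chromosome, rather than the 1-based genome-wide IDs that HiC-Pro writes. A minimal sketch of that conversion follows, under that assumption, which I have not confirmed against the HiCPeaks documentation. The function names chrom_bin_ranges and intra_chrom_pixels are hypothetical helpers, not part of HiCPeaks or HiC-Pro, and the toy data only mimics the layout of the files shown above.

```python
from collections import OrderedDict


def chrom_bin_ranges(bed_lines):
    """Map each chromosome to its (first, last) bin ID from *_abs.bed lines.

    Assumes four whitespace-separated columns: chrom, start, end, bin ID.
    """
    ranges = OrderedDict()
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, bin_id = fields[0], int(fields[3])
        first, last = ranges.get(chrom, (bin_id, bin_id))
        ranges[chrom] = (min(first, bin_id), max(last, bin_id))
    return ranges


def intra_chrom_pixels(matrix_lines, first, last):
    """Yield (bin1, bin2, count) for pairs inside [first, last].

    Indices are shifted so the chromosome's first bin becomes 0
    (assumption: this is the numbering toCooler expects).
    The count is passed through unchanged as text.
    """
    for line in matrix_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        b1, b2 = int(fields[0]), int(fields[1])
        if first <= b1 <= last and first <= b2 <= last:
            yield b1 - first, b2 - first, fields[2]


# Toy data in the same layout as the HiC-Pro files (values are made up).
bed = ["Chr01 0 150000 1", "Chr01 150000 300000 2", "Chr02 0 150000 3"]
matrix = ["1 1 1599", "1 2 577", "2 3 5", "3 3 40"]

ranges = chrom_bin_ranges(bed)
for chrom, (first, last) in ranges.items():
    for b1, b2, count in intra_chrom_pixels(matrix, first, last):
        print(chrom, b1, b2, count)
```

With the real files, the lines would come from open("HPC9_150000_abs.bed") and open("HPC9_150000.matrix"), and each chromosome's pixels would be written to its own text file instead of printed.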
How should I prepare the input file correctly?
By the way, should I prepare the chr_chr.txt files for all chromosomes one by one?
Should I put all of these chr_chr.txt files in the same ./150K/ directory?
I hope you can reply. Thank you so much!
You can also reply by email at 1067648804@qq.com if that is more convenient.
Best wishes.
Pengcheng