tariks/peakachu

peakachu score_chromosome question

Closed this issue · 7 comments

Hi, I created an environment that successfully ran peakachu train on some publicly available Hi-C datasets. I am now trying to run peakachu score_chromosome but am encountering the following error:

line 245, in getnnz
raise ValueError('row, column, and data array must all be the '
ValueError: row, column, and data array must all be the same length

I am running each chromosome individually:
"peakachu score_chromosome -p HiC.cool --balance -O scores -m ~/peakachu/models/chr1.pkl"

Any help would be much appreciated!

I've never seen this error before. Could you paste the full traceback? It would be helpful for troubleshooting.

Xiaotao

Here's the full output I got from my run

/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/peakachu-1.1.3-py3.6.egg/peakachu/scoreUtils.py:62: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
  fts = np.vstack((i for i in fts))
scoring matrix chr2
num candidates 2607852
Traceback (most recent call last):
  File "/home/ss45w/miniconda3/envs/peakachu_env2/bin/peakachu", line 4, in <module>
    __import__('pkg_resources').run_script('peakachu==1.1.3', 'peakachu')
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1462, in run_script
    exec(code, namespace, namespace)
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/peakachu-1.1.3-py3.6.egg/EGG-INFO/scripts/peakachu", line 76, in <module>
    run()
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/peakachu-1.1.3-py3.6.egg/EGG-INFO/scripts/peakachu", line 72, in run
    args.func(args)
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/peakachu-1.1.3-py3.6.egg/peakachu/score_chromosome.py", line 53, in main
    result,R = X.score()
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/peakachu-1.1.3-py3.6.egg/peakachu/scoreUtils.py", line 80, in score
    self.M = sparse.csr_matrix((data, (ri, ci)), shape=self.M.shape)
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 57, in __init__
    other = self.__class__(coo_matrix(arg1, shape=shape))
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/scipy/sparse/coo.py", line 198, in __init__
    self._check()
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/scipy/sparse/coo.py", line 283, in _check
    if self.nnz > 0:
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/scipy/sparse/base.py", line 250, in nnz
    return self.getnnz()
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/scipy/sparse/coo.py", line 245, in getnnz
    raise ValueError('row, column, and data array must all be the '
ValueError: row, column, and data array must all be the same length

A row/column mismatch can happen if you train with one window size (say w=4) and try to predict with another (default is 5). If that isn't the case here, then could you point us to the data files you are using? If the parameters are correct and the files work on my build, then the problem is in the installation somewhere. What kind of machine are you using and could you provide the command you used for training?
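
For anyone hitting the same message, here is a minimal sketch of how SciPy ends up raising it. This is not peakachu's code: the helper name build_score_matrix and the toy arrays are made up for illustration. It only mirrors the csr_matrix((data, (ri, ci)), shape=...) call shown in the traceback and adds an explicit length check, so a mismatch is reported with the actual lengths.

```python
import numpy as np
from scipy import sparse


def build_score_matrix(data, ri, ci, shape):
    """Build a CSR matrix from (value, row, column) triplets.

    Hypothetical helper (not part of peakachu): it checks the triplet
    lengths up front so a mismatch reports the actual sizes.
    """
    data, ri, ci = np.asarray(data), np.asarray(ri), np.asarray(ci)
    if not (len(data) == len(ri) == len(ci)):
        raise ValueError(
            f"triplet length mismatch: data={len(data)}, "
            f"rows={len(ri)}, cols={len(ci)}"
        )
    return sparse.csr_matrix((data, (ri, ci)), shape=shape)


# Three candidate pixels, identified by row and column indices.
ri = np.array([0, 1, 2])
ci = np.array([1, 2, 3])

# One score per candidate: the matrix is built without complaint.
ok = build_score_matrix([0.9, 0.2, 0.7], ri, ci, shape=(4, 4))

# Only two scores for three candidates: the helper raises ValueError.
try:
    build_score_matrix([0.9, 0.2], ri, ci, shape=(4, 4))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# Calling SciPy directly with the same inputs also raises ValueError.
try:
    sparse.csr_matrix(([0.9, 0.2], (ri, ci)), shape=(4, 4))
    scipy_raised = False
except ValueError:
    scipy_raised = True
```

If the number of scored values differs from the number of candidate coordinates for any reason (window size, resolution, filtering), the constructor fails in this way, so printing the three lengths just before the call is a quick first diagnostic.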

Hi, did you re-train the model yourself on Rao2014-GM12878-MboI-allreps-filtered.10kb.cool and then run the predictions on your own cool files? If so, the error makes sense: Rao2014-GM12878-MboI-allreps-filtered.10kb.cool was generated by an old version of cooler, so its ICE-normalized values probably fall in a different range from those in your current cool files.

I recommend using the pre-trained models we released for predictions. Alternatively, if you have your own positive training set in GM12878, you can first re-run ICE with the current cooler version, overwriting the 'weight' column, before training: cooler balance -f Rao2014-GM12878-MboI-allreps-filtered.10kb.cool.

Thanks for getting back to me. I'm trying to analyze a mouse HiC dataset (GSE95533). For the positive training set, I wasn't sure what to use so I used a bedpe file from CHiC data (same study) - is this appropriate?

I used distiller-nf to create .cool files for training and ran this on our institution's high-performance computing cluster:
"peakachu train -p D0-Mandrupn1n2__mm10.1000.cool --balance -O models -b Mandrup_CHiC_mm10_interactions.bedpe"

I can run 'cooler balance -f' on my .cool files as you suggested and train again. Another question: should I be training on a .multires.cool file instead of the .cool file?

I'm just a beginner when it comes to Hi-C analysis, so I really appreciate the help.

Hey, we currently recommend only 10 kb resolution matrices, on which peakachu has been thoroughly tested.

It seems your input matrix is at 1 kb resolution (according to your file name). That inconsistency could be the cause of your previous error, because the default resolution parameter (-r) of Peakachu is 10000.
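
As a quick sanity check before training or scoring, the resolution can be read off the file name and compared with the value passed to -r. The sketch below is only an illustration: resolution_from_name and check_resolution are hypothetical helpers, and the naming pattern (a trailing ".1000.cool" or ".10kb.cool") is an assumption based on the file names in this thread. If the cooler Python package is installed, cooler.Cooler(path).binsize reads the actual bin size from the file and is the more reliable check.

```python
import re


def resolution_from_name(filename):
    """Guess the bin size in bp from a name like 'sample__mm10.1000.cool'.

    Hypothetical helper: the naming convention is an assumption, so
    None is returned when the name does not carry a resolution.
    """
    match = re.search(r"\.(\d+)(kb)?\.cool$", filename, flags=re.IGNORECASE)
    if match is None:
        return None
    value = int(match.group(1))
    # A 'kb' suffix means the number is in kilobases, not base pairs.
    return value * 1000 if match.group(2) else value


def check_resolution(filename, expected=10000):
    """Compare the resolution guessed from the name with the expected one."""
    found = resolution_from_name(filename)
    if found is None:
        return "unknown"
    if found == expected:
        return "ok"
    return f"mismatch: file is {found} bp, expected {expected} bp"


print(check_resolution("D0-Mandrupn1n2__mm10.1000.cool"))
print(check_resolution("Rao2014-GM12878-MboI-allreps-filtered.10kb.cool"))
```

A name without a resolution, such as HiC.cool, returns "unknown", in which case the bin size has to be read from the file itself.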

I re-ran my training with a 10 kb (10,000 bp) input matrix and have now successfully run 'peakachu score_chromosome'. I still need to visualize the results, but I really appreciate the help. Thanks!