seqcode/miniMDS

partitioned MDS fails


I'm trying to use miniMDS on a dataset, and there seems to be a bug when running partitioned MDS with two files of different resolutions:

python ~/code/mpi/miniMDS/minimds.py -l 6-8h_4_50kb.bed -o test.tsv 6-8h_4_2kb.bed
[...]
Traceback (most recent call last):
  File "/Users/agrimaldi/code/mpi/miniMDS/minimds.py", line 169, in <module>
    main()
  File "/Users/agrimaldi/code/mpi/miniMDS/minimds.py", line 163, in main
    cluster = partitionedMDS(args.path, args.l, params)
  File "/Users/agrimaldi/code/mpi/miniMDS/minimds.py", line 87, in partitionedMDS
    tad.subclustersFromTads(highCluster, lowCluster, lowTads)
TypeError: subclustersFromTads() takes exactly 2 arguments (3 given)

The tests run fine btw

Hi, sorry for the late reply. Do you mind sending me the input files? My email address is lur159@psu.edu.

I tried to set the n parameter to 11 but I got the following error:

python ../../bin/miniMDS/minimds.py -n 11 HiCtool_normalized_fend_blist_chr1_10000b_0b_195470000b.bed

Scanning HiCtool_normalized_fend_blist_chr1_10000b_0b_195470000b.bed
Identifying loci 1% complete
Identifying loci 2% complete
Identifying loci 3% complete
...
Identifying loci 99% complete
Identifying loci 100% complete
Scanning HiCtool_normalized_fend_blist_chr1_10000b_0b_195470000b.bed
Identifying loci 1% complete
Identifying loci 2% complete
Identifying loci 3% complete
...
Identifying loci 99% complete
Identifying loci 100% complete
Traceback (most recent call last):
  File "../../bin/miniMDS/minimds.py", line 142, in <module>
    main()
  File "../../bin/miniMDS/minimds.py", line 133, in main
    structure = partitionedMDS(args.path, params)
  File "../../bin/miniMDS/minimds.py", line 68, in partitionedMDS
    infer_structure(low_contactMat, lowstructure, alpha, num_threads)
  File "../../bin/miniMDS/minimds.py", line 27, in infer_structure
    coords = manifold.MDS(n_components=3, metric=True, random_state=np.random.RandomState(), verbose=0, dissimilarity="precomputed", n_jobs=num_threads).fit_transform(distMat)
  File "/usr/lib64/python2.7/site-packages/sklearn/manifold/mds.py", line 429, in fit_transform
    return_n_iter=True)
  File "/usr/lib64/python2.7/site-packages/sklearn/manifold/mds.py", line 266, in smacof
    for seed in seeds)
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 749, in __call__
    n_jobs = self._initialize_backend()
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 547, in _initialize_backend
    **self._backend_args)
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 317, in configure
    self._pool = MemmapingPool(n_jobs, **backend_args)
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/pool.py", line 600, in __init__
    super(MemmapingPool, self).__init__(**poolargs)
  File "/usr/lib64/python2.7/site-packages/sklearn/externals/joblib/pool.py", line 420, in __init__
    super(PicklingPool, self).__init__(**poolargs)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 159, in __init__
    self._repopulate_pool()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 213, in _repopulate_pool
    for i in range(self._processes - len(self._pool)):
TypeError: unsupported operand type(s) for -: 'str' and 'int'

Thanks in advance.

Thanks for your prompt answer. Please find the file attached (I changed the extension to .txt):

HiCtool_normalized_fend_blist_chr1_10000b_0b_195470000b.txt

Thanks for pointing this out. The argument type wasn't set correctly. It should be fixed now.
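For anyone hitting the same `'str' and 'int'` TypeError on an older checkout: the sketch below illustrates the general failure mode, not the actual change made in minimds.py. It assumes the thread count is read with `argparse`; the parser, the default value, and the file name are hypothetical. Without `type=int`, the value arrives as the string "11" and is passed on as `n_jobs`, where the pool arithmetic fails.

```python
import argparse


def build_parser(typed):
    """Build a tiny stand-in parser (hypothetical, not the real minimds.py CLI)."""
    parser = argparse.ArgumentParser(description="illustrative parser")
    if typed:
        # type=int converts the command-line string to an integer
        parser.add_argument("-n", type=int, default=1, help="number of threads")
    else:
        # no type given: argparse keeps the raw string
        parser.add_argument("-n", default=1, help="number of threads")
    parser.add_argument("path", help="input contact file")
    return parser


argv = ["-n", "11", "example.bed"]
untyped = build_parser(typed=False).parse_args(argv)
typed = build_parser(typed=True).parse_args(argv)

print(type(untyped.n).__name__)  # str
print(type(typed.n).__name__)    # int
```

With the typed parser, the value can be handed to `n_jobs` as an integer.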

However, I took a look at your data and it seems like there are some issues with it. The distance decay curve doesn't look right (first image below). For comparison I've attached the distance decay curve for the same chromosome and resolution from the Rao et al GM12878 dataset (second image below).
ahorvath_distance_decay
rao_gm12878_distance_decay

Thanks for your help. I'll check both. How can I make this plot? I'm now trying HiC-Pro; as far as I know, the GM12878 dataset was processed with it.

The Rao datasets were normalized with the KR method (using their own pipelines).

I've added a script to plot distance decay in the scripts folder.

python distance_decay.py HiCtool_normalized_fend_blist_chr1_10000b_0b_195470000b.txt
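In case it helps to see what such a curve measures, here is a minimal sketch of the computation. It is not the script from the scripts folder. It assumes the input has seven whitespace-separated fields per line (chrom, start1, end1, chrom, start2, end2, contact frequency); the function name and the example rows are made up for illustration.

```python
from collections import defaultdict


def distance_decay(lines):
    """Return {genomic separation in bp: mean contact frequency}.

    Assumed line format (an assumption, not taken from the repository):
    chrom1 start1 end1 chrom2 start2 end2 value
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        start1, start2 = int(fields[1]), int(fields[4])
        value = float(fields[6])
        separation = abs(start2 - start1)
        totals[separation] += value
        counts[separation] += 1
    # Average only over the pairs that are present in the file
    return {sep: totals[sep] / counts[sep] for sep in sorted(totals)}


example = [
    "chr1 0 10000 chr1 10000 20000 8.0",
    "chr1 10000 20000 chr1 20000 30000 6.0",
    "chr1 0 10000 chr1 20000 30000 2.0",
]
decay = distance_decay(example)
for separation, mean_value in decay.items():
    print(separation, mean_value)
```

Plotting the separations against the means on log-log axes gives the decay curve. Note that this sketch ignores bin pairs missing from a sparse file, so the means are over observed pairs only.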

Would you mind if I looked at your raw data? I wonder if this is an issue with your data or with normalization.

Thanks, that would be very good for me because I could make sure that my results are valid.

https://www.ncbi.nlm.nih.gov/sra?term=SRX1513812 (two runs were merged)

I got this from the above dataset using ICE normalization on chrX.

rep1_distance_decay

Do you think that's acceptable?

Thanks

I looked at the raw data and couldn't get higher resolution than 750kb (though I used only one run; this could improve a bit if the two runs were merged). In the original paper they binned at 300kb and mentioned something about insufficient resolution. I ran:

python minimds.py -p 0.01 SRR3081790_chr1_750kb.bed

The structure looks ok overall, other than some outliers (probably pericentromeric regions).

MiniMDS still improves the resolution. You can see this by running for comparison:

python minimds.py --full SRR3081790_chr1_750kb.bed

Full MDS performs worse on this dataset in my opinion.

Also, the distance decay looks good at 750kb:

distance_decay_750kb