kundajelab/tfmodisco

Error: zero-size array to reduction operation maximum which has no identity

Avsecz opened this issue · 2 comments

I tried running modisco on a larger set of sequences (100k sequences of 1 kb each, with 4 output tasks). It crashed on reaching metacluster 5, which contains 37k seqlets. It seems to have encountered an empty neighbors_of_things_to_scan array: every cluster was dropped due to sign disagreement, so merging ran on 0 clusters. Any idea how to prevent this from happening?

On metacluster 5
Metacluster size 37059
Relevant tasks:  ('Klf4/weighted',)
Relevant signs:  (-1,)
(Round 1) num seqlets: 37059
(Round 1) Computing coarse affmat
Beginning embedding computation
Computing embeddings
Finished embedding computation in 1520.43 s
Starting affinity matrix computations

....
(Round 2) Computing clustering
Beginning preprocessing + Louvain
Wrote graph to binary file in 0.018040180206298828 seconds
Running Louvain modularity optimization
[Parallel(n_jobs=10)]: Done  30 tasks      | elapsed:   41.9s
[Parallel(n_jobs=10)]: Done 180 tasks      | elapsed:  3.1min
[Parallel(n_jobs=10)]: Done 200 out of 200 | elapsed:  3.4min finished
Louvain completed 200 runs in 253.59952878952026 seconds
Wrote graph to binary file in 0.0037131309509277344 seconds
Running Louvain modularity optimization
After 1 runs, maximum modularity is Q = 0.635078
Louvain completed 51 runs in 227.86760807037354 seconds
Preproc + Louvain took 481.4961562156677 s
Got 8 clusters after round 2
Counts:
{1: 8, 0: 13, 4: 4, 7: 2, 5: 3, 2: 6, 3: 4, 6: 2}
(Round 2) Aggregating seqlets in each cluster
Aggregating for cluster 0 with 13 seqlets
Trimmed 0 out of 13
Dropping cluster 0 with 13 seqlets due to sign disagreement
Aggregating for cluster 1 with 8 seqlets
Trimmed 0 out of 8
Dropping cluster 1 with 8 seqlets due to sign disagreement
Aggregating for cluster 2 with 6 seqlets
Trimmed 0 out of 6
Dropping cluster 2 with 6 seqlets due to sign disagreement
Aggregating for cluster 3 with 4 seqlets
Trimmed 0 out of 4
Dropping cluster 3 with 4 seqlets due to sign disagreement
Aggregating for cluster 4 with 4 seqlets
Trimmed 0 out of 4
Dropping cluster 4 with 4 seqlets due to sign disagreement
Aggregating for cluster 5 with 3 seqlets
Trimmed 0 out of 3
Dropping cluster 5 with 3 seqlets due to sign disagreement
Aggregating for cluster 6 with 2 seqlets
Trimmed 0 out of 2
Dropping cluster 6 with 2 seqlets due to sign disagreement
Aggregating for cluster 7 with 2 seqlets
Trimmed 0 out of 2
Dropping cluster 7 with 2 seqlets due to sign disagreement
Got 0 clusters
Splitting into subclusters...
Merging on 0 clusters
On merging iteration 1
Computing pattern to seqlet distances
Traceback (most recent call last):
  File "/users/avsec/bin/anaconda3/envs/chipnexus/bin/basepair", line 11, in <module>
    load_entry_point('basepair', 'console_scripts', 'basepair')()
  File "/users/avsec/workspace/basepair/basepair/__main__.py", line 162, in main
    argh.dispatch(parser)
  File "/users/avsec/bin/anaconda3/envs/chipnexus/lib/python3.6/site-packages/argh/dispatching.py", line 174, in dispatch
    for line in lines:
  File "/users/avsec/bin/anaconda3/envs/chipnexus/lib/python3.6/site-packages/argh/dispatching.py", line 277, in _execute_command
    for line in result:
  File "/users/avsec/bin/anaconda3/envs/chipnexus/lib/python3.6/site-packages/argh/dispatching.py", line 260, in _call
    result = function(*positional, **keywords)
  File "/users/avsec/workspace/basepair/basepair/cli/modisco.py", line 252, in modisco_run
    one_hot=thr_one_hot)
  File "/users/avsec/workspace/kl/tfmodisco/modisco/tfmodisco_workflow/workflow.py", line 309, in __call__
    seqlets_to_patterns_result = seqlets_to_patterns(metacluster_seqlets)
  File "/users/avsec/workspace/kl/tfmodisco/modisco/tfmodisco_workflow/seqlets_to_patterns.py", line 670, in __call__
    patterns=split_patterns, seqlets=seqlets) 
  File "/users/avsec/workspace/kl/tfmodisco/modisco/aggregator.py", line 939, in __call__
    filter_seqlets=patterns))
  File "/users/avsec/workspace/kl/tfmodisco/modisco/affinitymat/core.py", line 439, in __call__
    min_overlap=self.pattern_comparison_settings.min_overlap) 
  File "/users/avsec/workspace/kl/tfmodisco/modisco/affinitymat/core.py", line 474, in __call__
    assert np.max(neighbors_of_things_to_scan) < filters.shape[0]
  File "/users/avsec/bin/anaconda3/envs/chipnexus/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 2320, in amax
    out=out, **kwargs)
  File "/users/avsec/bin/anaconda3/envs/chipnexus/lib/python3.6/site-packages/numpy/core/_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity
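
The failing assertion can be reproduced in isolation: `np.max` raises this `ValueError` whenever it is given an empty array, regardless of what the array was meant to hold. Below is a minimal sketch of the failure and of one possible guard. The helper name `check_neighbor_indices` and its parameter `num_filters` are hypothetical, introduced only for illustration; this is not necessarily how tfmodisco handles the case.

```python
import numpy as np


def check_neighbor_indices(neighbors_of_things_to_scan, num_filters):
    """Hypothetical helper: return True if every neighbor index is a valid filter index.

    An empty index array is treated as trivially valid, since there is
    nothing to scan and therefore nothing that can be out of range.
    """
    neighbors = np.asarray(neighbors_of_things_to_scan)
    if neighbors.size == 0:
        # Skip the reduction entirely; np.max has no identity for empty input.
        return True
    return bool(np.max(neighbors) < num_filters)


# Reproduce the error from the traceback with an empty index array.
empty = np.array([], dtype=int)
try:
    np.max(empty)
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

print(raised, message)
print(check_neighbor_indices(empty, num_filters=0))
```

An alternative, assuming NumPy 1.15 or later, is to pass a starting value so that the reduction is defined for empty input, for example `np.max(neighbors, initial=-1)`.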

Thank you for flagging this. I know it's very frustrating to have a longish-running job that crashes. I put in a fix that I think should work in #22 - could you test it out?

Sorry for the late response - worked perfectly. Thanks!