Adding nodes to database takes long time
Hi,
If "adding the nodes to the database" takes several minutes, did I do something wrong, or could that be correct?
input image shape: (16, 63, 2, 512, 512)
cell channel image shape: (16, 63, 512, 512)
/g/cba/exchange/erk-signalling-dynamics/code/python/ultrack_tracking.py:49: RuntimeWarning: invalid value encountered in divide
dist = dist / dist.max(axis=(1, 2, 3), keepdims=True)
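As a side note, the `invalid value encountered in divide` warning usually means some frame's maximum is zero, so the normalization computes 0/0 and produces NaNs. A hedged sketch of a guarded version (the `safe_normalize` name is ours; the axes are taken from the line shown in the log):

```python
import numpy as np

def safe_normalize(dist: np.ndarray) -> np.ndarray:
    """Per-frame max normalization that avoids the 0/0 RuntimeWarning.

    Frames whose maximum is zero (e.g. no foreground) would otherwise
    produce NaNs via 0/0; here they are left as zeros instead.
    """
    max_per_frame = dist.max(axis=(1, 2, 3), keepdims=True)
    return np.divide(
        dist,
        max_per_frame,
        out=np.zeros_like(dist),
        where=max_per_frame > 0,
    )
```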
computed edges
saved edges
Adding nodes to database: 62%|████████████████████████████████████████▋ | 10/16 [07:58<10:39, 106.53s/it]
In fact it threw an error now:
Linking nodes.: 0%| | 0/15 [00:00<?, ?it/s]Traceback (most recent call last):
File "/g/cba/exchange/erk-signalling-dynamics/code/python/ultrack_tracking.py", line 164, in <module>
fire.Fire(cli)
File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/g/cba/exchange/erk-signalling-dynamics/code/python/ultrack_tracking.py", line 128, in track
link(config)
File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/ultrack/core/linking/processing.py", line 230, in link
multiprocessing_apply(
File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/ultrack/utils/multiprocessing.py", line 56, in multiprocessing_apply
return [func(t) for t in tqdm(sequence, desc=desc)]
File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/ultrack/utils/multiprocessing.py", line 56, in <listcomp>
return [func(t) for t in tqdm(sequence, desc=desc)]
File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/toolz/functoolz.py", line 304, in __call__
return self._partial(*args, **kwargs)
File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/ultrack/core/linking/processing.py", line 109, in _process
current_kdtree = KDTree(current_pos)
File "/g/cba/miniconda3/envs/ultrack/lib/python3.10/site-packages/scipy/spatial/_kdtree.py", line 360, in __init__
super().__init__(data, leafsize, compact_nodes, copy_data,
File "_ckdtree.pyx", line 558, in scipy.spatial._ckdtree.cKDTree.__init__
ValueError: data must be 2 dimensions
Linking nodes.: 0%|
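For what it's worth, `ValueError: data must be 2 dimensions` from `cKDTree` typically means the positions array for some time point is not of shape `(n, ndim)`. One plausible trigger is a time point with zero detected nodes, because an empty list collapses to a 1-D array. A small sketch of the shape difference (plain NumPy, no ultrack internals assumed):

```python
import numpy as np

# A frame with detections: (n, 3) positions, the 2-D shape KDTree expects.
pos_ok = np.asarray([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])

# A frame with zero detections: np.asarray([]) is 1-D with shape (0,),
# which scipy's cKDTree rejects with "data must be 2 dimensions".
pos_empty = np.asarray([])

print(pos_ok.ndim, pos_empty.ndim)  # 2 1
```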
Are you using remote storage like ESS, NFS, or Lustre?
Because of the higher latency of remote storage, the multi-processing can get into a deadlock.
I recommend reducing the number of workers.
Yes, data is on NFS.
How can I reduce the number of workers?
Related, I run this on a compute node of a slurm cluster, e.g.
srun --nodes=1 --cpus-per-task=4 --mem-per-cpu=16000 --time=01:00:00 --pty /bin/bash
- What would you recommend I should ask for in terms of resources?
- How can I tell python how many workers (CPUs) it should actually use? Because my experience is that the python multi-processing does not care about what slurm actually allocates for it...
Hey @tischi,
How can I reduce the number of workers?
How can I tell python how many workers (CPUs) it should actually use? Because my experience is that the python multi-processing does not care about what slurm actually allocates for it...
With the n_workers parameter from the configuration; the configuration docs are here.
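For reference, a hedged sketch of what lowering the workers might look like in the TOML configuration file (the section and key names here are assumptions; check the linked configuration docs for the exact layout):

```toml
# Hypothetical ultrack configuration excerpt: the section where
# n_workers lives is an assumption; verify against the docs.
[data]
n_workers = 4   # keep this low (e.g. 1-4) on NFS to avoid deadlocks
```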
What would you recommend I should ask for in terms of resources?
It depends on the size of your data and how long you can wait for the processing.
When using the sqlite backend (default), I don't go for more than 8.
With Postgres and distributed computation, I usually scale to 100 or more nodes, each with a single worker (n_workers=1); this requires more setup, and it's only worth it for TB-scale datasets.
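On the Slurm point: Python's multiprocessing defaults to os.cpu_count(), which reports every CPU on the node regardless of what Slurm granted. A sketch of reading the allocation from the environment instead (SLURM_CPUS_PER_TASK is set by --cpus-per-task; the helper name is ours):

```python
import os

def allocated_cpus() -> int:
    """Number of CPUs Slurm actually granted, falling back to all visible CPUs."""
    n = os.environ.get("SLURM_CPUS_PER_TASK")
    return int(n) if n else os.cpu_count()

# Pass this value as n_workers instead of letting the pool size
# default to os.cpu_count() on a shared node.
```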