Speed difference between binaries and Python wrappers
Closed this issue · 14 comments
Describe the bug
There is a large difference in computation time between the terminal CLI and the Python wrappers for N4BiasFieldCorrection and DenoiseImage:
Program | Terminal (CLI) | Python |
---|---|---|
N4BiasFieldCorrection | 22s | 33s |
DenoiseImage | 76s | 602s |
To reproduce
I use a classic 3D T2-weighted image with 0.8 mm isotropic voxels:
```
************************************************
Image name:          "in.nii"
************************************************
  Dimensions:        256 x 320 x 320
  Voxel size:        0.8 x 0.8 x 0.8
  Data strides:      [ 1 2 3 ]
  Format:            NIfTI-1.1
  Data type:         signed 16 bit integer (little endian)
  Intensity scaling: offset = 0, multiplier = 1
  Transform:                    1           0           0      -106.6
                                0           1           0       -93.8
                                0           0           1        -136
  comments:          TE=4.2e+02;Time=164035.610;phase=1
```
From the terminal:
```bash
cd /tmp/test_antspy
N4BiasFieldCorrection -i in.nii -o n4_in.nii -v 1
DenoiseImage -i n4_in.nii -o dn_n4_in.nii -v 1 -n Rician
```
From Python:
```python
import ants

# Same pipeline as the CLI commands above: N4 bias correction, then denoising.
img = ants.image_read('/tmp/test_antspy/in.nii')
img = ants.n4_bias_field_correction(img, verbose=True)
img = ants.denoise_image(img, v=1)
ants.image_write(img, '/tmp/test_antspy/out.nii')
```
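To compare the two paths under identical conditions, one option is to time both from the same Python process. The sketch below assumes only the standard library: `time_command` runs an external program (such as the `DenoiseImage` call above) through `subprocess`, and `time_callable` times any Python function (such as a lambda wrapping `ants.denoise_image`). Both helper names are hypothetical, and taking the median over repeats is just one way to damp run-to-run noise.

```python
import statistics
import subprocess
import time
from typing import Callable, Sequence


def time_callable(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the median wall-clock time, in seconds, of calling fn()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def time_command(argv: Sequence[str], repeats: int = 3) -> float:
    """Return the median wall-clock time, in seconds, of running argv to completion."""
    # check=True raises CalledProcessError if the command fails, so a broken
    # invocation is not mistaken for a fast one.
    return time_callable(
        lambda: subprocess.run(list(argv), check=True, capture_output=True),
        repeats=repeats,
    )


if __name__ == "__main__":
    # Hypothetical usage with the files from the report (requires ANTs and ANTsPy):
    #   import ants
    #   img = ants.image_read("/tmp/test_antspy/n4_in.nii")
    #   py_s = time_callable(lambda: ants.denoise_image(img, r=2, v=0))
    #   cli_s = time_command(["DenoiseImage", "-i", "/tmp/test_antspy/n4_in.nii",
    #                         "-o", "/tmp/test_antspy/dn.nii", "-n", "Rician"])
    print(f"{time_callable(lambda: sum(range(100_000))):.6f} s")
```

Timing the CLI through `subprocess` includes process start-up and image I/O, which is also what the terminal measurements above include.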
Expected behavior
Since I built both ANTs and ANTsPy from source, I would expect roughly the same computation time. A 1.5x slowdown for N4BiasFieldCorrection is unexpected but acceptable; an 8x slowdown for DenoiseImage is very weird.
ANTsPy installation (please complete the following information):
- Hardware: [ PC ] Intel i9-11900K CPU, 8 cores (16 threads)
- OS: [ Linux 4.15.0-20-generic x86_64 // Linux Mint 19 ]
- System details [ None ]
- Sub-system: [ Built from source // commit 751fec7 ]
- ANTsPy version: [ 0.5.4 ]
- Installation type: [ git clone, then built from source with `python -m pip install .` ]
Additional context
When running both tests, I can see in htop that all 16 logical CPUs are running at 100% with both the terminal CLI and the Python wrappers, so it's not an obvious multithreading problem.
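Since both runs saturate all 16 logical CPUs, thread count is probably not the culprit, but pinning it explicitly removes one variable from a benchmark. ANTs and ANTsPy are built on ITK, which reads the `ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS` environment variable; the sketch below assumes the variable has to be set before `ants` is imported, and passed explicitly to any CLI subprocess. The helper name `pinned_itk_env` is hypothetical.

```python
import os
from typing import Dict, Optional


def pinned_itk_env(n_threads: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with ITK's thread count pinned."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    env = dict(os.environ if base is None else base)
    env["ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS"] = str(n_threads)
    return env


if __name__ == "__main__":
    # For the Python wrappers: set the variable in this process *before* `import ants`.
    os.environ.update(pinned_itk_env(16, base={}))
    # For the CLI, hand the same setting to the child process, e.g.
    #   subprocess.run(["DenoiseImage", ...], env=pinned_itk_env(16), check=True)
    print(os.environ["ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS"])
```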
For DenoiseImage, it's a difference in defaults: the search radius is 2 in the CLI program and 3 in ANTsPy. If I call `ants.denoise_image(img, r=2, v=1)`, the difference in performance is similar to that for N4.
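A rough way to see why the search radius matters so much: in non-local means, each voxel is compared against candidate patches inside a cubic search window of (2r+1)^3 positions, so going from r=2 to r=3 multiplies the number of comparisons by 343/125 ≈ 2.7. This is a back-of-the-envelope estimate, assuming runtime is dominated by those patch comparisons; it ignores patch size, masking, and threading effects. The function names are hypothetical.

```python
def search_window_voxels(radius: int, dim: int = 3) -> int:
    """Number of candidate positions in a cubic search window of the given radius."""
    if radius < 0 or dim < 1:
        raise ValueError("radius must be >= 0 and dim >= 1")
    return (2 * radius + 1) ** dim


def predicted_slowdown(r_new: int, r_old: int, dim: int = 3) -> float:
    """Estimated runtime ratio, assuming cost scales with search-window size."""
    return search_window_voxels(r_new, dim) / search_window_voxels(r_old, dim)


if __name__ == "__main__":
    print(search_window_voxels(2), search_window_voxels(3))  # 125 343
    print(round(predicted_slowdown(3, 2), 3))  # 2.744
```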
@ntustison @stnava, shall we harmonize the defaults? Which one should we adopt? I'll go with the faster one (r=2) unless you would prefer to make ANTsPy the standard.
Yeah, I'm all for harmonizing the defaults and I'd go with what's in the original ANTs DenoiseImage.
For DenoiseImage in particular, I would hope you could align it with the original implementation, minc_anlm from minc-toolkit-v2.
Is there a usage message showing the defaults that you could paste here?
DenoiseImage was ported from Jose's original Matlab code. Minc code was not referenced at all.
Well, minc_anlm was written by or with Jose, given that the citation lists L. Collins of the MNI as one of the senior authors.
```
$ minc_anlm
This program implements adaptative non-local denoising algorithm published in
Jose V. Manjon, Pierrick Coupe, Luis Marti-Bonmati, D. Louis Collins, Montserrat Robles "Adaptive non-local means denoising of MR images with spatially varying noise levels" Journal of Magnetic Resonance Imaging Volume 31, Issue 1, pages 192-203, January 2010
DOI: 10.1002/jmri.22003
```
I profiled `n4_bias_field_correction` with line_profiler (kernprof): the call into the underlying library function accounts for 99.4% of the execution time, so very little work happens at the wrapper level.
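The same kind of attribution can be done with the standard library's profiler. The sketch below swaps in `cProfile`/`pstats` instead of line_profiler and uses hypothetical stand-ins (`library_call` for the compiled ANTs routine, `wrapper` for the thin Python layer) to compute what fraction of the wrapper's cumulative time is spent inside the library call. With the real package, one would point `library_fraction` at the actual wrapper and library entry names instead.

```python
import cProfile
import pstats
from typing import Callable


def library_call(n: int) -> int:
    # Hypothetical stand-in for the compiled routine doing the real work.
    total = 0
    for i in range(n):
        total += i * i
    return total


def wrapper(n: int) -> int:
    # Hypothetical stand-in for a thin Python wrapper: argument handling only.
    kwargs = {"n": int(n)}
    return library_call(**kwargs)


def _cumulative_time(stats: pstats.Stats, func_name: str) -> float:
    """Sum cumulative time (ct) over all profiled functions with this name."""
    return sum(
        ct
        for (_, _, name), (_, _, _, ct, _) in stats.stats.items()
        if name == func_name
    )


def library_fraction(entry: Callable[[], object], outer: str, inner: str) -> float:
    """Fraction of `outer`'s cumulative time spent inside `inner` (0.0 if `outer` never ran)."""
    profiler = cProfile.Profile()
    profiler.enable()
    entry()
    profiler.disable()
    stats = pstats.Stats(profiler)
    outer_time = _cumulative_time(stats, outer)
    return _cumulative_time(stats, inner) / outer_time if outer_time else 0.0


if __name__ == "__main__":
    frac = library_fraction(lambda: wrapper(2_000_000), "wrapper", "library_call")
    print(f"{frac:.1%} of wrapper time is spent in library_call")
```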
Yes, I realize that. But you were referring to a specific implementation in the context of defaults and that's why I clarified that it was Jose's original Matlab code.
Just a bit more historical context: I happened to be invited by an MNI-adjacent friend to a get-together during MICCAI 2013 in Nagoya, Japan. Fortunately, I sat right across the table from Jose and, after we discussed common interests (such as our enjoyment of Luis Miguel), he realized I was "one of the ANTs guys" and asked me if I would like to put his denoising algorithm in ANTs. I said sure, and he pointed me to his Matlab code, which I eventually ported to ITK style. After the FreeSurfer folks began using the implementation in their pipeline a couple of years ago, I asked Jose about the possibility of making it an ITK module, and he was all for it.
> For DenoiseImage, it's a difference in defaults: the search radius is 2 in the CLI program and 3 in ANTsPy. If I call `ants.denoise_image(img, r=2, v=1)`, the difference in performance is similar to that for N4.
Correct, I did not notice the difference in the default `r` parameter.
Here is what I have now:
DenoiseImage | r=2 | r=3 |
---|---|---|
CLI | 76s | 179s |
Python | 248s | 602s |
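Reading the table as ratios makes two things explicit: the Python path is still roughly 3.3x slower than the CLI at both radii, and raising r from 2 to 3 costs about 2.4x on either path. The snippet below only recomputes these ratios from the timings reported in the table; it adds no new measurements.

```python
# Timings (seconds) copied from the table above: {interface: {radius: seconds}}
TIMINGS = {
    "CLI": {2: 76.0, 3: 179.0},
    "Python": {2: 248.0, 3: 602.0},
}


def python_over_cli(timings: dict) -> dict:
    """Python/CLI runtime ratio for each search radius."""
    return {r: timings["Python"][r] / timings["CLI"][r] for r in timings["CLI"]}


def r3_over_r2(timings: dict) -> dict:
    """Runtime ratio between r=3 and r=2 for each interface."""
    return {name: t[3] / t[2] for name, t in timings.items()}


if __name__ == "__main__":
    print({r: round(x, 2) for r, x in python_over_cli(TIMINGS).items()})  # {2: 3.26, 3: 3.36}
    print({k: round(x, 2) for k, x in r3_over_r2(TIMINGS).items()})  # {'CLI': 2.36, 'Python': 2.43}
```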
Thanks for testing, @benoitberanger
Would you mind trying out #705? If you have the GitHub CLI, you can run:
```bash
gh pr checkout 705
```
It appears to close the gap on my Mac.
Wow! Thanks for reporting this