ANTsPy crashes Python when registering two images
JescoS opened this issue · 4 comments
Describe the bug
Python crashes without any error when calling ants.registration
to rigidly register one MRI image to another. This only occurs with two specific images in my dataset.
To reproduce
import ants
fixed = ants.image_read(".../t1.nii.gz")
moving = ants.image_read(".../t1_gd.nii.gz")
# It crashes here when registering 't1_gd' (moving image) to 't1' (fixed image)
reg = ants.registration(
fixed,
moving,
"Rigid",
verbose=True,
)
Expected behavior
The registration to happen successfully, or at least that a helpful error message is provided.
Screenshots
Output from the code provided under To reproduce
antsRegistration -d 3 -r [0x55be9ed2c2d0,0x55be9ed36050,1] -m mattes[0x55be9ed2c2d0,0x55be9ed36050,1,32,regular,0.2] -t Rigid[0.25] -c 2100x1200x1200x10 -s 3x2x1x0 -f 6x4x2x1 -u 1 -z 1 -o [/tmp/tmpevgyaz8y,0x55be9ecbb2c0,0x55be9f006d20] -x [NA,NA] --float 1 --write-composite-transform 0 -v 1
All_Command_lines_OK
Using single precision for computations.
=============================================================================
The composite transform comprises the following transforms (in order):
1. Center of mass alignment using fixed image: 0x55be9ed2c2d0 and moving image: 0x55be9ed36050 (type = Euler3DTransform)
=============================================================================
Reading mask(s).
Registration stage 0
No fixed mask
No moving mask
number of levels = 4
fixed image: 0x55be9ed2c2d0
moving image: 0x55be9ed36050
Dimension = 3
Number of stages = 1
Use histogram matching = true
Winsorize image intensities = false
Lower quantile = 0
Upper quantile = 1
Stage 1 State
Image metric = MattesMI
Fixed image = Image (0x55be9f0b6ff0)
RTTI typeinfo: itk::Image<float, 3u>
Reference Count: 2
Modified Time: 917
Debug: Off
Object Name:
Observers:
none
Source: (none)
Source output name: (none)
Release Data: Off
Data Released: False
Global Release Data: Off
PipelineMTime: 0
UpdateMTime: 889
RealTimeStamp: 0 seconds
LargestPossibleRegion:
Dimension: 3
Index: [0, 0, 0]
Size: [1024, 1024, 175]
BufferedRegion:
Dimension: 3
Index: [0, 0, 0]
Size: [1024, 1024, 175]
RequestedRegion:
Dimension: 3
Index: [0, 0, 0]
Size: [1024, 1024, 175]
Spacing: [0.234375, 0.234375, 1]
Origin: [117.71, 77.0839, 24.583]
Direction:
-0.999791 -0.0203991 -0.000987429
0.0193326 -0.929714 -0.367774
0.00658424 -0.367716 0.929915
IndexToPointMatrix:
-0.234326 -0.00478105 -0.000987429
0.00453107 -0.217902 -0.367774
0.00154318 -0.0861835 0.929915
PointToIndexMatrix:
-4.26578 0.0824856 0.0280928
-0.0870363 -3.96678 -1.56892
-0.000987429 -0.367774 0.929915
Inverse Direction:
-0.999791 0.0193326 0.00658424
-0.0203991 -0.929714 -0.367716
-0.000987429 -0.367774 0.929915
PixelContainer:
ImportImageContainer (0x55be88baaf30)
RTTI typeinfo: itk::ImportImageContainer<unsigned long, float>
Reference Count: 1
Modified Time: 887
Debug: Off
Object Name:
Observers:
none
Pointer: 0x7f3e5a0ac010
Container manages memory: true
Size: 183500800
Capacity: 183500800
Moving image = Image (0x55be9f0b9590)
RTTI typeinfo: itk::Image<float, 3u>
Reference Count: 2
Modified Time: 918
Debug: Off
Object Name:
Observers:
none
Source: (none)
Source output name: (none)
Release Data: Off
Data Released: False
Global Release Data: Off
PipelineMTime: 0
UpdateMTime: 915
RealTimeStamp: 0 seconds
LargestPossibleRegion:
Dimension: 3
Index: [0, 0, 0]
Size: [1024, 1024, 55]
BufferedRegion:
Dimension: 3
Index: [0, 0, 0]
Size: [1024, 1024, 55]
RequestedRegion:
Dimension: 3
Index: [0, 0, 0]
Size: [1024, 1024, 55]
Spacing: [0.234375, 0.234375, 3]
Origin: [119.193, 75.9313, 31.5048]
Direction:
-0.999759 -0.0203991 -0.00813477
0.0219612 -0.929714 -0.367626
-6.3751e-05 -0.367716 0.929938
IndexToPointMatrix:
-0.234318 -0.00478105 -0.0244043
0.00514716 -0.217902 -1.10288
-1.49416e-05 -0.0861835 2.78981
PointToIndexMatrix:
-4.26564 0.0937012 -0.000272009
-0.0870363 -3.96678 -1.56892
-0.00271159 -0.122542 0.309979
Inverse Direction:
-0.999759 0.0219612 -6.37521e-05
-0.0203991 -0.929714 -0.367716
-0.00813477 -0.367626 0.929938
PixelContainer:
ImportImageContainer (0x55be9f0b9820)
RTTI typeinfo: itk::ImportImageContainer<unsigned long, float>
Reference Count: 1
Modified Time: 913
Debug: Off
Object Name:
Observers:
none
Pointer: 0x7f3e0e3ff010
Container manages memory: true
Size: 57671680
Capacity: 57671680
Weighting = 1
Sampling strategy = regular
Number of bins = 32
Radius = 4
Sampling percentage = 0.2
Transform = Rigid
Gradient step = 0.25
Update field sigma (voxel space) = 0
Total field sigma (voxel space) = 0
Update field time sigma = 0
Total field time sigma = 0
Number of time indices = 0
Number of time point samples = 0
Registration using 1 total stages.
Stage 0
iterations = 2100x1200x1200x10
convergence threshold = 1e-06
convergence window size = 10
number of levels = 4
using the Mattes MI metric (number of bins = 32, weight = 1, use gradient filter = 0)
preprocessing: histogram matching the images
Shrink factors (level 1 out of 4): [6, 6, 1]
Shrink factors (level 2 out of 4): [4, 4, 1]
Shrink factors (level 3 out of 4): [2, 2, 1]
Shrink factors (level 4 out of 4): [1, 1, 1]
smoothing sigmas per level: [3, 2, 1, 0]
regular sampling (percentage = 0.2)
*** Running Euler3DTransform registration ***
DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
2DIAGNOSTIC, 1, -7.784408330917e-01, inf, 3.3159e+00, 3.3159e+00,
2DIAGNOSTIC, 2, -7.880971431732e-01, inf, 3.6818e+00, 3.6594e-01,
2DIAGNOSTIC, 3, -7.961868643761e-01, inf, 4.0615e+00, 3.7965e-01,
2DIAGNOSTIC, 4, -7.995039820671e-01, inf, 4.4379e+00, 3.7644e-01,
2DIAGNOSTIC, 5, -8.014418482780e-01, inf, 4.8091e+00, 3.7121e-01,
2DIAGNOSTIC, 6, -8.084483146667e-01, inf, 5.2293e+00, 4.2015e-01,
2DIAGNOSTIC, 7, -8.104047179222e-01, inf, 5.7435e+00, 5.1423e-01,
2DIAGNOSTIC, 8, -8.110386133194e-01, inf, 6.1126e+00, 3.6908e-01,
2DIAGNOSTIC, 9, -8.157773613930e-01, inf, 6.5345e+00, 4.2192e-01,
2DIAGNOSTIC, 10, -8.202537894249e-01, 2.929246518761e-03, 6.9102e+00, 3.7568e-01,
2DIAGNOSTIC, 11, -8.206747770309e-01, 2.245417330414e-03, 7.3237e+00, 4.1351e-01,
2DIAGNOSTIC, 12, -8.211225867271e-01, 1.770743401721e-03, 7.8365e+00, 5.1281e-01,
2DIAGNOSTIC, 13, -8.212584853172e-01, 1.435689860955e-03, 8.2116e+00, 3.7508e-01,
2DIAGNOSTIC, 14, -8.218446373940e-01, 1.116505824029e-03, 8.6343e+00, 4.2277e-01,
2DIAGNOSTIC, 15, -8.223311305046e-01, 7.893703295849e-04, 9.1028e+00, 4.6844e-01,
2DIAGNOSTIC, 16, -8.223287463188e-01, 5.852135363966e-04, 9.5163e+00, 4.1355e-01,
2DIAGNOSTIC, 17, -8.223301768303e-01, 3.945839998778e-04, 9.8815e+00, 3.6512e-01,
2DIAGNOSTIC, 18, -8.223319649696e-01, 2.001180691877e-04, 1.0310e+01, 4.2856e-01,
2DIAGNOSTIC, 19, -8.223304152489e-01, 9.293391485699e-05, 1.0684e+01, 3.7397e-01,
2DIAGNOSTIC, 20, -8.223320841789e-01, 6.619643681915e-05, 1.1105e+01, 4.2054e-01,
2DIAGNOSTIC, 21, -8.223316073418e-01, 4.296490442357e-05, 1.1524e+01, 4.1926e-01,
2DIAGNOSTIC, 22, -8.223342299461e-01, 2.533297993068e-05, 1.1953e+01, 4.2919e-01,
2DIAGNOSTIC, 23, -8.223354816437e-01, 8.932824130170e-06, 1.2375e+01, 4.2202e-01,
2DIAGNOSTIC, 24, -8.223351836205e-01, 1.526392907181e-06, 1.2798e+01, 4.2326e-01,
2DIAGNOSTIC, 25, -8.223311305046e-01, 1.437342575628e-06, 1.3218e+01, 4.1943e-01,
2DIAGNOSTIC, 26, -8.223313093185e-01, 1.312344238613e-06, 1.3843e+01, 6.2565e-01,
2DIAGNOSTIC, 27, -8.223315477371e-01, 1.209031893268e-06, 1.4270e+01, 4.2643e-01,
2DIAGNOSTIC, 28, -8.223308920860e-01, 1.124806317421e-06, 1.4794e+01, 5.2415e-01,
2DIAGNOSTIC, 29, -8.223308324814e-01, 1.029444206324e-06, 1.5218e+01, 4.2359e-01,
DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
2DIAGNOSTIC, 1, -7.919459939003e-01, inf, 2.1032e+01, 5.8147e+00,
2DIAGNOSTIC, 2, -7.922274470329e-01, inf, 2.1988e+01, 9.5568e-01,
2DIAGNOSTIC, 3, -7.923026680946e-01, inf, 2.3133e+01, 1.1451e+00,
2DIAGNOSTIC, 4, -7.924258708954e-01, inf, 2.4152e+01, 1.0188e+00,
2DIAGNOSTIC, 5, -7.924597263336e-01, inf, 2.5251e+01, 1.0996e+00,
2DIAGNOSTIC, 6, -7.925294637680e-01, inf, 2.6502e+01, 1.2502e+00,
2DIAGNOSTIC, 7, -7.925676107407e-01, inf, 2.7294e+01, 7.9280e-01,
2DIAGNOSTIC, 8, -7.925767302513e-01, inf, 2.8252e+01, 9.5760e-01,
2DIAGNOSTIC, 9, -7.925794720650e-01, inf, 2.9275e+01, 1.0227e+00,
2DIAGNOSTIC, 10, -7.925798296928e-01, 4.663267100113e-05, 3.0165e+01, 8.9031e-01,
2DIAGNOSTIC, 11, -7.925813794136e-01, 2.782916271826e-05, 3.1149e+01, 9.8371e-01,
2DIAGNOSTIC, 12, -7.925839424133e-01, 1.838119169406e-05, 3.2076e+01, 9.2773e-01,
2DIAGNOSTIC, 13, -7.925873398781e-01, 1.107098978537e-05, 3.3009e+01, 9.3283e-01,
2DIAGNOSTIC, 14, -7.925903797150e-01, 7.265426120284e-06, 3.3815e+01, 8.0576e-01,
2DIAGNOSTIC, 15, -7.925956845284e-01, 4.413903752720e-06, 3.4765e+01, 9.5040e-01,
2DIAGNOSTIC, 16, -7.925977110863e-01, 3.293936742921e-06, 3.5674e+01, 9.0855e-01,
2DIAGNOSTIC, 17, -7.925973534584e-01, 2.977928033943e-06, 3.6440e+01, 7.6642e-01,
2DIAGNOSTIC, 18, -7.925977706909e-01, 2.784697244351e-06, 3.7330e+01, 8.8950e-01,
2DIAGNOSTIC, 19, -7.925984859467e-01, 2.582474280644e-06, 3.8237e+01, 9.0673e-01,
2DIAGNOSTIC, 20, -7.926011681557e-01, 2.369023832216e-06, 3.9432e+01, 1.1949e+00,
2DIAGNOSTIC, 21, -7.926015257835e-01, 2.134311671398e-06, 4.0366e+01, 9.3464e-01,
2DIAGNOSTIC, 22, -7.926001548767e-01, 1.883459503915e-06, 4.1423e+01, 1.0564e+00,
2DIAGNOSTIC, 23, -7.925997972488e-01, 1.662279260017e-06, 4.2299e+01, 8.7658e-01,
2DIAGNOSTIC, 24, -7.925995588303e-01, 1.472873009334e-06, 4.3217e+01, 9.1793e-01,
2DIAGNOSTIC, 25, -7.926007509232e-01, 1.376191448799e-06, 4.4349e+01, 1.1316e+00,
2DIAGNOSTIC, 26, -7.926012277603e-01, 1.311355731559e-06, 4.5566e+01, 1.2175e+00,
2DIAGNOSTIC, 27, -7.926009297371e-01, 1.235148147316e-06, 4.7030e+01, 1.4635e+00,
2DIAGNOSTIC, 28, -7.926003932953e-01, 1.158303689408e-06, 4.8101e+01, 1.0711e+00,
2DIAGNOSTIC, 29, -7.926004528999e-01, 1.093328023671e-06, 4.9094e+01, 9.9350e-01,
2DIAGNOSTIC, 30, -7.926002740860e-01, 1.062158162313e-06, 5.0157e+01, 1.0629e+00,
2DIAGNOSTIC, 31, -7.926017642021e-01, 1.054402559930e-06, 5.1191e+01, 1.0335e+00,
2DIAGNOSTIC, 32, -7.926021218300e-01, 1.032973159454e-06, 5.2104e+01, 9.1302e-01,
2DIAGNOSTIC, 33, -7.926020622253e-01, 1.005100102702e-06, 5.3521e+01, 1.4168e+00,
Killed
ANTsPy installation (please complete the following information):
- Hardware [ PC ]
- OS: [ Windows, Ubuntu ]
- System details [ Linux 5.15.146.1-microsoft-standard-WSL2 x86_64 ]
- Sub-system: [ WSL, Docker ]
- ANTsPy version: [ 0.5.2 ]
- Installation type: [ downloaded wheel from https://github.com/ANTsX/ANTsPy/releases/download/v0.5.2/antspyx-0.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl ]
Additional context
The registration only seems to fail for a particular combination of images in my dataset, so I cannot reproduce the error with any publicly available data, unfortunately. It also seems to fail after a different number of iterations every time I run the provided code.
Out of memory? Try increasing RAM available to docker?
Are the other images that work successfully as large as these? 1024x1024x175 is quite large.
It also seems to fail after a different number of iterations every time I run the provided code.
I suspect it is converging at different numbers of iterations, then running out of memory. Before each stage, it will print
DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
If this is consistently printed twice, it's probably that it's converging differently due to random sampling and then trying and failing to start the next level.
I can see a small increase in the number of objects and the RSS memory over time, when running registration in a loop. Just registering a normal T1w image to MNI, repeatedly
import ants
import gc
import psutil
import sys
import time
t1w = ants.image_read('t1w.nii.gz')
template = ants.image_read('tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz')
process = psutil.Process()
# Function to convert bytes to MB
def bytes_to_mb(bytes):
return round(bytes / 1024 / 1024, 3)
def get_object_description(obj):
"""Generate a description of the object."""
if isinstance(obj, (list, set, tuple)):
return f"{type(obj).__name__} of size {len(obj)}: {repr(obj)[:200]}"
elif isinstance(obj, dict):
return f"{type(obj).__name__} with {len(obj)} keys: {repr(obj)[:200]}"
elif isinstance(obj, str):
return f"String of length {len(obj)}: {repr(obj)[:200]}"
else:
return repr(obj)[:200]
def show_largest_objects(limit=10):
"""Print the largest objects in memory, in MB."""
gc.collect()
objects = gc.get_objects()
print(f"Total objects in memory: {len(objects)}")
sorted_objects = sorted(objects, key=lambda x: sys.getsizeof(x), reverse=True)[:limit]
for obj in sorted_objects:
size_in_mb = bytes_to_mb(sys.getsizeof(obj))
description = get_object_description(obj)
print(f"{description} - {size_in_mb} MB")
total_memory = list()
total_objects = list()
reg = None
try:
for i in range(100):
gc.collect()
memory_info = process.memory_info()
total_memory.append(bytes_to_mb(memory_info.rss))
total_objects.append(len(gc.get_objects()))
print(f"RSS Memory: {bytes_to_mb(memory_info.rss):.2f} MB")
print(f"VMS Memory: {bytes_to_mb(memory_info.vms):.2f} MB")
# show_largest_objects(5)
reg = ants.registration(template, t1w, aff_iterations=(25, 0, 0, 0), reg_iterations=(1, 0, 0, 0) )
print('Done iteration ' + str(i))
except KeyboardInterrupt:
print("Monitoring stopped.")
print("Final stats")
gc.collect()
memory_info = process.memory_info()
print(f"RSS Memory: {bytes_to_mb(memory_info.rss):.2f} MB")
show_largest_objects(10)
I ran this for 56 iterations before interrupting
> memory_values
[1] 309.633 1901.070 1990.355 1991.445 1993.828 2002.277 2009.391 2009.805 2010.035 2010.977 2011.438 2017.438 2021.508 2022.109 2022.117 2022.137 2022.352 2022.359 2024.039 2024.055 2024.062
[22] 2028.086 2028.105 2028.117 2028.297 2028.297 2030.715 2030.734 2030.738 2030.766 2030.766 2030.773 2030.777 2030.777 2030.777 2030.785 2030.785 2030.797 2030.801 2030.820 2030.836 2030.871
[43] 2030.871 2030.871 2030.875 2030.875 2030.992 2037.000 2037.023 2037.023 2043.039 2043.039 2043.082 2043.082 2043.082 2043.113 2043.121 2043.121 2043.121 2043.152 2043.328
> diff(memory_values)
[1] 1591.437 89.285 1.090 2.383 8.449 7.114 0.414 0.230 0.942 0.461 6.000 4.070 0.601 0.008 0.020 0.215 0.007 1.680 0.016 0.007 4.024
[22] 0.019 0.012 0.180 0.000 2.418 0.019 0.004 0.028 0.000 0.007 0.004 0.000 0.000 0.008 0.000 0.012 0.004 0.019 0.016 0.035 0.000
[43] 0.000 0.004 0.000 0.117 6.008 0.023 0.000 6.016 0.000 0.043 0.000 0.000 0.031 0.008 0.000 0.000 0.031 0.176
>
Because the difference is small (after the first time) and variable, I'm not sure this is actually a leak on the ANTsPy / ITK side.
Happy to investigate further if reproducible examples are available