ANTsX/ANTsPy

ANTsPy crashes Python when registering two images

JescoS opened this issue · 4 comments

Describe the bug

Python crashes without any error when calling ants.registration to rigidly register one MRI image to another. This only occurs with two specific images in my dataset.

To reproduce

import ants

fixed = ants.image_read(".../t1.nii.gz")
moving = ants.image_read(".../t1_gd.nii.gz")

# It crashes here when registering 't1_gd' (moving image) to 't1' (fixed image)
reg = ants.registration(
    fixed,
    moving,
    "Rigid",
    verbose=True,
)

Expected behavior

The registration to happen successfully, or at least that a helpful error message is provided.

Screenshots

Output from the code provided under To reproduce
antsRegistration -d 3 -r [0x55be9ed2c2d0,0x55be9ed36050,1] -m mattes[0x55be9ed2c2d0,0x55be9ed36050,1,32,regular,0.2] -t Rigid[0.25] -c 2100x1200x1200x10 -s 3x2x1x0 -f 6x4x2x1 -u 1 -z 1 -o [/tmp/tmpevgyaz8y,0x55be9ecbb2c0,0x55be9f006d20] -x [NA,NA] --float 1 --write-composite-transform 0 -v 1
All_Command_lines_OK
Using single precision for computations.
=============================================================================
The composite transform comprises the following transforms (in order): 
  1. Center of mass alignment using fixed image: 0x55be9ed2c2d0 and moving image: 0x55be9ed36050 (type = Euler3DTransform)
=============================================================================
  Reading mask(s).
    Registration stage 0
      No fixed mask
      No moving mask
  number of levels = 4
  fixed image: 0x55be9ed2c2d0
  moving image: 0x55be9ed36050
Dimension = 3
Number of stages = 1
Use histogram matching = true
Winsorize image intensities = false
  Lower quantile = 0
  Upper quantile = 1


Stage 1 State
   Image metric = MattesMI
     Fixed image = Image (0x55be9f0b6ff0)
  RTTI typeinfo:   itk::Image<float, 3u>
  Reference Count: 2
  Modified Time: 917
  Debug: Off
  Object Name: 
  Observers: 
    none
  Source: (none)
  Source output name: (none)
  Release Data: Off
  Data Released: False
  Global Release Data: Off
  PipelineMTime: 0
  UpdateMTime: 889
  RealTimeStamp: 0 seconds 
  LargestPossibleRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [1024, 1024, 175]
  BufferedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [1024, 1024, 175]
  RequestedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [1024, 1024, 175]
  Spacing: [0.234375, 0.234375, 1]
  Origin: [117.71, 77.0839, 24.583]
  Direction: 
-0.999791 -0.0203991 -0.000987429
0.0193326 -0.929714 -0.367774
0.00658424 -0.367716 0.929915

  IndexToPointMatrix: 
-0.234326 -0.00478105 -0.000987429
0.00453107 -0.217902 -0.367774
0.00154318 -0.0861835 0.929915

  PointToIndexMatrix: 
-4.26578 0.0824856 0.0280928
-0.0870363 -3.96678 -1.56892
-0.000987429 -0.367774 0.929915

  Inverse Direction: 
-0.999791 0.0193326 0.00658424
-0.0203991 -0.929714 -0.367716
-0.000987429 -0.367774 0.929915

  PixelContainer: 
    ImportImageContainer (0x55be88baaf30)
      RTTI typeinfo:   itk::ImportImageContainer<unsigned long, float>
      Reference Count: 1
      Modified Time: 887
      Debug: Off
      Object Name: 
      Observers: 
        none
      Pointer: 0x7f3e5a0ac010
      Container manages memory: true
      Size: 183500800
      Capacity: 183500800

     Moving image = Image (0x55be9f0b9590)
  RTTI typeinfo:   itk::Image<float, 3u>
  Reference Count: 2
  Modified Time: 918
  Debug: Off
  Object Name: 
  Observers: 
    none
  Source: (none)
  Source output name: (none)
  Release Data: Off
  Data Released: False
  Global Release Data: Off
  PipelineMTime: 0
  UpdateMTime: 915
  RealTimeStamp: 0 seconds 
  LargestPossibleRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [1024, 1024, 55]
  BufferedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [1024, 1024, 55]
  RequestedRegion: 
    Dimension: 3
    Index: [0, 0, 0]
    Size: [1024, 1024, 55]
  Spacing: [0.234375, 0.234375, 3]
  Origin: [119.193, 75.9313, 31.5048]
  Direction: 
-0.999759 -0.0203991 -0.00813477
0.0219612 -0.929714 -0.367626
-6.3751e-05 -0.367716 0.929938

  IndexToPointMatrix: 
-0.234318 -0.00478105 -0.0244043
0.00514716 -0.217902 -1.10288
-1.49416e-05 -0.0861835 2.78981

  PointToIndexMatrix: 
-4.26564 0.0937012 -0.000272009
-0.0870363 -3.96678 -1.56892
-0.00271159 -0.122542 0.309979

  Inverse Direction: 
-0.999759 0.0219612 -6.37521e-05
-0.0203991 -0.929714 -0.367716
-0.00813477 -0.367626 0.929938

  PixelContainer: 
    ImportImageContainer (0x55be9f0b9820)
      RTTI typeinfo:   itk::ImportImageContainer<unsigned long, float>
      Reference Count: 1
      Modified Time: 913
      Debug: Off
      Object Name: 
      Observers: 
        none
      Pointer: 0x7f3e0e3ff010
      Container manages memory: true
      Size: 57671680
      Capacity: 57671680

     Weighting = 1
     Sampling strategy = regular
     Number of bins = 32
     Radius = 4
     Sampling percentage  = 0.2
   Transform = Rigid
     Gradient step = 0.25
     Update field sigma (voxel space) = 0
     Total field sigma (voxel space) = 0
     Update field time sigma = 0
     Total field time sigma  = 0
     Number of time indices = 0
     Number of time point samples = 0
Registration using 1 total stages.

Stage 0
  iterations = 2100x1200x1200x10
  convergence threshold = 1e-06
  convergence window size = 10
  number of levels = 4
  using the Mattes MI metric (number of bins = 32, weight = 1, use gradient filter = 0)
  preprocessing:  histogram matching the images
  Shrink factors (level 1 out of 4): [6, 6, 1]
  Shrink factors (level 2 out of 4): [4, 4, 1]
  Shrink factors (level 3 out of 4): [2, 2, 1]
  Shrink factors (level 4 out of 4): [1, 1, 1]
  smoothing sigmas per level: [3, 2, 1, 0]
  regular sampling (percentage = 0.2)

*** Running Euler3DTransform registration ***

DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
 2DIAGNOSTIC,     1, -7.784408330917e-01, inf, 3.3159e+00, 3.3159e+00, 
 2DIAGNOSTIC,     2, -7.880971431732e-01, inf, 3.6818e+00, 3.6594e-01, 
 2DIAGNOSTIC,     3, -7.961868643761e-01, inf, 4.0615e+00, 3.7965e-01, 
 2DIAGNOSTIC,     4, -7.995039820671e-01, inf, 4.4379e+00, 3.7644e-01, 
 2DIAGNOSTIC,     5, -8.014418482780e-01, inf, 4.8091e+00, 3.7121e-01, 
 2DIAGNOSTIC,     6, -8.084483146667e-01, inf, 5.2293e+00, 4.2015e-01, 
 2DIAGNOSTIC,     7, -8.104047179222e-01, inf, 5.7435e+00, 5.1423e-01, 
 2DIAGNOSTIC,     8, -8.110386133194e-01, inf, 6.1126e+00, 3.6908e-01, 
 2DIAGNOSTIC,     9, -8.157773613930e-01, inf, 6.5345e+00, 4.2192e-01, 
 2DIAGNOSTIC,    10, -8.202537894249e-01, 2.929246518761e-03, 6.9102e+00, 3.7568e-01, 
 2DIAGNOSTIC,    11, -8.206747770309e-01, 2.245417330414e-03, 7.3237e+00, 4.1351e-01, 
 2DIAGNOSTIC,    12, -8.211225867271e-01, 1.770743401721e-03, 7.8365e+00, 5.1281e-01, 
 2DIAGNOSTIC,    13, -8.212584853172e-01, 1.435689860955e-03, 8.2116e+00, 3.7508e-01, 
 2DIAGNOSTIC,    14, -8.218446373940e-01, 1.116505824029e-03, 8.6343e+00, 4.2277e-01, 
 2DIAGNOSTIC,    15, -8.223311305046e-01, 7.893703295849e-04, 9.1028e+00, 4.6844e-01, 
 2DIAGNOSTIC,    16, -8.223287463188e-01, 5.852135363966e-04, 9.5163e+00, 4.1355e-01, 
 2DIAGNOSTIC,    17, -8.223301768303e-01, 3.945839998778e-04, 9.8815e+00, 3.6512e-01, 
 2DIAGNOSTIC,    18, -8.223319649696e-01, 2.001180691877e-04, 1.0310e+01, 4.2856e-01, 
 2DIAGNOSTIC,    19, -8.223304152489e-01, 9.293391485699e-05, 1.0684e+01, 3.7397e-01, 
 2DIAGNOSTIC,    20, -8.223320841789e-01, 6.619643681915e-05, 1.1105e+01, 4.2054e-01, 
 2DIAGNOSTIC,    21, -8.223316073418e-01, 4.296490442357e-05, 1.1524e+01, 4.1926e-01, 
 2DIAGNOSTIC,    22, -8.223342299461e-01, 2.533297993068e-05, 1.1953e+01, 4.2919e-01, 
 2DIAGNOSTIC,    23, -8.223354816437e-01, 8.932824130170e-06, 1.2375e+01, 4.2202e-01, 
 2DIAGNOSTIC,    24, -8.223351836205e-01, 1.526392907181e-06, 1.2798e+01, 4.2326e-01, 
 2DIAGNOSTIC,    25, -8.223311305046e-01, 1.437342575628e-06, 1.3218e+01, 4.1943e-01, 
 2DIAGNOSTIC,    26, -8.223313093185e-01, 1.312344238613e-06, 1.3843e+01, 6.2565e-01, 
 2DIAGNOSTIC,    27, -8.223315477371e-01, 1.209031893268e-06, 1.4270e+01, 4.2643e-01, 
 2DIAGNOSTIC,    28, -8.223308920860e-01, 1.124806317421e-06, 1.4794e+01, 5.2415e-01, 
 2DIAGNOSTIC,    29, -8.223308324814e-01, 1.029444206324e-06, 1.5218e+01, 4.2359e-01, 
DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST
 2DIAGNOSTIC,     1, -7.919459939003e-01, inf, 2.1032e+01, 5.8147e+00, 
 2DIAGNOSTIC,     2, -7.922274470329e-01, inf, 2.1988e+01, 9.5568e-01, 
 2DIAGNOSTIC,     3, -7.923026680946e-01, inf, 2.3133e+01, 1.1451e+00, 
 2DIAGNOSTIC,     4, -7.924258708954e-01, inf, 2.4152e+01, 1.0188e+00, 
 2DIAGNOSTIC,     5, -7.924597263336e-01, inf, 2.5251e+01, 1.0996e+00, 
 2DIAGNOSTIC,     6, -7.925294637680e-01, inf, 2.6502e+01, 1.2502e+00, 
 2DIAGNOSTIC,     7, -7.925676107407e-01, inf, 2.7294e+01, 7.9280e-01, 
 2DIAGNOSTIC,     8, -7.925767302513e-01, inf, 2.8252e+01, 9.5760e-01, 
 2DIAGNOSTIC,     9, -7.925794720650e-01, inf, 2.9275e+01, 1.0227e+00, 
 2DIAGNOSTIC,    10, -7.925798296928e-01, 4.663267100113e-05, 3.0165e+01, 8.9031e-01, 
 2DIAGNOSTIC,    11, -7.925813794136e-01, 2.782916271826e-05, 3.1149e+01, 9.8371e-01, 
 2DIAGNOSTIC,    12, -7.925839424133e-01, 1.838119169406e-05, 3.2076e+01, 9.2773e-01, 
 2DIAGNOSTIC,    13, -7.925873398781e-01, 1.107098978537e-05, 3.3009e+01, 9.3283e-01, 
 2DIAGNOSTIC,    14, -7.925903797150e-01, 7.265426120284e-06, 3.3815e+01, 8.0576e-01, 
 2DIAGNOSTIC,    15, -7.925956845284e-01, 4.413903752720e-06, 3.4765e+01, 9.5040e-01, 
 2DIAGNOSTIC,    16, -7.925977110863e-01, 3.293936742921e-06, 3.5674e+01, 9.0855e-01, 
 2DIAGNOSTIC,    17, -7.925973534584e-01, 2.977928033943e-06, 3.6440e+01, 7.6642e-01, 
 2DIAGNOSTIC,    18, -7.925977706909e-01, 2.784697244351e-06, 3.7330e+01, 8.8950e-01, 
 2DIAGNOSTIC,    19, -7.925984859467e-01, 2.582474280644e-06, 3.8237e+01, 9.0673e-01, 
 2DIAGNOSTIC,    20, -7.926011681557e-01, 2.369023832216e-06, 3.9432e+01, 1.1949e+00, 
 2DIAGNOSTIC,    21, -7.926015257835e-01, 2.134311671398e-06, 4.0366e+01, 9.3464e-01, 
 2DIAGNOSTIC,    22, -7.926001548767e-01, 1.883459503915e-06, 4.1423e+01, 1.0564e+00, 
 2DIAGNOSTIC,    23, -7.925997972488e-01, 1.662279260017e-06, 4.2299e+01, 8.7658e-01, 
 2DIAGNOSTIC,    24, -7.925995588303e-01, 1.472873009334e-06, 4.3217e+01, 9.1793e-01, 
 2DIAGNOSTIC,    25, -7.926007509232e-01, 1.376191448799e-06, 4.4349e+01, 1.1316e+00, 
 2DIAGNOSTIC,    26, -7.926012277603e-01, 1.311355731559e-06, 4.5566e+01, 1.2175e+00, 
 2DIAGNOSTIC,    27, -7.926009297371e-01, 1.235148147316e-06, 4.7030e+01, 1.4635e+00, 
 2DIAGNOSTIC,    28, -7.926003932953e-01, 1.158303689408e-06, 4.8101e+01, 1.0711e+00, 
 2DIAGNOSTIC,    29, -7.926004528999e-01, 1.093328023671e-06, 4.9094e+01, 9.9350e-01, 
 2DIAGNOSTIC,    30, -7.926002740860e-01, 1.062158162313e-06, 5.0157e+01, 1.0629e+00, 
 2DIAGNOSTIC,    31, -7.926017642021e-01, 1.054402559930e-06, 5.1191e+01, 1.0335e+00, 
 2DIAGNOSTIC,    32, -7.926021218300e-01, 1.032973159454e-06, 5.2104e+01, 9.1302e-01, 
 2DIAGNOSTIC,    33, -7.926020622253e-01, 1.005100102702e-06, 5.3521e+01, 1.4168e+00, 
Killed

ANTsPy installation (please complete the following information):

  • Hardware [ PC ]
  • OS: [ Windows, Ubuntu ]
  • System details [ Linux 5.15.146.1-microsoft-standard-WSL2 x86_64 ]
  • Sub-system: [ WSL, Docker ]
  • ANTsPy version: [ 0.5.2 ]
  • Installation type: [ downloaded wheel from https://github.com/ANTsX/ANTsPy/releases/download/v0.5.2/antspyx-0.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl ]

Additional context
The registration only seems to fail for a particular combination of images in my dataset, so I cannot reproduce the error with any publicly available data, unfortunately. It also seems to fail after a different number of iterations every time I run the provided code.

Out of memory? Try increasing RAM available to docker?

Are the other images that work successfully as large as these? 1024x1024x175 is quite large.

It also seems to fail after a different number of iterations every time I run the provided code.

I suspect it is converging at different numbers of iterations, then running out of memory. Before each stage, it will print

DIAGNOSTIC,Iteration,metricValue,convergenceValue,ITERATION_TIME_INDEX,SINCE_LAST

If this is consistently printed twice, it's probably that it's converging differently due to random sampling and then trying and failing to start the next level.

I can see a small increase in the number of objects and the RSS memory over time, when running registration in a loop. Just registering a normal T1w image to MNI, repeatedly

import ants
import gc
import psutil
import sys
import time

t1w = ants.image_read('t1w.nii.gz')
template = ants.image_read('tpl-MNI152NLin2009cAsym_res-01_T1w.nii.gz')

process = psutil.Process()

# Function to convert bytes to MB
def bytes_to_mb(bytes):
    return round(bytes / 1024 / 1024, 3)


def get_object_description(obj):
    """Generate a description of the object."""
    if isinstance(obj, (list, set, tuple)):
        return f"{type(obj).__name__} of size {len(obj)}: {repr(obj)[:200]}"
    elif isinstance(obj, dict):
        return f"{type(obj).__name__} with {len(obj)} keys: {repr(obj)[:200]}"
    elif isinstance(obj, str):
        return f"String of length {len(obj)}: {repr(obj)[:200]}"
    else:
        return repr(obj)[:200]

def show_largest_objects(limit=10):
    """Print the largest objects in memory, in MB."""
    gc.collect()
    objects = gc.get_objects()
    print(f"Total objects in memory: {len(objects)}")
    sorted_objects = sorted(objects, key=lambda x: sys.getsizeof(x), reverse=True)[:limit]
    for obj in sorted_objects:
        size_in_mb = bytes_to_mb(sys.getsizeof(obj))
        description = get_object_description(obj)
        print(f"{description} - {size_in_mb} MB")

total_memory = list()
total_objects = list()

reg = None

try:
    for i in range(100):
        gc.collect()
        memory_info = process.memory_info()
        total_memory.append(bytes_to_mb(memory_info.rss))
        total_objects.append(len(gc.get_objects()))
        print(f"RSS Memory: {bytes_to_mb(memory_info.rss):.2f} MB")
        print(f"VMS Memory: {bytes_to_mb(memory_info.vms):.2f} MB")
        # show_largest_objects(5)
        reg = ants.registration(template, t1w, aff_iterations=(25, 0, 0, 0), reg_iterations=(1, 0, 0, 0) )
        print('Done iteration ' + str(i))
except KeyboardInterrupt:
    print("Monitoring stopped.")

print("Final stats")
gc.collect()
memory_info = process.memory_info()
print(f"RSS Memory: {bytes_to_mb(memory_info.rss):.2f} MB")
show_largest_objects(10)

I ran this for 56 iterations before interrupting

image
> memory_values
 [1]  309.633 1901.070 1990.355 1991.445 1993.828 2002.277 2009.391 2009.805 2010.035 2010.977 2011.438 2017.438 2021.508 2022.109 2022.117 2022.137 2022.352 2022.359 2024.039 2024.055 2024.062
[22] 2028.086 2028.105 2028.117 2028.297 2028.297 2030.715 2030.734 2030.738 2030.766 2030.766 2030.773 2030.777 2030.777 2030.777 2030.785 2030.785 2030.797 2030.801 2030.820 2030.836 2030.871
[43] 2030.871 2030.871 2030.875 2030.875 2030.992 2037.000 2037.023 2037.023 2043.039 2043.039 2043.082 2043.082 2043.082 2043.113 2043.121 2043.121 2043.121 2043.152 2043.328
> diff(memory_values)
 [1] 1591.437   89.285    1.090    2.383    8.449    7.114    0.414    0.230    0.942    0.461    6.000    4.070    0.601    0.008    0.020    0.215    0.007    1.680    0.016    0.007    4.024
[22]    0.019    0.012    0.180    0.000    2.418    0.019    0.004    0.028    0.000    0.007    0.004    0.000    0.000    0.008    0.000    0.012    0.004    0.019    0.016    0.035    0.000
[43]    0.000    0.004    0.000    0.117    6.008    0.023    0.000    6.016    0.000    0.043    0.000    0.000    0.031    0.008    0.000    0.000    0.031    0.176
> 

Because the difference is small (after the first time) and variable, I'm not sure this is actually a leak on the ANTsPy / ITK side.

Happy to investigate further if reproducible examples are available