ANTsX/ANTsR

antsRegistration crashing with "memory not mapped" error

muratmaga opened this issue · 21 comments

We are trying to process some data from Mouse Phenotyping project. It involves transfering a segmented atlas to new samples. Unprocessed images do not have the correct voxel spacing, so we use an initial lm transform to scale them correctly using five points and then proceed other registration.

For some of the dataset this pipeline works fine, and for some it results at the syn2 with this error. Verbose mode doesn't provide anymore information than this. I know that I do did not resample the syn1 transforms to the higher resolution, but this clearly doesn't cause an issue for some of the samples and for this specific sample I did resample the warp field to high resolution, and still it crashed.

At this point I don't really know how can I debug this further. I appreciate any suggestions, I am providing a link to the zip archive for the sample data that reproduces the problem.

*** caught segfault ***
address 0x2aac22bbc970, cause 'memory not mapped'

Traceback:
1: antsRegistration(fixed = ref.img, moving = tmp.img, typeofTransform = "SyN", initialTransform = syn1$fwdtransforms, verbose = FALSE)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

if (!require(SlicerMorphR)) devtools::install_github("SlicerMorph/SlicerMorphR") 
library(patchMatchR)
library(ANTsR)

#path to zip archive
setwd("~/Documents/memory_map/Data")

set.seed(5)
ref.lm = read.markups.json("Atlas_lms.mrk.json")
ref.img = antsImageRead("final.nrrd")
ref.img = iMath(ref.img, "Normalize")
ref.img.low = resampleImage(ref.img, c(0.054, 0.054, 0.054))
ref.label = antsImageRead("Embryo_Atlas-labels.nrrd")

tmp.img = antsImageRead('Female_ABCM_K1403-70-e15.5_baseline.nrrd')
tmp.img = iMath(tmp.img, "Normalize")
tmp.img.low = resampleImage(tmp.img, antsGetSpacing(tmp.img)*2)
tmp.lm = read.markups.json('Female_ABCM_K1403-70-e15.5_baseline_lms.mrk.json')
  
lm.tx = fitTransformToPairedPoints(fixedPoints = ref.lm, 
                                     movingPoints = tmp.lm,
                                     transformType = "Similarity", 
                                     domainImage = ref.img)
  
  affine = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "Rigid", initialTransform = lm.tx$transform)
  syn1 = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "SyN", initialTransform = affine$fwdtransforms)
  
  #at this point resultant registration seems reasonable:
  
  new.labels = antsApplyTransforms(fixed=tmp.img, moving = ref.label, 
                            transformlist = syn1$invtransforms, 
                            interpolator = "genericLabel")
  
  plot(tmp.img, new.labels, nslices=20, alpha=0.8)
                            
  #crashes here                          
  syn2 = antsRegistration(fixed=ref.img, moving = tmp.img, typeofTransform = "SyN", initialTransform = syn1$fwdtransforms, verbose=TRUE)

edit. This is the sessionInfo(), if it is useful:

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] jsonlite_1.7.2 patchMatchR_1.0 ANTsR_0.5.7.5 ANTsRCore_0.7.5 SlicerMorphR_0.0.0.9000

loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 whisker_0.4 magrittr_1.5 docopt_0.6.1 lattice_0.20-38 R6_2.4.0 tools_3.6.0
[8] grid_3.6.0 keras_2.2.5.0 tfruns_1.4 abind_1.4-5 RcppEigen_0.3.3.5.0 tensorflow_2.0.0 Matrix_1.2-17
[15] base64enc_0.1-3 ANTsRNet_1.1 zeallot_0.1.0 slam_0.1-46 qlcMatrix_0.9.7 compiler_3.6.0 ITKR_0.5.3.3
[22] generics_0.0.2 sparsesvd_0.2 reticulate_1.13 mvtnorm_1.0-11

This is on a physical Centos7 machine. RAM does not come close to running out (there are 256GB available on these nodes). When completes, R uses about 15-18GB tops for these registration operations.

Ehm, not sure what is going on. Let's see what others say. I have a system very similar to yours and could try your commands later if needed.

When you have the chance, that will be great. It seems to replicate for me on different systems.

I found a system with R 4.0.5 with similar versions, and can't seem to replicate. So I will use the newer R.

Spoke too soon. Same error on 4.0.5 Open to any other suggestions.

have you run this same thing single threaded?

I can try but will take very long.
It is through
Sys.setenv("ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1")
right?

yes

at the very beginning of the script - or in the envt

Single threaded crashed on R 3.6.0 with the same error, now trying 4.0.5.

This is still running (it has been about 12hs). Normally at the syn2, if it crashes, it does it so early on. With single threaded, it is hard to tell where we are. Assuming this will work, what else can be done? Single-threaded is too slow the samples we have, will probably a day (or possibly longer) to process a single sample.

Confirmed that single threaded version finished successfully after 4 days.

In the multi-threaded context, how many threads is it trying to use? In other words, what happens if you print

Sys.getenv("ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS")

These are private compute nodes, so it is using maximum number of threads available on the node. Depending on the node it is running, this varies from 24 to 48.

OK, that is a lot of threads, it might be hitting a system memory limit either on the machine itself (ulimit -v) or by a job scheduler.

I would try running with 8 threads. If that works consistently you could try increasing to 16. I often don't see much improvement beyond 8, but that might be data dependent.

We are not using any scheduler itself right now. We manually (and directly) run the scripts on the compute node(s). SO we can rule out the job scheduler.

I can try with 8 threads. But for clarification, we have been doing this (running maximum number of threads in ANTsR) for a while for different datasets. This is the first time we encountered this memory not mapped issue.

Thanks for the feedback. We already know the number of threads and cores on the compute nodes, which is set by the environmental variable for each compute node through:

export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=cat /proc/cpuinfo | grep processor | wc -l

We have three types of compute nodes with 12, 20 and 24 physical cores and hyperthreading is enabled on all of them (so the reported threads are twice as many of the physical cores). Hence the reason for setting the variable like this.

We don't use scheduler (for this particular case), because it is a private system. Nobody else executes anything on them (apart from baselien OS processes). We have used antsMultivariateTemplateConstruction2.sh script previously on this exact system (which runs sge), with similar type of datasets, while utilizing all cores/threads on the system for a single registration. The reason we do that is because our microCT datasets are one or two order of magnitude larger than typical medical volumes. Thus, if you run it on 1-2 cores, it takes days to finish a single dataset (example above took 4 days, or 92h to be precise). Plus due to large memory consumption of these datasets at later stages of registration, if you go down few cores, but multiple simultaneous tasks per node, you are bound to run of out memory on the node, crashing all your tasks.

So far, one registration task per node that utilizes all threads paradigm worked well for us, until this particular dataset.

One more reflection:

For the jobs that do indeed complete using the all threads, this registration, takes about 4h (volume sizes are quite similar). Compared to the single threaded version, speed up is about 23 X, which is almost the number of physical cores. I will try either disabling the hyperthreading or just running as many threads as there are cores.

I wasn't able to debug any further, but the two level registration is the cause of the problem. This memory not mapped does not happen at all, regardless of the number of threads of resource. So, instead of doing:

 affine = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "Rigid", initialTransform = lm.tx$transform)
  syn1 = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "SyN", initialTransform = affine$fwdtransforms)
  
  #at this point resultant registration seems reasonable:
  
  new.labels = antsApplyTransforms(fixed=tmp.img, moving = ref.label, 
                            transformlist = syn1$invtransforms, 
                            interpolator = "genericLabel")
  
  plot(tmp.img, new.labels, nslices=20, alpha=0.8)
                            
  #crashes here                          
  syn2 = antsRegistration(fixed=ref.img, moving = tmp.img, typeofTransform = "SyN", initialTransform = syn1$fwdtransforms, verbose=TRUE)

I go straight to syn2 from affine, crash doesn't seem to happen.

 affine = antsRegistration(fixed=ref.img.low, moving = tmp.img.low, typeofTransform = "Rigid", initialTransform = lm.tx$transform)
  syn2 = antsRegistration(fixed=ref.img, moving = tmp.img, typeofTransform = "SyN", initialTransform = affine$fwdtransforms, verbose=TRUE)

We can live with this, but our test data showed that we got better results in terms of label overlap, and landmark localization with the two step approach. So, if anyone else have other ideas, would be happy to hear