mdt-model-fit hung after logging "Starting post-processing"

Question

mastrogiovanni opened this issue 5 years ago · 3 comments

We are running mdt-model-fit:

mdt-model-fit \
  --cl-device-ind 0 -o "output1/powell_double" \
  -n "1" \
  --double \
  --no-recalculate \
  "NODDI" \
  "data.nii" \
  "bvecs.prtcl" \
  "nodif_brain_mask.nii"

The command hangs forever after printing the following output:

--------------------------------------------------------------
-------------------------Watson double ------------------------
--------------------------------------------------------------
[2020-03-07 16:20:02,631] [INFO] [mdt.lib.processing.model_fitting] [get_model_fit] - Starting intermediate optimization for generating initialization point.
[2020-03-07 16:20:02,778] [INFO] [mdt.lib.processing.model_fitting] [fit_composite_model] - Using MDT version 1.2.2
[2020-03-07 16:20:02,779] [INFO] [mdt.lib.processing.model_fitting] [fit_composite_model] - Preparing for model BallStick_r1
[2020-03-07 16:20:02,995] [INFO] [mdt.models.composite] [_prepare_input_data] - No volume options to apply, using all 104 volumes.
[2020-03-07 16:20:02,996] [INFO] [mdt.lib.processing.model_fitting] [_model_fit_logging] - Fitting BallStick_r1 model
[2020-03-07 16:20:02,996] [INFO] [mdt.lib.processing.model_fitting] [_model_fit_logging] - The 4 parameters we will fit are: ['S0.s0', 'w_stick0.w', 'Stick0.theta', 'Stick0.phi']
[2020-03-07 16:20:02,997] [INFO] [mdt.lib.processing.model_fitting] [fit_composite_model] - Saving temporary results in output1/powell_double/BallStick_r1/tmp_results.
[2020-03-07 16:20:03,286] [INFO] [mdt.lib.processing.processing_strategies] [_process_chunk] - Computations are at 0.00%, processing next 100000 voxels (149524 voxels in total, 0 processed). Time spent: 0:00:00:00, time left: ? (d:h:m:s).
[2020-03-07 16:20:03,287] [INFO] [mdt.lib.processing.model_fitting] [_process] - Starting optimization
[2020-03-07 16:20:03,288] [INFO] [mdt.lib.processing.model_fitting] [_process] - Using MOT version 0.11.1
[2020-03-07 16:20:03,288] [INFO] [mdt.lib.processing.model_fitting] [_process] - We will use a double precision float type for the calculations.
[2020-03-07 16:20:03,289] [INFO] [mdt.lib.processing.model_fitting] [_process] - Using device 'GPU - TITAN RTX (NVIDIA CUDA)'.
[2020-03-07 16:20:03,289] [INFO] [mdt.lib.processing.model_fitting] [_process] - Using compile flags: ('-cl-denorms-are-zero', '-cl-mad-enable', '-cl-no-signed-zeros')
[2020-03-07 16:20:03,289] [INFO] [mdt.lib.processing.model_fitting] [_process] - We will use the optimizer Powell with default settings.
[2020-03-07 16:20:17,327] [INFO] [mdt.lib.processing.model_fitting] [_process] - Finished optimization
[2020-03-07 16:20:17,328] [INFO] [mdt.lib.processing.model_fitting] [_process] - Starting post-processing

The command is launched from inside the Docker container created using pull request #20.

Answer 1 · 2020-03-09T10:26:26.000Z

Hi Giovanni,

This is a common error, and it depends somewhat on the GPU and the driver version. As you noticed in your pull request, the problem lies in computing the Hessian matrix, which is used for estimating the variances and covariances.

The Hessian matrix computation is quite a complex kernel and sometimes fails to compile or run. This is indeed a problem, and I plan to rewrite that function as several separate compute kernels. This will require some dedicated time, however, which is limited.

As a workaround, you could disable the computation of the Hessian using a configuration file. Create a configuration file named mdt.conf and place it in "~/.mdt/<latest_version>/" with the following content:

active_post_processing:
    optimization:
        # If set, we compute the uncertainties
        uncertainties: False
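
For example, the file could be created from a shell roughly as follows. This is only a sketch: it assumes MDT 1.2.2 (the version shown in the log above), so the directory is ~/.mdt/1.2.2, and the MDT_CONF_DIR variable is just a hypothetical helper name, not something MDT reads. Note that it overwrites any existing mdt.conf in that directory.

```shell
# Assumption: MDT 1.2.2, so the configuration directory is ~/.mdt/1.2.2.
# MDT_CONF_DIR is a hypothetical helper variable used only by this snippet.
MDT_CONF_DIR="${MDT_CONF_DIR:-$HOME/.mdt/1.2.2}"
mkdir -p "$MDT_CONF_DIR"

# Write the configuration that disables the uncertainty (Hessian) computation.
# This replaces any mdt.conf already present in that directory.
cat > "$MDT_CONF_DIR/mdt.conf" <<'EOF'
active_post_processing:
    optimization:
        # If set, we compute the uncertainties
        uncertainties: False
EOF
```

After this, rerunning the same mdt-model-fit command should skip the uncertainty step during post-processing.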

Let me know if this works for you.

Best,

Robbert

Answer 2 · 2020-03-09T19:22:47.000Z

The modification you proposed solved the issue: thank you!
My only suggestion concerns the documentation: when using Docker, it is mandatory to mount the https://github.com/robbert-harms/MDT/tree/master/mdt/data directory at ~/.mdt/1.2.2 inside the container for MDT to work.
You can close this issue.
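
For anyone else hitting this, the mount can be sketched as below. All names here are placeholders I made up for illustration: MDT_REPO is wherever you cloned the MDT repository, MDT_IMAGE is whatever you tagged your image as, and CONTAINER_HOME assumes the container runs as root. GPU passthrough options are left out because they depend on your Docker setup. The snippet only prints the command so you can check it before running it.

```shell
# Hypothetical placeholders: adjust all three to your own setup.
MDT_REPO="${MDT_REPO:-$HOME/MDT}"          # local clone of robbert-harms/MDT (assumed path)
MDT_IMAGE="${MDT_IMAGE:-mdt:latest}"       # placeholder image tag, not an official image
CONTAINER_HOME="${CONTAINER_HOME:-/root}"  # assumes the container user is root

# Bind-mount the repository's mdt/data directory onto ~/.mdt/1.2.2 in the container.
DOCKER_CMD="docker run --rm -it -v $MDT_REPO/mdt/data:$CONTAINER_HOME/.mdt/1.2.2 $MDT_IMAGE"

# Dry run: print the command instead of executing it.
echo "$DOCKER_CMD"
```

Once the printed command looks right, it can be pasted into the terminal, with any GPU options your setup needs added.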

Answer 3 · 2020-03-16T09:12:36.000Z

Hi Mastrogiovanni,

Thank you for the tip. I added it to the documentation.

Best wishes,

Robbert