CERN/TIGRE

Iterative algorithm hangs on Ubuntu

Hi, your code runs smoothly for me on Windows. After moving to Linux and compiling, forward projection and backprojection work on my data. But every time I run OS-SART-TV as below, it hangs with no response. On Windows it returns within a few seconds.
algs.ossart_tv(proj, self.geo, angles, niter=1, init = init)
Thanks for your help

Specifications

  • python version:3.10
  • OS:Linux
  • CUDA version:12.2
  • conda list: [screenshot not preserved]

There are a couple of rare issues that may be causing this, but it's been hard to debug because I can't reproduce it.

One thing to try: in the following function, a new geometry is created from the input one.

geox.sVoxel[1:] = geox.sVoxel[1:] * 1.1 # a bit larger to avoid zeros in projections

Can you try changing the code locally so it doesn't modify the geometry like this? Just keep the copy.

Do you mean commenting out this line? I tried that and it still hangs. But some Krylov subspace algorithms like CGLS and LSQR do work, which is weird. OS-SART-TV gives the best performance, though...

@stefenmax not just that line, but the few after it too.
Apologies, I am on a trip so I can't help much, but the idea is to pass an unmodified geo to Atb.

Thanks for your help, but it still doesn't work. Maybe I should just run it on Windows. I also found that it runs faster there than on Linux, lol.

Hmm... then I don't really know why.
As I cannot reproduce it, I would need to know which function hangs. Is there any way you can try to figure that out?
I have used TIGRE extensively on Linux, so it is certainly a specific combination of geometry, CUDA, number of GPUs, OS, Python version or something like that which causes this strange error, but it's hard for me to figure out simply because I don't see it.

I'll keep the issue open. If you do happen to pinpoint what exactly hangs (it has to be some Ax() or Atb() call somewhere), do let me know. I suspect it's set_w or set_v that hangs...
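One way to find out which call hangs, without attaching a debugger, is Python's standard `faulthandler` module. It can print the Python stack of every thread if the process is still running after a timeout. The sketch below is only a suggestion: the helper name `hang_watchdog` and the 30 second default are assumptions for illustration and are not part of TIGRE.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(timeout_s=30.0, stream=None):
    """Dump all Python stacks every timeout_s seconds while the body is still running.

    hang_watchdog is a hypothetical helper name, not a TIGRE function.
    The stream must be a real file (it needs a file descriptor).
    """
    out = sys.stderr if stream is None else stream
    # The watchdog runs in its own C-level thread, so it can still report
    # when the main thread is blocked inside a compiled extension call.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=out)
    try:
        yield
    finally:
        faulthandler.cancel_dump_traceback_later()


# Intended use around the suspect call from this issue:
#
# with hang_watchdog(30):
#     algs.ossart_tv(proj, geo, angles, niter=1, init=init)
```

If the call hangs, the traceback printed after the timeout should end at the last Python frame that was entered, which would indicate whether an Ax(), Atb(), or weight-setup call is the one that never returns.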

I found that I can run the ossart algorithm from example.py on my Linux system. So I tried putting my geometry into that example, still using the head phantom, and found that it hangs in tigre.Ax. That is weird, because previously I could run Ax and FDK on my own data. Here is the example code; I don't know if you can reproduce this.

from __future__ import division
from __future__ import print_function

import numpy as np
import tigre
import tigre.algorithms as algs
from tigre.utilities import sample_loader
from tigre.utilities.Measure_Quality import Measure_Quality
import tigre.utilities.gpu as gpu
import matplotlib.pyplot as plt
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
### This is just a basic example of a small part of TIGRE's functionality.
# We highly recommend checking the Demos folder, where most if not all features of TIGRE are demoed.

listGpuNames = gpu.getGpuNames()
if len(listGpuNames) == 0:
    print("Error: No gpu found")
else:
    for id in range(len(listGpuNames)):
        print("{}: {}".format(id, listGpuNames[id]))

gpuids = gpu.getGpuIds(listGpuNames[0])
print(gpuids)

# Geometry
# geo1 = tigre.geometry(mode='cone', high_resolution=False, default=True)
img_size = 256
geo = tigre.geometry(mode="cone")
geo.DSD = 950
geo.DSO = 540
geo.nDetector = np.array([1, 835]) 
geo.dDetector = np.array([1, 0.9643345*950 / 835])
geo.sDetector = geo.dDetector * geo.nDetector
geo.nVoxel = np.array([1, img_size, img_size])
geo.sVoxel = geo.nVoxel
geo.dVoxel = geo.sVoxel / geo.nVoxel 
geo.accuracy = 0.5
angles = np.linspace(0, np.pi/2, 180, dtype=np.float32)
# Prepare projection data
head = sample_loader.load_head_phantom(geo.nVoxel)
breakpoint()
proj = tigre.Ax(head, geo, angles, gpuids=gpuids)
test = tigre.Atb(proj, geo, angles, backprojection_type="matched", gpuids=gpuids)
# Reconstruct
niter = 20
fdkout = algs.fdk(proj, geo, angles, gpuids=gpuids)
breakpoint()
ossart = algs.ossart(proj, geo, angles, niter, blocksize=20, gpuids=gpuids)

# Measure Quality
# 'RMSE', 'MSSIM', 'SSD', 'UQI'
print("RMSE fdk:")
print(Measure_Quality(fdkout, head, ["nRMSE"]))
print("RMSE ossart")
print(Measure_Quality(ossart, head, ["nRMSE"]))

# Plot
fig, axes = plt.subplots(3, 2)
axes[0, 0].set_title("FDK")
axes[0, 0].imshow(fdkout[geo.nVoxel[0] // 2])
axes[1, 0].imshow(fdkout[:, geo.nVoxel[1] // 2, :])
axes[2, 0].imshow(fdkout[:, :, geo.nVoxel[2] // 2])
axes[0, 1].set_title("OS-SART")
axes[0, 1].imshow(ossart[geo.nVoxel[0] // 2])
axes[1, 1].imshow(ossart[:, geo.nVoxel[1] // 2, :])
axes[2, 1].imshow(ossart[:, :, geo.nVoxel[2] // 2])
plt.show()
# tigre.plotProj(proj)
# tigre.plotImg(fdkout)


So it hangs in Ax in this code?
What if you make a different number of GPUs visible? Are they all the same GPU model?
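A minimal sketch of how the number of visible GPUs could be varied between runs. It uses the same two environment variables as the script above. The helper name `restrict_visible_gpus` is an assumption for illustration. Calling it before `import tigre` is the conservative choice, since CUDA reads these variables when it initialises.

```python
import os


def restrict_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA (hypothetical helper, not part of TIGRE)."""
    # Order devices by PCI bus ID so the indices match what nvidia-smi reports.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Intended use, at the very top of the script and before "import tigre":
#
# restrict_visible_gpus([2])      # a single GPU
# restrict_visible_gpus([0, 1])   # two GPUs, ideally of the same model
```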

Yeah, it hangs in Ax.
No, they are not the same GPU. But on another server of mine there are two identical GPUs, and it hangs at the same place.

Certainly with different GPUs behaviour is undefined, so that would be an issue.

I'll try your specific geometry. But out of curiosity, if you change nVoxel/nDetector a bit, does it still hang?

Do you have any recommendation on how to change nVoxel/nDetector?

Just give them different values, to see if it's the specific values causing the issue.
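Since a hanging run blocks the whole session, one option is to try each size in a child process with a timeout. The sketch below assumes this approach. `run_with_timeout`, `sweep`, and `REPRO_TEMPLATE` are hypothetical names, the geometry values are copied from the script posted above, and the template assumes `tigre.Ax` picks its GPUs by default when `gpuids` is not passed.

```python
import subprocess
import sys

# Geometry from the script above, with the sizes left as placeholders.
REPRO_TEMPLATE = """
import numpy as np
import tigre
from tigre.utilities import sample_loader

geo = tigre.geometry(mode="cone")
geo.DSD = 950
geo.DSO = 540
geo.nDetector = np.array([1, {n_det}])
geo.dDetector = np.array([1, 0.9643345 * 950 / 835])
geo.sDetector = geo.dDetector * geo.nDetector
geo.nVoxel = np.array([1, {n_vox}, {n_vox}])
geo.sVoxel = geo.nVoxel
geo.dVoxel = geo.sVoxel / geo.nVoxel
angles = np.linspace(0, np.pi / 2, 180, dtype=np.float32)
head = sample_loader.load_head_phantom(geo.nVoxel)
tigre.Ax(head, geo, angles)
"""


def run_with_timeout(snippet, timeout_s=120.0):
    """Run a Python snippet in a child interpreter and classify it as ok, error, or hang."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child once the timeout expires.
        return "hang"
    return "ok" if done.returncode == 0 else "error"


def sweep(n_voxels, n_detectors, timeout_s=120.0):
    """Try every (nVoxel, nDetector) pair and record how each run ended."""
    results = {}
    for n_vox in n_voxels:
        for n_det in n_detectors:
            snippet = REPRO_TEMPLATE.format(n_vox=n_vox, n_det=n_det)
            results[(n_vox, n_det)] = run_with_timeout(snippet, timeout_s)
    return results


# Intended use:
#
# print(sweep([128, 256, 300], [512, 835, 900]))
```

A result of "hang" for every pair would suggest the sizes are not the cause, while a mix of "ok" and "hang" would narrow down which values trigger it.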

Yes, after changing them a bit it still hangs.

Apologies, I don't seem to be able to reproduce this in any way. If you can pinpoint where the error is, do let me know.