Conda issues
Dear Marc and Valentin,
I'm installing pyTME on the EMBL cluster and followed your installation guide, but I'm having trouble running template matching: I get a 'permission error' (full error at the end of this issue).
I tried to follow your installation instructions precisely, but to make sure I did everything correctly I reinstalled the environment as follows:
First I created the conda environment:
module load Miniconda3
conda create --name pytme2 -c conda-forge python=3.11 pyfftw napari magicgui pyqt
Afterwards I installed pyTME:
source activate pytme2
module load GCC
~/.conda/envs/pytme2/bin/python -m pip install git+https://github.com/KosinskiLab/pyTME.git
I then installed napari:
~/.conda/envs/pytme2/bin/python -m pip install napari magicgui pyqt5
~/.conda/envs/pytme2/bin/python -m pip install git+https://github.com/maurerv/napari-density-io.git
Finally, I installed CuPy:
~/.conda/envs/pytme2/bin/python -m pip install cupy-cuda12x
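As a quick sanity check (not part of the installation guide; just to confirm that the CuPy build actually sees a GPU on the node), something like this can be run in the environment:

# Optional sanity check: confirm CuPy matches the loaded CUDA toolkit and sees a GPU.
import cupy as cp

print("CuPy version:", cp.__version__)
print("Visible GPUs:", cp.cuda.runtime.getDeviceCount())
print(cp.asarray([1.0, 2.0]).sum())  # trivial computation on the GPU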
I then ran template matching:
module load CUDA/12.2.0
match_template.py -m 075.gaussian.i.mrc -i ref.mrc -n 1 -a 60 --use_gpu
ERROR/LOG
-
pyTME v0.1.3 *
Target
- Initial Shape: (450, 1440, 1022)
- Sampling Rate: (6.52, 6.52, 6.52)
- Final Shape: (450, 1440, 1022)
Template
- Initial Shape: (67, 67, 67)
- Sampling Rate: (6.52, 6.52, 6.52)
- Final Shape: (71, 71, 71)
Template Mask
- Initial Shape: (67, 67, 67)
- Sampling Rate: (6.52, 6.52, 6.52)
- Final Shape: (71, 71, 71)
Template Matching Options
- CPU Cores: 1
- Run on GPU: True [N=1]
- Use Mixed Precision: False
- Assigned Memory [MB]: 12080.0 [out of 14212.0]
- Temporary Directory: /g/scb/mahamid/rasmus/pytme/pyTME/input_data
- Extend Fourier Grid: True
- Extend Target Edges: True
- Interpolation Order: 3
- Score: CC
- Setup Function: <function 'tme.matching_exhaustive.cc_setup'>
- Scoring Function: <function 'tme.matching_exhaustive.corr_scoring'>
- Angular Sampling: 60.0 [24 rotations]
- Scramble Template: False
- Target Splits: 0:2, 1:2, 2:2 [N=8]
Score Analysis Options
- Analyzer: <class 'tme.analyzer.MaxScoreOverRotations'>
- score_threshold: 0.0
- number_of_peaks: 1000
- convolution_mode: valid
- use_memmap: False
Distributing 8 splits on 1 job each using 1 core.
Running Template Matching. This might take a while ...
Process SharedMemoryManager-1:
Traceback (most recent call last):
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/managers.py", line 592, in _run_server
server = cls._Server(registry, address, authkey, serializer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/managers.py", line 1280, in init
Server.init(self, *args, **kwargs)
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/managers.py", line 156, in init
self.listener = Listener(address=address, backlog=16)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/connection.py", line 464, in init
self._listener = SocketListener(address, family, backlog)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/connection.py", line 607, in init
self._socket.bind(address)
PermissionError: [Errno 1] Operation not permitted
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/site-packages/tme/matching_exhaustive.py", line 1192, in inner_function
with SharedMemoryManager() as smh:
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/managers.py", line 645, in enter
self.start()
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/managers.py", line 567, in start
self._address = reader.recv()
^^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
^^^^^^^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/connection.py", line 430, in _recv_bytes
buf = self._recv(4)
^^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/multiprocessing/connection.py", line 399, in _recv
raise EOFError
Traceback (most recent call last):
File "/home/kjeldsen/.conda/envs/pytme2/bin/match_template.py", line 733, in
main()
File "/home/kjeldsen/.conda/envs/pytme2/bin/match_template.py", line 699, in main
candidates = scan_subsets(
^^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/site-packages/tme/matching_exhaustive.py", line 1473, in scan_subsets
results = Parallel(n_jobs=outer_jobs)(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/site-packages/joblib/parallel.py", line 1863, in call
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/site-packages/tme/matching_exhaustive.py", line 47, in _run_inner
return scan(**kwargs)
^^^^^^^^^^^^^^
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/site-packages/tme/matching_exhaustive.py", line 1199, in inner_function
handle_traceback(last_type, last_value, last_traceback)
File "/home/kjeldsen/.conda/envs/pytme2/lib/python3.11/site-packages/tme/matching_utils.py", line 48, in handle_traceback
raise Exception(last_value)
Exception
Unrelated:
You should put a link to the preprint on the GitHub front page :-)
Dear Rasmus,
Thank you for reaching out with the details of the issue you're encountering with pyTME. I appreciate your thorough description and have attempted to reproduce the issue on my end without success (see Issue Reproduction below for details).
The PermissionError you're experiencing typically stems from environment or system-level constraints and does not appear to be directly related to the pyTME codebase. However, I have two suggestions that might help resolve this issue:
- Choice of Temporary Directory: The /g drive is not ideal as a temporary directory because files are created and destroyed there frequently, which can interfere with the drive's backup mechanism. pyTME uses the value of the TMPDIR environment variable as the default; SLURM typically sets it for each job (/scratch/jobs/43374609 in my case below). You can either rely on the default directory SLURM assigns, or point to the scratch drive explicitly as follows (a quick way to check the resolved default is sketched after the command):
match_template.py -m 075.gaussian.i.mrc -i ref.mrc -n 1 -a 60 --use_gpu -s CC --temp_directory /scratch/vmaurer/tmp
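If you want to verify which directory is picked up on a given node, the following snippet (standard library only, nothing pyTME-specific) prints the resolved default:

import os
import tempfile

# tempfile.gettempdir() honours TMPDIR, which SLURM usually sets per job.
print("TMPDIR:", os.environ.get("TMPDIR", "<not set>"))
print("Resolved temporary directory:", tempfile.gettempdir())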
- System-Level Constraints: The behaviour you observe might be specific to a node. Could you provide information on the hardware you executed match_template.py on and how it was requested? A minimal check you can run on the node in question is sketched below.
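The failing call in your traceback can be reproduced without pyTME: SharedMemoryManager starts a server process that, on Linux, binds a Unix socket under the temporary directory. The following diagnostic sketch (standard library only, not part of pyTME) should raise the same PermissionError on an affected node; if it does, the problem lies with that node or filesystem rather than with pyTME:

# Diagnostic sketch: reproduce the call that fails in the traceback.
# SharedMemoryManager.start() spawns a server process that binds a socket,
# which is where the PermissionError is raised.
from multiprocessing.managers import SharedMemoryManager

with SharedMemoryManager() as smm:
    block = smm.SharedMemory(size=1024)
    print("SharedMemoryManager works; created block", block.name)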
Issue Reproduction
Starting from the CuPy installation, I requested an interactive job on a single node with a single GPU:
srun \
--nodes=1 \
--ntasks=1 \
--cpus-per-task=4 \
--time=48:00:00 \
--qos=highest \
--partition=gpu-el8 \
--constraint gpu=3090 \
--gres=gpu:1 \
--mem=32062 \
--pty bash -i
Once a suitable node was identified, I ran template matching:
module load CUDA/12.2.0
match_template.py -m 075.gaussian.i.mrc -i ref.mrc -n 1 -a 60 --use_gpu -s CC -o /scratch/vmaurer/temp.pickle
This produced the following output:
-
pyTME v0.1.4 *
Target
- Initial Shape: (450, 1440, 1022)
- Sampling Rate: (6.52, 6.52, 6.52)
- Final Shape: (450, 1440, 1022)
Template
- Initial Shape: (67, 67, 67)
- Sampling Rate: (6.52, 6.52, 6.52)
- Final Shape: (72, 72, 72)
Template Mask
- Initial Shape: (67, 67, 67)
- Sampling Rate: (6.52, 6.52, 6.52)
- Final Shape: (72, 72, 72)
Template Matching Options
- CPU Cores: 1
- Run on GPU: True [N=1]
- Use Mixed Precision: False
- Assigned Memory [MB]: 21371.0 [out of 25142.0]
- Temporary Directory: /scratch/jobs/43374609
- Extend Fourier Grid: True
- Extend Target Edges: True
- Interpolation Order: 3
- Score: CC
- Setup Function: <function 'tme.matching_exhaustive.cc_setup'>
- Scoring Function: <function 'tme.matching_exhaustive.corr_scoring'>
- Angular Sampling: 60.0 [24 rotations]
- Scramble Template: False
- Target Splits: 0:1, 1:2, 2:2 [N=4]
Score Analysis Options
- Analyzer: <class 'tme.analyzer.MaxScoreOverRotations'>
- score_threshold: 0.0
- number_of_peaks: 1000
- convolution_mode: valid
- use_memmap: False
Distributing 4 splits on 1 job each using 1 core.
Running Template Matching. This might take a while ...
Runtime real: 45.837s user: 45.837s
There were no changes between pyTME 0.1.3 and 0.1.4 that would explain the behaviour you observed.
I ran this on our local machine (srv-mahamid-01.embl.de). There, adding --temp-dir on scratch didn't help, but submitting the job to the cluster (with --temp-dir; I did not try without) fixed the problem.
Thank you for looking into it!