hzi-bifo/RiboDetector

CPU mode hangs with macOS

Closed this issue · 6 comments

omicz commented

Hello,
I'm attempting to run ribodetector_cpu on macOS (i.e. no CUDA support) on a publicly available dataset. My setup is Python 3.8.13 and ribodetector 0.2.6 on macOS 12.1. I followed the instructions for setting up the ribodetector conda environment plus dependencies for the CPU-only mode:

conda create -n ribodetector python=3.8
conda activate ribodetector
mamba install -c bioconda ribodetector
conda install pytorch torchvision torchaudio cpuonly -c pytorch

My command is as follows:

ribodetector_cpu -t 16 \
  -l 150 \
  -i fastq/SRR15852393_1.fastq.gz \
     fastq/SRR15852393_2.fastq.gz \
  -e rrna \
  --chunk_size 256 \
  -o out/fastq_orig_ribodetector/SRR15852393_1.ribodetector.fastq.gz \
     out/fastq_orig_ribodetector/SRR15852393_2.ribodetector.fastq.gz 

and I receive the following error:

2022-06-22 16:42:44 : INFO  Using high MCC model file: /opt/anaconda3/envs/ribodetector/lib/python3.8/site-packages/ribodetector/data/ribodetector_600k_variable_len70_101_epoch47.onnx on CPU
2022-06-22 16:42:44 : INFO  Classify reads with chunk size 256
2022-06-22 16:42:44 : INFO  Writing output non-rRNA sequences into file: out/fastq_orig_ribodetector/SRR15852393_1.ribodetector.fastq.gz, out/fastq_orig_ribodetector/SRR15852393_2.ribodetector.fastq.gz
Traceback (most recent call last):
  File "/opt/anaconda3/envs/ribodetector/bin/ribodetector_cpu", line 10, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/ribodetector/lib/python3.8/site-packages/ribodetector/detect_cpu.py", line 746, in main
    seq_pred.detect()
  File "/opt/anaconda3/envs/ribodetector/lib/python3.8/site-packages/ribodetector/detect_cpu.py", line 526, in detect
    self.run_with_chunks()
  File "/opt/anaconda3/envs/ribodetector/lib/python3.8/site-packages/ribodetector/detect_cpu.py", line 354, in run_with_chunks
    p.start()
  File "/opt/anaconda3/envs/ribodetector/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/anaconda3/envs/ribodetector/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/anaconda3/envs/ribodetector/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/anaconda3/envs/ribodetector/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/anaconda3/envs/ribodetector/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/anaconda3/envs/ribodetector/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/anaconda3/envs/ribodetector/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'onnxruntime.capi.onnxruntime_pybind11_state.InferenceSession' object

Are there any additional dependencies for running on macOS vs. Linux? Any tips would be appreciated!

Is your CPU Apple Silicon?

omicz commented

I have an Intel CPU, sorry for not including hardware specs in my initial post! Hardware includes a 2.5 GHz 14-core Intel Xeon W processor, 32 GB RAM, and a Radeon Pro Vega 56 8 GB GPU.

Thank you so much for reporting this issue. I am able to reproduce this on a MacBook Pro with an Intel CPU. At first glance, it is a compatibility issue between ONNX Runtime and Python multiprocessing on macOS. I will take a closer look and hopefully fix it in the next release. In the meantime, I would recommend running RiboDetector on a powerful Linux server, which will allow you to analyze large-scale datasets in a timely manner.
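
For readers hitting the same traceback: the frames in popen_spawn_posix.py show the "spawn" start method, which is the default on macOS since Python 3.8 and which pickles the Process object together with everything it references. The sketch below illustrates one common way around this kind of error, which is to pass only picklable arguments (such as the model path) and build the session inside the child process. It is not necessarily how RiboDetector addresses the problem; `FakeSession`, `classify_chunk`, `worker`, and `run_in_child` are hypothetical names, and `FakeSession` is only a stand-in that is unpicklable in the same way an ONNX Runtime session is.

```python
import multiprocessing as mp
import pickle
import threading


class FakeSession:
    """Hypothetical stand-in for an unpicklable inference session."""

    def __init__(self, model_path):
        self.model_path = model_path
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def run(self, chunk):
        # Placeholder "prediction": the length of each read.
        return [len(read) for read in chunk]


def can_pickle(obj):
    """Return True if obj survives pickling, as spawn requires."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def classify_chunk(model_path, chunk):
    # The session is created here, inside whichever process calls this
    # function, so it never has to cross a process boundary.
    session = FakeSession(model_path)
    return session.run(chunk)


def worker(model_path, chunk, out_queue):
    # Only a string, a list, and a queue are sent to the child.
    out_queue.put(classify_chunk(model_path, chunk))


def run_in_child(model_path, chunk):
    ctx = mp.get_context("spawn")  # same start method as in the traceback
    out_queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(model_path, chunk, out_queue))
    proc.start()
    result = out_queue.get()
    proc.join()
    return result
```

In a script, `run_in_child("model.onnx", reads)` would need to be called from under an `if __name__ == "__main__":` guard, because spawn re-imports the main module in each child. Passing a `FakeSession` instance in `args` instead of the path would fail with a "cannot pickle" TypeError like the one above.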

@dawnmy any resolution to this? I encounter the same issue and have no access to a Linux server.

I will try to fix it this weekend.

@omicz @ywlim-sea Thank you for reporting this issue. It has been fixed in version 0.2.7! Please give it a try.