velten-group/crispat

Multiprocessing freeze_support error


Dear team,
Thank you for the tool. When using the negative binomial approach, I get an error related to the multiprocessing freeze_support:

[screenshot: multiprocessing freeze_support error traceback]

Hi Saranya,
nice to hear that you are using crispat. Since I haven't seen this error before, I am not sure what's going wrong here and would need some more details to help you troubleshoot. Have you tried running the negative binomial assignment function in our tutorial Jupyter notebook (https://github.com/velten-group/crispat/blob/main/tutorials/guide_assignment.ipynb) and does it work there? Also, do you get the same error when running the binomial and poisson assignment? Additionally, I would recommend when using the parallelisation option on a cluster to manually specify the "n_processes" parameter to be sure that it matches the number of available/requested cores. An example python script that we used for running the negative binomial assignment can be found in our crispat_analysis repository (https://github.com/velten-group/crispat_analysis/blob/main/python/guide_assignment/negative_binomial.py)

Hello Jana,
I do not get this error with UMI, gauss and poisson_gauss; the execution pauses at this point but then continues. With negative binomial and poisson, however, it has been stuck at this point for more than 20 hours now.
Note: I am using these methods because my data is high MOI.

Hi Saranya,

that makes sense, since UMI, gauss and poisson_gauss do not use parallelisation (via a dask cluster). How large is your data set (how many cells and gRNAs)? And how exactly are you calling the 'ga_negative_binomial' function? Have you tried running our tutorial Jupyter notebook (https://github.com/velten-group/crispat/blob/main/tutorials/guide_assignment.ipynb) or calling the function as shown in our crispat_analysis repository (https://github.com/velten-group/crispat_analysis/blob/main/python/guide_assignment/negative_binomial.py), and does the error persist there?

If your data set is not too big, you can also set the parameter 'parallelize = False' to disable the parallelisation that seems to cause the issue for you. If it is larger and would take too long without parallelisation, there is an alternative: parallelise manually across jobs (while still setting 'parallelize = False' within each job). If you are working on a high-performance cluster, you can split the assignment task into multiple jobs using the 'start_gRNA' and 'gRNA_step' parameters of the assignment function. These two parameters let you run the assignment on a subset of gRNAs (it takes 'gRNA_step' many gRNAs starting from index 'start_gRNA'). So, by submitting multiple jobs with different 'start_gRNA' settings and combining the resulting data frames at the end (e.g. using our 'combine_assignments' function), you can obtain the full assignment faster, as sketched below.
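[Editor's note: a minimal sketch of the manual splitting described above. Only 'start_gRNA', 'gRNA_step', 'parallelize' and the function names come from this thread; the positional arguments and the output directory are assumptions, so check the crispat documentation for the actual signatures.]

import sys
import crispat

# Each cluster job passes a different start index on the command line.
start_gRNA = int(sys.argv[1])   # e.g. 0, 50, 100, ...
gRNA_step = 50                  # each job handles 50 gRNAs (arbitrary choice)

crispat.ga_negative_binomial(
    "gRNA_counts.h5ad",
    "assignments_nb/",          # hypothetical shared output directory
    start_gRNA=start_gRNA,
    gRNA_step=gRNA_step,
    parallelize=False,          # disable the dask cluster inside each job
)

Each job would then be submitted with its own start index (e.g. python assign_subset.py 0, python assign_subset.py 50, ...), and once all jobs have finished, the per-subset outputs are merged with 'combine_assignments' as mentioned above.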

Hello Jana,
I used the guide_assignment.ipynb tutorial previously and landed on this error. I have now used negative_binomial.py, and I get the error below:

Traceback (most recent call last):
  File "/data/humangen_mouse/test_area/Saranya/Crispr_scripts/assign_guide_with_crispat.py", line 47, in
    crispat.ga_negative_binomial("gRNA_counts.h5ad",
  File "/work/balachandran/.omics/anaconda3/envs/crispat/lib/python3.10/site-packages/crispat/neg_binomial.py", line 274, in ga_negative_binomial
    adata_crispr = sc.read_h5ad(input_file)
  File "/work/balachandran/.omics/anaconda3/envs/crispat/lib/python3.10/site-packages/anndata/_io/h5ad.py", line 237, in read_h5ad
    with h5py.File(filename, "r") as f:
  File "/work/balachandran/.omics/anaconda3/envs/crispat/lib/python3.10/site-packages/h5py/_hl/files.py", line 562, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/work/balachandran/.omics/anaconda3/envs/crispat/lib/python3.10/site-packages/h5py/_hl/files.py", line 235, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 102, in h5py.h5f.open
BlockingIOError: [Errno 11] Unable to synchronously open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')