Memory Allocation Error in impute.py
AnnabelPerry opened this issue · 4 comments
Hello, I am attempting to run impute.py in a conda environment with Python 3.9.16 and pandas 1.1.4. I am encountering the following error:
2023-06-27 19:22:09,875 INFO impute - main: creating pedigree ...
2023-06-27 19:22:09,981 INFO preprocess_data - create_pedigree: loaded kinship file
2023-06-27 19:22:10,063 INFO preprocess_data - create_pedigree: loaded agesex file
2023-06-27 19:22:10,129 INFO preprocess_data - create_pedigree: creating age and sex dictionaries
2023-06-27 19:22:10,192 INFO preprocess_data - create_pedigree: dictionaries created
2023-06-27 19:22:10,192 INFO preprocess_data - create_pedigree: creating pedigree objects
2023-06-27 19:22:10,261 INFO impute - main: pedigree loaded.
2023-06-27 19:22:10,265 INFO impute - run_imputation: processing /n/groups/reich/anp9168/VCFs/chr1,None
2023-06-27 19:22:10,265 INFO preprocess_data - prepare_data: For file /n/groups/reich/anp9168/VCFs/chr1;None: Finding which chromosomes
2023-06-27 19:22:27,153 INFO preprocess_data - prepare_data: with chromosomes [1] initializing non_gts data
2023-06-27 19:22:27,154 INFO preprocess_data - prepare_data: with chromosomes [1] loading and filtering pedigree file ...
2023-06-27 19:22:27,985 INFO preprocess_data - prepare_data: Adding control to the pedigree ...
2023-06-27 19:22:28,008 INFO preprocess_data - prepare_data: Control Added.
2023-06-27 19:22:28,363 INFO preprocess_data - prepare_data: with chromosomes [1] loading bim file ...
2023-06-27 19:22:28,363 INFO preprocess_data - prepare_data: with chromosomes [1] loading and transforming ibd file ...
2023-06-27 19:22:31,564 INFO preprocess_data - prepare_data: ibd loaded.
2023-06-27 19:22:31,564 INFO preprocess_data - prepare_data: with chromosomes ['1'] initializing non_gts data done ...
2023-06-27 19:22:31,733 INFO preprocess_data - prepare_gts: with chromosomes ['1'] initializing gts data with start=0 end=58745
Traceback (most recent call last):
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 432, in <module>
    main(args)
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 326, in main
    run_imputation(args)
  File "/home/anp9168/anaconda3/envs/sniparEnv/bin/impute.py", line 208, in run_imputation
    phased_gts, unphased_gts, iid_to_bed_index, pos, freqs, hdf5_output_dict = prepare_gts(phased_address, unphased_address, bim, pedigree_output, ped_ids, chromosomes, start, end, pcs, pc_ids, find_optimal_pc)
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/snipar/imputation/preprocess_data.py", line 713, in prepare_gts
    probs= bgen.read((slice(0, len(bgen.samples)),slice(start, end)))
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/bgen_reader/_bgen2.py", line 552, in read
    val = np.full(
  File "/home/anp9168/anaconda3/envs/sniparEnv/lib/python3.9/site-packages/numpy/core/numeric.py", line 343, in full
    a = empty(shape, dtype, order)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 853. GiB for an array with shape (487409, 58745, 4) and data type float64
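For reference, the requested size follows directly from the shape in the error message: samples × variants × 4 probabilities × 8 bytes per float64. The sketch below only redoes that arithmetic. The helper name `array_gib` and the 1000-variant window are illustrative assumptions, not anything taken from snipar or bgen_reader.

```python
import math


def array_gib(shape, itemsize=8):
    """Size in GiB of a dense array with this shape (itemsize=8 for float64)."""
    return math.prod(shape) * itemsize / 2**30


# Shape reported in the _ArrayMemoryError above.
full = array_gib((487409, 58745, 4))

# Hypothetical: the same samples, but only 1000 variants read at a time.
window = array_gib((487409, 1000, 4))

print(f"full request: {full:.1f} GiB")
print(f"1000-variant window: {window:.1f} GiB")
```

The first figure matches the 853 GiB in the traceback, which suggests the whole start=0 to end=58745 range is being read for all samples in one call. Memory scales linearly with the number of variants per read, so reading fewer variants at a time shrinks the allocation proportionally.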
Here is the code I ran:
source activate sniparEnv
unset PYTHONPATH
impute.py -c --ibd IBD_Chr@.ibd --bgen chr@ --out Imputed_Chr@ --king FirstDegreeKING_forImputation.kin0 --agesex FirstDegreeAgeSex_forImputation.txt
Initially, I gave the --ibd flag the IBD_Chr@ prefix without the .ibd suffix, but got the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'IBD_Chr1.segments.gz'
I checked my ibd.py outputs and they are all named in the format IBD_Chr@.ibd.segments.gz and IBD_Chr@.l2.ldscore.gz, so I added the .ibd suffix to help snipar find the IBD_Chr@.ibd.segments.gz files, but I worry this introduced a new error.
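To make the naming issue concrete, here is a minimal sketch of how the path seems to be built, inferred only from the FileNotFoundError above: the @ is replaced by the chromosome number and .segments.gz is appended. The function `segments_path` is a hypothetical illustration, not snipar's actual code.

```python
def segments_path(ibd_arg, chrom):
    """Guess at the IBD segments filename for one chromosome.

    Assumption (inferred from the error message, not from snipar's source):
    '@' is substituted with the chromosome and '.segments.gz' is appended.
    """
    return ibd_arg.replace("@", str(chrom)) + ".segments.gz"


# Without the .ibd suffix: the path from the FileNotFoundError.
print(segments_path("IBD_Chr@", 1))

# With the .ibd suffix: the name that ibd.py actually wrote.
print(segments_path("IBD_Chr@.ibd", 1))
```

If that inference holds, passing IBD_Chr@.ibd simply points at the existing files and should be unrelated to the memory error.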
Thank you for your quick response - I've tried running with the --batch_size argument (and also with a single hyphen, as in -batch_size) set to 5000, but in both cases I get impute.py: error: unrecognized arguments: -batch_size 5000
That worked, thanks!