Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec, : long vectors not supported yet: ../../../include/Rinlinedfuns.h:537
jcaccavo opened this issue · 4 comments
Hi there,
I got the following error when trying to run the fit_LDdecay.R script:
Rscript --vanilla --slave /srv/public/users/jcaccavo/11_CCGA_full_seq/02_NovaSeq/02_WG/x_scripts/ngsLD/scripts/fit_LDdecay.R --n_ind=43 --plot_scale=4 --ld_files ld_files_noDS.list --out dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD_decay1.pdf
Random seed: 41963
Warning message:
In read.table(opt$ld_files, header = opt$header, stringsAsFactors = FALSE) :
incomplete final line found by readTableHeader on 'ld_files_noDS.list'
==> Fitting r2 LD decay assuming a one (rate of decay) parameter decay model
Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec, :
long vectors not supported yet: ../../../include/Rinlinedfuns.h:537
Calls: read.table -> type.convert -> type.convert.default
Execution halted
I'm using R/4.2.2.
I get this same error running the fit_LDdecay.R script on 2 other ngsLD outputs. Interestingly, for 1 ngsLD output, I do not get an error and am able to create the decay plot without any issues.
I wonder if it is a file size issue? The file sizes for the ngsLD outputs are as follows:
724G dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd11_NR_depth_DS10X_LD
374G dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd18_NR_depth_DS5X_LD
52G* dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd29_NR_depth_DS2X_LD
333G dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD
The file size with the asterisk (52G) is the only one that works.
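One rough way to check the file-size hypothesis: R's ordinary vectors hold at most 2^31 - 1 (2,147,483,647) elements, and "long vectors not supported yet" is the error raised when a function that cannot handle larger vectors receives one. With LD lines of roughly 80 to 90 bytes each (judging by the excerpts further down), the 52G file would have well under a billion rows, while the three larger files would each have several billion. The sketch below estimates the row count from the file size and the mean length of the first lines, without reading the whole file. The function name estimate_rows is invented for illustration and is not part of ngsLD; the estimate also assumes the first lines are representative of the rest.

```shell
# Estimate the number of lines in a large text file without scanning all of it.
# estimate_rows is a hypothetical helper name, not part of ngsLD.
# $1 = file, $2 = number of leading lines to sample (default 1000)
estimate_rows() {
    file="$1"
    sample="${2:-1000}"
    # Total size of the file in bytes
    bytes=$(wc -c < "$file")
    # Mean line length (including the newline) over the sampled lines,
    # then total bytes divided by that mean
    head -n "$sample" "$file" | awk -v total="$bytes" '
        { n++; len += length($0) + 1 }
        END { if (n > 0) printf "%.0f\n", total / (len / n) }'
}
```

Calling estimate_rows on an LD file and comparing the result with 2147483647 would show whether the row count alone is enough to exceed the limit.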
I'm pasting below the heads of the 4 LD files (the 3 that don't work, and the 1 that works), and the 4 input .list files (simple text files with the name of the input file for the script) can be downloaded from my dropbox.
If you have any advice as to how I might be able to generate decay plots for these 3 ngsLD outputs, or if you require any further information, please let me know.
Thanks,
Jilda
# the DS10X_LD dataset does not work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd11_NR_depth_DS10X_LD <==
HiC_scaffold_10:52562 HiC_scaffold_10:52709 147 0.994830 0.062234 1.000000 0.999999
HiC_scaffold_10:52422 HiC_scaffold_10:52429 7 0.999931 0.062214 1.000000 1.000000
HiC_scaffold_10:52430 HiC_scaffold_10:52562 132 0.998754 0.062226 1.000000 1.000000
HiC_scaffold_10:52429 HiC_scaffold_10:52430 1 0.999925 0.062214 1.000000 1.000000
HiC_scaffold_10:50950 HiC_scaffold_10:51186 236 0.987818 0.062257 1.000000 1.000000
HiC_scaffold_10:51493 HiC_scaffold_10:51494 1 0.999926 0.067308 0.999994 0.999986
HiC_scaffold_10:51186 HiC_scaffold_10:51289 103 0.015024 -0.004266 1.000000 0.004883
HiC_scaffold_10:52709 HiC_scaffold_10:53129 420 0.961784 0.062310 1.000000 0.999993
HiC_scaffold_10:53139 HiC_scaffold_10:53143 4 0.025715 -0.005257 1.000000 0.006112
HiC_scaffold_10:52202 HiC_scaffold_10:52352 150 0.998818 0.062218 1.000000 1.000000
# the DS5X_LD dataset does not work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd18_NR_depth_DS5X_LD <==
HiC_scaffold_1093:2618831 HiC_scaffold_1093:2618842 11 0.139283 -0.064876 0.999946 0.137370
HiC_scaffold_10:52107 HiC_scaffold_10:52108 1 0.419782 0.058858 0.864473 0.566208
HiC_scaffold_10:52534 HiC_scaffold_10:52536 2 0.996941 0.070424 0.999987 0.999971
HiC_scaffold_10:52530 HiC_scaffold_10:52531 1 0.990613 0.070602 0.999993 0.999982
HiC_scaffold_10:52519 HiC_scaffold_10:52521 2 0.986029 0.089175 0.999999 0.999986
HiC_scaffold_10:52530 HiC_scaffold_10:52534 4 0.915120 0.070450 0.999997 0.999983
HiC_scaffold_10:52283 HiC_scaffold_10:52291 8 0.980522 0.071732 0.999997 0.999991
HiC_scaffold_10:54168 HiC_scaffold_10:54170 2 0.390106 0.071936 0.998830 0.996375
HiC_scaffold_10:52530 HiC_scaffold_10:52536 6 0.918061 0.070397 0.999998 0.999984
HiC_scaffold_10:52291 HiC_scaffold_10:52292 1 0.877743 0.074759 1.000000 0.999993
# the DS2X_LD dataset DOES work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd29_NR_depth_DS2X_LD <==
HiC_scaffold_10:79542 HiC_scaffold_10:79543 1 0.793716 0.071503 1.000000 0.999345
HiC_scaffold_10:78110 HiC_scaffold_10:78111 1 0.996641 0.044456 0.999972 0.999935
HiC_scaffold_10:79542 HiC_scaffold_10:79544 2 0.999996 0.056655 0.999909 0.999817
HiC_scaffold_10:78112 HiC_scaffold_10:78113 1 0.924173 0.043201 0.999983 0.999952
HiC_scaffold_10:79542 HiC_scaffold_10:79545 3 0.936968 0.056903 0.999909 0.999794
HiC_scaffold_10:78113 HiC_scaffold_10:78119 6 0.877176 0.043487 0.999982 0.999943
HiC_scaffold_10:79542 HiC_scaffold_10:79548 6 0.988844 0.056420 0.999907 0.999814
HiC_scaffold_10:78108 HiC_scaffold_10:78109 1 0.913702 0.044911 0.999965 0.999909
HiC_scaffold_10:78111 HiC_scaffold_10:78112 1 0.999846 0.044405 0.999973 0.999945
HiC_scaffold_10:78102 HiC_scaffold_10:78108 6 0.877358 0.044839 0.999957 0.999900
# the LD dataset does not work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD <==
HiC_scaffold_10:60766 HiC_scaffold_10:60767 1 0.995350 0.044978 0.999946 0.999871
HiC_scaffold_10:54182 HiC_scaffold_10:54183 1 0.975895 0.068707 0.999949 0.999897
HiC_scaffold_10:60759 HiC_scaffold_10:60765 6 0.994958 0.046003 0.999965 0.999926
HiC_scaffold_10:53894 HiC_scaffold_10:53896 2 0.608587 0.043776 0.774698 0.577412
HiC_scaffold_10:51123 HiC_scaffold_10:51124 1 0.999997 0.048738 0.999996 0.999992
HiC_scaffold_10:59207 HiC_scaffold_10:60759 1552 0.000472 0.001614 0.036713 0.000302
HiC_scaffold_10:53894 HiC_scaffold_10:54168 274 0.011254 -0.004288 0.999751 0.004911
HiC_scaffold_10:60759 HiC_scaffold_10:60766 7 0.938774 0.045220 0.999980 0.999940
HiC_scaffold_10:51626 HiC_scaffold_10:52036 410 0.085267 0.136073 0.673758 0.330037
HiC_scaffold_10:56660 HiC_scaffold_10:59207 2547 0.000216 -0.000725 0.025851 0.000027
What is the content of the file ld_files_noDS.list?
R gives a warning when reading it:
Random seed: 41963
Warning message:
In read.table(opt$ld_files, header = opt$header, stringsAsFactors = FALSE) :
incomplete final line found by readTableHeader on 'ld_files_noDS.list'
Thanks for your response!
The ld_files_noDS.list file is classified by the file command as "ASCII text, with no line terminators", as are all of my .list input files to the fit_LDdecay.R script. I have no problem plotting the LD decay with this R script for the ld_files_2X.list file, but the other 3 (ld_files_10X.list, ld_files_5X.list, ld_files_noDS.list) all result in the error indicated above.
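Both the "incomplete final line" warning and the "no line terminators" classification indicate that the .list file does not end with a newline. Since ld_files_2X.list has the same property and works, the warning is probably unrelated to the long-vector error, but it can be silenced by appending a newline. A minimal sketch, where ensure_trailing_newline is a made-up helper name:

```shell
# Append a newline to a file only if its last byte is not already one.
# ensure_trailing_newline is a hypothetical helper name, not part of ngsLD.
ensure_trailing_newline() {
    # tail -c1 prints the last byte; command substitution strips a trailing
    # newline, so the result is non-empty only when the file ends without one.
    if [ -n "$(tail -c1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}
```

Running it twice on the same file is safe, because the second call finds the newline already in place and does nothing.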
Each .list file contains the name of the input file for the R script, which is the output of ngsLD. For ld_files_noDS.list, that file is dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD. It was produced by running ngsLD with the following command:
/srv/public/users/jcaccavo/11_CCGA_full_seq/02_NovaSeq/02_WG/x_scripts/ngsLD/ngsLD --geno /srv/public/users/jcaccavo/11_CCGA_full_seq/02_NovaSeq/02_WG/15_angsd/dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth.beagle.gz --probs --n_ind 43 --n_sites 5044175 --n_threads 40 --max_kb_dist 100 --min_maf 0.05 --seed 1 --posH dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_SNPs_pos.txt --out dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD
Thank you again for your help, and please let me know if I addressed your question, or if there is more/different information that I can provide.
Can you send me a small example file so I can try to reproduce the error?
Apologies for the delayed response.
Given that I fear the issue may be related to the size of my input LD files, I wonder if it would not be best to work with the original files, if possible. Of the 3 datasets that aren't working with the fit_LDdecay.R script, I've zipped the input LD file, which nonetheless remains at 84 GB, and you can download it from here.
Here is the list file I am using as input to the fit_LDdecay.R script, which simply identifies the file location of the input LD file indicated above.
If it's not possible to download/work with these files, if you could suggest an alternative way forward, that would be great.
Thanks so much for your help!
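One possible way forward, offered as an assumption and not as a fix confirmed in this thread, is to pass the script a random subset of the LD file's lines so that the row count stays below R's 2^31 - 1 element limit. The sketch below keeps each line independently with a given probability. The function name subsample_lines and the example fraction are illustrative only, and whether a subsample is acceptable for the decay fit is a judgment for the analysis at hand.

```shell
# Keep each input line with probability p, reproducibly for a given seed.
# subsample_lines is a hypothetical helper name, not part of ngsLD.
# $1 = input file, $2 = fraction of lines to keep (0 to 1), $3 = seed (default 1)
subsample_lines() {
    awk -v p="$2" -v seed="${3:-1}" '
        BEGIN { srand(seed) }   # fixed seed so the same lines are kept each run
        rand() < p              # print the line when the draw falls below p
    ' "$1"
}
```

For example, subsample_lines big_LD_file 0.05 > big_LD_file.sub would keep about 5% of the lines (big_LD_file is a placeholder name); the .list file would then need to point at the subsampled file. Even below the row limit, R still has to hold the whole table in memory, so a smaller fraction may be needed on machines with limited RAM.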