fgvieira/ngsLD

Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec, : long vectors not supported yet: ../../../include/Rinlinedfuns.h:537

jcaccavo opened this issue · 4 comments

Hi there,

I got the following error when trying to run the fit_LDdecay.R script:
Rscript --vanilla --slave /srv/public/users/jcaccavo/11_CCGA_full_seq/02_NovaSeq/02_WG/x_scripts/ngsLD/scripts/fit_LDdecay.R --n_ind=43 --plot_scale=4 --ld_files ld_files_noDS.list --out dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD_decay1.pdf

Random seed: 41963
Warning message:
In read.table(opt$ld_files, header = opt$header, stringsAsFactors = FALSE) :
  incomplete final line found by readTableHeader on 'ld_files_noDS.list'
==> Fitting r2 LD decay assuming a one (rate of decay) parameter decay model
Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec,  :
  long vectors not supported yet: ../../../include/Rinlinedfuns.h:537
Calls: read.table -> type.convert -> type.convert.default
Execution halted

I'm using R/4.2.2.

I get this same error running the fit_LDdecay.R script on 2 other ngsLD outputs. Interestingly, for 1 ngsLD output, I do not get an error and am able to create the decay plot without any issues.

I wonder if it is a file size issue? The file sizes for the ngsLD outputs are as follows:

724G	dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd11_NR_depth_DS10X_LD
374G	dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd18_NR_depth_DS5X_LD
52G*	dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd29_NR_depth_DS2X_LD
333G	dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD

The file marked with an asterisk (52G) is the only one that works.
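For context, R reports "long vectors not supported yet" when a function without long-vector support receives a vector of more than 2^31 - 1 (2,147,483,647) elements, so the number of rows, rather than the byte size itself, is the likely trigger. Assuming lines of roughly 80 to 90 bytes (as in the file heads pasted in this post), the 52G file would hold about 0.6 to 0.7 billion rows, while the 333G file would hold roughly 4 billion or more. The sketch below estimates the row count without reading the whole file; the function names estimate_rows and exceeds_r_limit are hypothetical helpers, not part of ngsLD.

```shell
#!/usr/bin/env bash
# Rough row-count estimate for a large text file, compared with R's
# 2^31 - 1 element limit. Function names are illustrative only.

R_MAX_ELEMS=2147483647   # 2^31 - 1

# estimate_rows FILE [SAMPLE_LINES]
# Extrapolates the total row count from the mean line length of the
# first SAMPLE_LINES lines (default 1000).
estimate_rows() {
    local file=$1 sample=${2:-1000}
    local size lines bytes
    size=$(( $(wc -c < "$file") ))                       # total bytes
    lines=$(( $(head -n "$sample" "$file" | wc -l) ))    # lines in sample
    bytes=$(( $(head -n "$sample" "$file" | wc -c) ))    # bytes in sample
    if [ "$bytes" -eq 0 ]; then
        echo 0
        return
    fi
    echo $(( size * lines / bytes ))
}

# exceeds_r_limit FILE
# Succeeds (exit status 0) if the estimate is above 2^31 - 1.
exceeds_r_limit() {
    [ "$(estimate_rows "$1")" -gt "$R_MAX_ELEMS" ]
}
```

For example, running estimate_rows on each of the four LD files and comparing the result with 2147483647 would show whether the three failing files are the ones above the limit. The estimate assumes the first lines are representative of the rest of the file.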

I'm pasting below the heads of the 4 LD files (the 3 that don't work, and the 1 that works), and the 4 input .list files (simple text files with the name of the input file for the script) can be downloaded from my dropbox.

If you have any advice as to how I might be able to generate decay plots for these 3 ngsLD outputs, or if you require any further information, please let me know.

Thanks,
Jilda

# the DS10X_LD dataset does not work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd11_NR_depth_DS10X_LD <==
HiC_scaffold_10:52562	HiC_scaffold_10:52709	147	0.994830	0.062234	1.000000	0.999999
HiC_scaffold_10:52422	HiC_scaffold_10:52429	7	0.999931	0.062214	1.000000	1.000000
HiC_scaffold_10:52430	HiC_scaffold_10:52562	132	0.998754	0.062226	1.000000	1.000000
HiC_scaffold_10:52429	HiC_scaffold_10:52430	1	0.999925	0.062214	1.000000	1.000000
HiC_scaffold_10:50950	HiC_scaffold_10:51186	236	0.987818	0.062257	1.000000	1.000000
HiC_scaffold_10:51493	HiC_scaffold_10:51494	1	0.999926	0.067308	0.999994	0.999986
HiC_scaffold_10:51186	HiC_scaffold_10:51289	103	0.015024	-0.004266	1.000000	0.004883
HiC_scaffold_10:52709	HiC_scaffold_10:53129	420	0.961784	0.062310	1.000000	0.999993
HiC_scaffold_10:53139	HiC_scaffold_10:53143	4	0.025715	-0.005257	1.000000	0.006112
HiC_scaffold_10:52202	HiC_scaffold_10:52352	150	0.998818	0.062218	1.000000	1.000000

# the DS5X_LD dataset does not work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd18_NR_depth_DS5X_LD <==
HiC_scaffold_1093:2618831	HiC_scaffold_1093:2618842	11	0.139283	-0.064876	0.999946	0.137370
HiC_scaffold_10:52107	HiC_scaffold_10:52108	1	0.419782	0.058858	0.864473	0.566208
HiC_scaffold_10:52534	HiC_scaffold_10:52536	2	0.996941	0.070424	0.999987	0.999971
HiC_scaffold_10:52530	HiC_scaffold_10:52531	1	0.990613	0.070602	0.999993	0.999982
HiC_scaffold_10:52519	HiC_scaffold_10:52521	2	0.986029	0.089175	0.999999	0.999986
HiC_scaffold_10:52530	HiC_scaffold_10:52534	4	0.915120	0.070450	0.999997	0.999983
HiC_scaffold_10:52283	HiC_scaffold_10:52291	8	0.980522	0.071732	0.999997	0.999991
HiC_scaffold_10:54168	HiC_scaffold_10:54170	2	0.390106	0.071936	0.998830	0.996375
HiC_scaffold_10:52530	HiC_scaffold_10:52536	6	0.918061	0.070397	0.999998	0.999984
HiC_scaffold_10:52291	HiC_scaffold_10:52292	1	0.877743	0.074759	1.000000	0.999993

# the DS2X_LD dataset DOES work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd29_NR_depth_DS2X_LD <==
HiC_scaffold_10:79542	HiC_scaffold_10:79543	1	0.793716	0.071503	1.000000	0.999345
HiC_scaffold_10:78110	HiC_scaffold_10:78111	1	0.996641	0.044456	0.999972	0.999935
HiC_scaffold_10:79542	HiC_scaffold_10:79544	2	0.999996	0.056655	0.999909	0.999817
HiC_scaffold_10:78112	HiC_scaffold_10:78113	1	0.924173	0.043201	0.999983	0.999952
HiC_scaffold_10:79542	HiC_scaffold_10:79545	3	0.936968	0.056903	0.999909	0.999794
HiC_scaffold_10:78113	HiC_scaffold_10:78119	6	0.877176	0.043487	0.999982	0.999943
HiC_scaffold_10:79542	HiC_scaffold_10:79548	6	0.988844	0.056420	0.999907	0.999814
HiC_scaffold_10:78108	HiC_scaffold_10:78109	1	0.913702	0.044911	0.999965	0.999909
HiC_scaffold_10:78111	HiC_scaffold_10:78112	1	0.999846	0.044405	0.999973	0.999945
HiC_scaffold_10:78102	HiC_scaffold_10:78108	6	0.877358	0.044839	0.999957	0.999900

# the LD dataset does not work
==> dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD <==
HiC_scaffold_10:60766	HiC_scaffold_10:60767	1	0.995350	0.044978	0.999946	0.999871
HiC_scaffold_10:54182	HiC_scaffold_10:54183	1	0.975895	0.068707	0.999949	0.999897
HiC_scaffold_10:60759	HiC_scaffold_10:60765	6	0.994958	0.046003	0.999965	0.999926
HiC_scaffold_10:53894	HiC_scaffold_10:53896	2	0.608587	0.043776	0.774698	0.577412
HiC_scaffold_10:51123	HiC_scaffold_10:51124	1	0.999997	0.048738	0.999996	0.999992
HiC_scaffold_10:59207	HiC_scaffold_10:60759	1552	0.000472	0.001614	0.036713	0.000302
HiC_scaffold_10:53894	HiC_scaffold_10:54168	274	0.011254	-0.004288	0.999751	0.004911
HiC_scaffold_10:60759	HiC_scaffold_10:60766	7	0.938774	0.045220	0.999980	0.999940
HiC_scaffold_10:51626	HiC_scaffold_10:52036	410	0.085267	0.136073	0.673758	0.330037
HiC_scaffold_10:56660	HiC_scaffold_10:59207	2547	0.000216	-0.000725	0.025851	0.000027

What are the contents of the file ld_files_noDS.list?
R gives a warning when reading it:

Random seed: 41963
Warning message:
In read.table(opt$ld_files, header = opt$header, stringsAsFactors = FALSE) :
  incomplete final line found by readTableHeader on 'ld_files_noDS.list'

Thanks for your response!

The file command reports ld_files_noDS.list as ASCII text, with no line terminators, and the same is true of all my .list input files for the fit_LDdecay.R script. I have no problem plotting the LD decay with this R script for the ld_files_2X.list file, but the other 3 (ld_files_10X.list, ld_files_5X.list, ld_files_noDS.list) all result in the error indicated above.
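The "incomplete final line" warning matches the missing line terminator. It is only a warning, and the log shows the script continuing on to the fitting step, so it is probably unrelated to the error, but it is easy to silence. A minimal sketch, where ensure_trailing_newline is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Append a newline to FILE only if it is non-empty and its last byte
# is not already a newline. Helper name is illustrative only.
ensure_trailing_newline() {
    local file=$1
    # $(...) strips a trailing newline, so the result is non-empty
    # only when the last byte is some other character.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}
```

Calling ensure_trailing_newline ld_files_noDS.list once should be enough; running it again leaves the file unchanged.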

Each .list file contains the name of the input file for the R script, and that input file is the output of ngsLD. The file named in ld_files_noDS.list (dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD) was produced by running ngsLD with the following command:

/srv/public/users/jcaccavo/11_CCGA_full_seq/02_NovaSeq/02_WG/x_scripts/ngsLD/ngsLD --geno /srv/public/users/jcaccavo/11_CCGA_full_seq/02_NovaSeq/02_WG/15_angsd/dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth.beagle.gz --probs --n_ind 43 --n_sites 5044175 --n_threads 40 --max_kb_dist 100 --min_maf 0.05 --seed 1 --posH dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_SNPs_pos.txt --out dmawsoni2021_GL1_doMaf3_doMajorMinor1_doGlf2_minMaf0.05_SNPpval1e6_minInd32_NR_depth_LD

Thank you again for your help, and please let me know if I addressed your question, or if there is more/different information that I can provide.

Can you send me a small example file so I can try to reproduce the error?
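One way to produce such a file, sketched with a hypothetical helper named make_small_example, is to take the first rows of an LD file and write a matching one-line list file. Note the assumption here: if the error depends on the total row count, a truncated file would not reproduce it, but it would still let the format be checked.

```shell
#!/usr/bin/env bash
# make_small_example LD_FILE N_ROWS OUT_PREFIX
# Writes the first N_ROWS rows of LD_FILE to OUT_PREFIX.ld and a list
# file OUT_PREFIX.list naming it. Helper name and file suffixes are
# illustrative only.
make_small_example() {
    local ld_file=$1 n_rows=$2 out_prefix=$3
    head -n "$n_rows" "$ld_file" > "${out_prefix}.ld"
    # printf with '\n' ensures the list file ends with a line terminator.
    printf '%s\n' "${out_prefix}.ld" > "${out_prefix}.list"
}
```

The resulting list file could then be passed to fit_LDdecay.R through --ld_files in place of the original one.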

Apologies for the delayed response.

Since I suspect the issue is related to the size of my input LD files, it may be best to work with the original files, if possible. For one of the 3 datasets that fail with the fit_LDdecay.R script, I've zipped the input LD file; it is still 84 GB compressed, and you can download it from here.

Here is the list file I am using as input to the fit_LDdecay.R script, which simply gives the location of the input LD file indicated above.

If it's not possible to download or work with these files, it would be great if you could suggest an alternative way forward.
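One possible way forward, sketched under the assumption that the failure comes from the row count exceeding what read.table can handle, is to randomly thin the LD file before passing it to fit_LDdecay.R. Keeping each row independently with the same probability should preserve the relationship between distance and LD in expectation, although the effect on the fitted decay curve has not been verified here. The helper name subsample_rows is hypothetical.

```shell
#!/usr/bin/env bash
# subsample_rows FILE FRACTION [SEED]
# Prints each row of FILE with probability FRACTION (between 0 and 1).
# A fixed SEED (default 1) makes the selection repeatable with the
# same awk implementation. Helper name is illustrative only.
subsample_rows() {
    local infile=$1 fraction=$2 seed=${3:-1}
    awk -v p="$fraction" -v seed="$seed" \
        'BEGIN { srand(seed) } rand() < p' "$infile"
}
```

For instance, subsample_rows with a fraction of 0.1 redirected to a new file would keep about a tenth of the rows; the new file name then goes into the .list file. The fraction should be chosen so that the remaining row count is comfortably below 2^31 - 1 and fits in memory.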

Thanks so much for your help!