Invalid entry in input when using prune_graph.pl
James-S-Santangelo opened this issue · 3 comments
Hello!
I'm attempting to use the prune_graph.pl script to prune tightly linked SNPs, but I'm having a bit of trouble getting it to run. I generated pairwise LD estimates with ngsLD using the following command:
( NUM_SITES=$(cat {input.pos} | wc -l) &&
ngsLD --geno {input.gls} \
--pos {input.pos} \
--n_ind 120 \
--n_sites $NUM_SITES \
--probs \
--n_threads {threads} \
--max_kb_dist 25 | gzip --best > {output} ) 2> {log}
The command completed successfully and generated an output file that looks like:
CM019101.1:17081 CM019101.1:17090 9 0.918051 0.012626 0.999931 0.999790
CM019101.1:16935 CM019101.1:16936 1 0.995387 0.005696 0.999970 0.999938
CM019101.1:17066 CM019101.1:17069 3 0.953309 0.012726 0.999920 0.999804
CM019101.1:17021 CM019101.1:17030 9 0.999206 0.004410 0.999969 0.999937
CM019101.1:16936 CM019101.1:16995 59 0.189878 -0.000047 0.999991 0.000047
CM019101.1:17063 CM019101.1:17066 3 0.058000 0.007546 0.999666 0.024693
CM019101.1:16935 CM019101.1:16995 60 0.183163 -0.000047 0.999992 0.000047
CM019101.1:16995 CM019101.1:17012 17 0.119669 -0.000512 0.999998 0.000547
CM019101.1:17066 CM019101.1:17078 12 0.819614 0.012485 0.999938 0.999816
CM019101.1:17090 CM019101.1:17114 24 0.611619 0.012602 0.999971 0.999765
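A quick way to confirm what each line contains is to count its fields with awk. The sketch below uses two lines copied from the output above and assumes the file is tab-separated, which is what cut -f expects by default. It checks that column 3 equals the distance in bp between the two positions and prints column 7, which the reply further down identifies as the EM estimate of r2. The meaning of the remaining columns is not relied on here.

```shell
# Two lines copied from the ngsLD output above (assumed tab-separated).
# printf reuses its format string, so the 14 arguments become 2 lines.
ld_lines=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  CM019101.1:17081 CM019101.1:17090 9 0.918051 0.012626 0.999931 0.999790 \
  CM019101.1:16936 CM019101.1:16995 59 0.189878 -0.000047 0.999991 0.000047)

# For each line: report the field count, check that column 3 is the
# distance between the two positions, and print column 7 (EM r^2).
summary=$(printf '%s\n' "$ld_lines" | awk -F'\t' '{
  split($1, a, ":"); split($2, b, ":")
  ok = (b[2] - a[2] == $3) ? "yes" : "no"
  printf "fields=%d dist_ok=%s r2_em=%s\n", NF, ok, $7
}')
printf '%s\n' "$summary"
```

Both lines should report 7 fields with the distance check passing, which matches the 7-column layout prune_graph.pl reads by default.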
Using the supplementary material from the ngsLD paper as a template, I'm running the following command to prune the SNPs, with the ngsLD output as input:
zcat {input} | cut -f 1,3,5- | perl /opt/bin/prune_graph.pl \
--max_kb_dist 25 \
--min_weight 0.5 | sort -V > {output}
However, I'm receiving the following error:
### Reading data from -
ERROR: invalid entry in input file - line 1:
The error persists even if, instead of reading from STDIN, I pass the output of zcat {input} | cut -f 1,3,5- to prune_graph.pl with the --in_file argument.
For reference, I am running these commands inside a Singularity container (see HERE) where ngsLD has been installed.
Any idea what might be going on?
Thanks!
The issue was that cut -f 1,3,5- is not needed: the ngsLD output can be piped to prune_graph.pl as-is, and by default the script uses the 7th column (i.e., the EM estimate of r2) as its measure of LD. Including cut -f 1,3,5- left fewer than 7 columns in the input, hence the error above. I opted to remove cut -f 1,3,5-, although prune_graph.pl also lets you specify the column holding the LD weights manually with the --field-weight argument.
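To see the failure concretely, the sketch below rebuilds one line of the ngsLD output above (again assuming tab separation) and counts its fields before and after cut -f 1,3,5-. It also defines check_ld_columns, a hypothetical helper that is not part of ngsLD, which fails if any line has fewer fields than the weight column prune_graph.pl reads (column 7 by default, according to this thread).

```shell
# One line of the ngsLD output above, assumed to be tab-separated.
line=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s' \
  CM019101.1:17081 CM019101.1:17090 9 0.918051 0.012626 0.999931 0.999790)

# Field counts as written by ngsLD and after the supplement's cut command,
# which keeps only fields 1, 3, 5, 6 and 7.
n_raw=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')
n_cut=$(printf '%s\n' "$line" | cut -f 1,3,5- | awk -F'\t' '{ print NF }')
echo "raw: $n_raw fields; after cut -f 1,3,5-: $n_cut fields"

# Hypothetical helper (not part of ngsLD): read LD lines on stdin and exit
# non-zero, with a message on stderr, at the first line that has fewer
# than $1 tab-separated fields.
check_ld_columns() {
  awk -F'\t' -v need="$1" '
    NF < need {
      printf("line %d has %d fields (need %d)\n", NR, NF, need) > "/dev/stderr"
      bad = 1
      exit
    }
    END { exit bad }'
}

printf '%s\n' "$line" | check_ld_columns 7 && echo "raw output: OK"
printf '%s\n' "$line" | cut -f 1,3,5- | check_ld_columns 7 || echo "after cut: too few columns"
```

Because the helper consumes its input, it would be run as a separate check (e.g. zcat {input} | check_ld_columns 7) before the pruning step, where it reports the line and field count rather than prune_graph.pl's shorter "invalid entry" message.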
Feel free to close this issue.
Hi James,
thanks for reporting back that you found the solution.
Where did you see that cut command example?
Hey, sorry I missed this. I saw the cut command on page 16 of the manuscript's supplement.