fgvieira/ngsLD

Invalid entry in input when using prune_graph.pl

James-S-Santangelo opened this issue · 3 comments

Hello!

I'm attempting to use the prune_graph.pl script to prune tightly linked SNPs but am having a bit of trouble getting it to run. I generated pairwise LD estimates using ngsLD using the following command:

( NUM_SITES=$(cat {input.pos} | wc -l) &&
  ngsLD --geno {input.gls} \
        --pos {input.pos} \
        --n_ind 120 \
        --n_sites $NUM_SITES \
        --probs \
        --n_threads {threads} \
        --max_kb_dist 25 | gzip --best > {output} ) 2> {log}

The command completed successfully and generated an output file that looks like:

CM019101.1:17081	CM019101.1:17090	9	0.918051	0.012626	0.999931	0.999790
CM019101.1:16935	CM019101.1:16936	1	0.995387	0.005696	0.999970	0.999938
CM019101.1:17066	CM019101.1:17069	3	0.953309	0.012726	0.999920	0.999804
CM019101.1:17021	CM019101.1:17030	9	0.999206	0.004410	0.999969	0.999937
CM019101.1:16936	CM019101.1:16995	59	0.189878	-0.000047	0.999991	0.000047
CM019101.1:17063	CM019101.1:17066	3	0.058000	0.007546	0.999666	0.024693
CM019101.1:16935	CM019101.1:16995	60	0.183163	-0.000047	0.999992	0.000047
CM019101.1:16995	CM019101.1:17012	17	0.119669	-0.000512	0.999998	0.000547
CM019101.1:17066	CM019101.1:17078	12	0.819614	0.012485	0.999938	0.999816
CM019101.1:17090	CM019101.1:17114	24	0.611619	0.012602	0.999971	0.999765

Using the supplementary material from the ngsLD paper as a template, I'm running the following command to perform the pruning using the output from ngsLD as input:

zcat {input} | cut -f 1,3,5- | perl /opt/bin/prune_graph.pl \
        --max_kb_dist 25 \
        --min_weight 0.5 | sort -V > {output}

However, I'm receiving the following error:

### Reading data from -
ERROR: invalid entry in input file - line 1:

The error persists even if, instead of reading from STDIN, I pass the output of zcat {input} | cut -f 1,3,5- to prune_graph.pl through the --in_file argument.

For reference, I am running these commands inside of a singularity container (see HERE) where ngsLD has been installed.

Any idea what might be going on?

Thanks!

The issue was that cut -f 1,3,5- is not needed; the input file can be piped to prune_graph.pl as-is and the script will use the 7th column of the input file (i.e., EM estimation of r2) as its measure of LD (by default).

Including cut -f 1,3,5- resulted in fewer than 7 columns in the input file, hence the error above. I opted to remove cut -f 1,3,5-, though prune_graph.pl also allows the column with LD weights to be specified manually by passing the --field-weight argument.
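For anyone hitting the same error, here is a minimal sketch of the column-count problem. It uses one record from the ngsLD output quoted above and only standard printf, cut, and awk; the variable names (line, n_raw, n_cut) are made up for this illustration and are not part of ngsLD or prune_graph.pl.

```shell
# One record from the ngsLD output shown earlier (tab-separated, 7 columns).
line=$(printf 'CM019101.1:17081\tCM019101.1:17090\t9\t0.918051\t0.012626\t0.999931\t0.999790')

# Number of tab-separated columns in the raw ngsLD output.
n_raw=$(printf '%s\n' "$line" | awk -F'\t' '{print NF}')

# Number of columns left after the cut step from the original command.
# cut -f 1,3,5- keeps fields 1, 3, 5, 6 and 7, so fields 2 and 4 are dropped.
n_cut=$(printf '%s\n' "$line" | cut -f 1,3,5- | awk -F'\t' '{print NF}')

echo "raw: $n_raw columns, after cut: $n_cut columns"
```

After the cut step there is no longer a 7th column for the script to read. Dropping that stage and keeping the rest of the pipeline unchanged (zcat, then prune_graph.pl with --max_kb_dist 25 and --min_weight 0.5, then sort -V) passes all seven columns through.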

Feel free to close this issue.

Hi James,

thanks for reporting back that you found the solution.

Where did you see that cut command example?

Hey, sorry I missed this. I saw the cut command on page 16 of the manuscript's supplement.