fgvieira/ngsLD

ngsLD usage

elizeng opened this issue · 5 comments

Hi,

I am new to analyses using GL and am interested in pruning my dataset for LD.
If I understand correctly, this involves running perl ./scripts/prune_graph.pl to generate the list of unlinked sites, and then using another script to subset the dataset so that it only includes the unlinked sites?

Is there a tool that you would recommend for this subsetting?

Thank you

Hi @elizeng

Yes, that would be the way. As for the subset, I'd say the easiest would be to re-run angsd with that list of sites (link).

best,

Thank you for the advice.
Is there a way to speed up the script?
I am working with quite a large dataset of ~11 million loci. It has been running for at least 12 hours, but it still has not completed.
The test dataset I have with ~5k loci ran relatively quickly.

Not really... that script was initially made as a proof of concept rather than for production.
For that reason, it was written in Perl and not a lot of attention was paid to optimization. One thing you can do is split the ngsLD output by contig/scaffold/chr and run the script on each piece separately.
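
A rough sketch of that split, for anyone landing here later. It assumes the ngsLD output is tab-separated and that the first column holds site IDs of the form chr:pos; each pair is assigned to the contig of its first site. The function name split_ld_by_contig and the .ld suffix are made up for this example and are not part of ngsLD.

```shell
# Split an ngsLD output table into one file per contig.
# Assumption: column 1 is a site ID like "chr1:12345".
# split_ld_by_contig is a hypothetical helper, not shipped with ngsLD.
split_ld_by_contig() {
    ld_file=$1
    out_dir=$2
    mkdir -p "$out_dir"
    awk -F'\t' -v dir="$out_dir" '
        {
            chr = $1
            sub(/:[0-9]+$/, "", chr)       # drop the trailing ":pos"
            if (chr != prev) {
                if (out != "") close(out)  # keep only one output file open
                prev = chr
                out = dir "/" chr ".ld"
            }
            print >> out                   # append, so a contig may reappear later
        }' "$ld_file"
}

# Example use (file names are placeholders):
#   split_ld_by_contig all_sites.ld by_contig
#   then run prune_graph.pl on each by_contig/*.ld with the same options
#   used for the full file, and concatenate the resulting site lists.
```

Because the files are opened in append mode, the output directory should be empty before the function is run a second time.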

Hi,
To re-run angsd with the list of sites (unlinked loci identified with ngsLD), I used this command:
./angsd -beagle input.BEAGLE.PL.gz -sites pruned.sites -nThreads 10 -out outputLDpruning.BEAGLE.PL.gz
I am getting this error:
-> Inputtype is beagle
-> Must supply -fai file

I indexed the sites file:
./angsd sites index pruned.sites
which produced the bin and idx files. Then I re-ran angsd:
./angsd -beagle input.BEAGLE.PL.gz -sites pruned.sites.idx -nThreads 10 -out outputLDpruning.BEAGLE.PL.gz
I also tried:
./angsd -beagle input.BEAGLE.PL.gz -sites pruned.sites.bin -nThreads 10 -out outputLDpruning.BEAGLE.PL.gz
but I am still getting the same error as before:
-> Inputtype is beagle
-> Must supply -fai file
Please advise how to keep only the unlinked loci generated with ngsLD in the beagle.pl file.
Thank you.

That is more of an angsd question, but the easiest would be to run angsd again directly from the bam files. Just like you did the first time you generated the beagle file.
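
A minimal sketch of what that could look like. It assumes the pruned list contains one chr:pos ID per line and that angsd wants a tab-separated chr/pos sites file with positions in order within each contig. The helper name ids_to_angsd_sites and the file names are placeholders, and the angsd options from the original run are not shown in this thread, so they are left as a placeholder too.

```shell
# Convert "chr:pos" IDs (one per line) into a "chr<TAB>pos" sites file,
# sorted by contig name and then numerically by position.
# ids_to_angsd_sites is a hypothetical helper name.
ids_to_angsd_sites() {
    awk '{
            chr = $0; sub(/:[0-9]+$/, "", chr)   # text before the last ":"
            pos = $0; sub(/^.*:/, "", pos)       # digits after the last ":"
            print chr "\t" pos
         }' "$1" | sort -k1,1 -k2,2n > "$2"
}

# Example use (not run here; file names are placeholders):
#   ids_to_angsd_sites pruned.sites pruned.sites.txt
#   ./angsd sites index pruned.sites.txt
#   ./angsd -bam bam.filelist <same options as the original run> \
#           -sites pruned.sites.txt -out pruned
```

Note that -sites is given the text file itself, not the .idx or .bin files created by the indexing step.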