fgvieira/ngsLD

LD between chr/contig names

anne-laureferchaud opened this issue · 8 comments

I am wondering if you had any chance to implement an option to calculate LD between chr/contigs ?

Originally posted by @fgvieira in #4 (comment)

Dear all,

I am using version 1.1.0 (Aug 20 2019 @ 16:26:44), and if I supply all the linkage groups together (maintaining the correct labels), the LD between sites on different LGs is included in the output (I've set --max_kb_dist 0).

Since you speak about implementing an option to calculate LD between LGs, should I not trust the output produced so far?

Thanks,
Marta

Hi @anne-laureferchaud, no I have not implemented that option but, if people think it is useful, I can implement it. Just not sure how to represent distance between SNPs if they are in different chromosomes... maybe NA?

@martabe, are you saying that you provide all linkage groups as chrs/contigs and ngsLD calculates LD between all of them? Can you send me an example?

thanks,

Hi @fgvieira ,

Yes, I've obtained LD between markers on different chromosomes.
Some context:
The intent of my analysis was to obtain the level of inter-chromosomal LD, to which I could compare the intra-chromosomal LD levels. It seems like my population has many regions of the genome with high LD and a generally slow LD decay, so I was wondering how the LD values were between physically unlinked loci.
I selected a set of 450 variants separated by at least 2Mb, and after running ngsLD I extracted only the variant pairs belonging to different chromosomes.

ngsLD --geno 450snps_2Mbaway.beagle.gz --probs --pos 450snps_2Mbaway.beagle.pos --n_ind 70 --n_sites 450 --min_maf 0 --max_kb_dist 0 --ignore_miss_data --out ild_450snps_2Mbaway --n_threads 1
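
For readers who want to reproduce the extraction step, here is a minimal sketch of pulling out only the inter-chromosomal pairs. It assumes the ngsLD output is tab-separated with the two site labels in the first two columns, and that each label has the form `chr:pos`; check this against your own output before relying on it. The helper names (`chrom_of`, `inter_chrom_pairs`, `read_ld`) are hypothetical and not part of ngsLD.

```python
import csv
import gzip


def chrom_of(site):
    # Assumption: site labels look like "chr:pos". Split on the last colon
    # so that chromosome names containing ":" are kept whole.
    return site.rsplit(":", 1)[0]


def inter_chrom_pairs(rows):
    """Yield only the rows whose two sites are on different chromosomes."""
    for row in rows:
        if chrom_of(row[0]) != chrom_of(row[1]):
            yield row


def read_ld(path):
    """Yield the rows of a tab-separated LD file, gzipped or plain."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")
```

Something like `inter_chrom_pairs(read_ld("ild_450snps_2Mbaway"))` would then iterate over the pairs of interest.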

Here are the files (pos and output are zipped to have github upload them):
450snps_2Mbaway.beagle.gz
450snps_2Mbaway.beagle.pos.gz
ild_450snps_2Mbaway.gz

Marta

Hi @martabe, thanks for sending the data.

I ran your test and indeed, when --max_kb_dist is set to 0, ngsLD performs comparisons between all pairs of SNPs. When sites are on different chromosomes, the distance between them comes out as inf, so under normal circumstances those pairs are removed by the distance filter; setting --max_kb_dist to 0 disables that filter.
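
The behaviour described here can be sketched roughly as follows. This is an illustration of the logic as stated in this thread, not ngsLD's actual source code; the function names are hypothetical, and whether the real comparison is inclusive (`<=`) or strict is an assumption.

```python
import math


def pair_distance(site_a, site_b):
    """Distance in bp between two (chrom, pos) sites; inf across chromosomes."""
    chrom_a, pos_a = site_a
    chrom_b, pos_b = site_b
    if chrom_a != chrom_b:
        return math.inf
    return abs(pos_a - pos_b)


def keep_pair(site_a, site_b, max_kb_dist):
    """Return True if the pair passes the distance filter."""
    # A value of 0 turns the filter off, so every pair is kept,
    # including pairs whose distance is inf.
    if max_kb_dist == 0:
        return True
    # Assumption: the threshold is inclusive.
    return pair_distance(site_a, site_b) <= max_kb_dist * 1000
```

With any positive threshold, an inf distance can never pass, which is why inter-chromosomal pairs only show up when the filter is off.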

As far as I understood, the current setup would work for @anne-laureferchaud, no? You just need to set --max_kb_dist 0.

If that is the case, I'll just include a better description in the README file.

cheers,

Hi @fgvieira and @martabe,

That is correct, I am also able to perform comparisons between all pairs of SNPs, including between contigs/chrs, and got inf for their distance.
I indeed used --max_kb_dist 0.

Thanks to both of you !
Anne-Laure

Hi all,

@anne-laureferchaud glad it worked :)

@fgvieira Can you think of any problem in calculating LD with your software when the distance is infinite? I am no expert in such calculations, and I would like to be sure that I am not missing something obvious...

Thank you!
Marta

@martabe ngsLD does not use the distance between sites for anything other than filtering (excluding pairs of sites that are too far apart, to speed up the calculations). It just reports the distance in case the user wants to use it for downstream analyses (e.g. estimating LD decay).
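
As an example of such a downstream use, here is a small sketch that averages r² within distance bins and skips pairs with a non-finite distance, since inter-chromosomal pairs have no place on a decay curve. The function name, the bin size, and the `(dist, r2)` input layout are assumptions made for illustration; which output column holds the r² you want depends on your ngsLD output.

```python
import math
from collections import defaultdict


def mean_r2_by_bin(pairs, bin_size_bp=100_000):
    """Map each distance bin start (bp) to the mean r2 of the pairs in it.

    `pairs` is an iterable of (dist, r2) floats. Pairs whose distance is
    not finite (e.g. inf for sites on different chromosomes) are skipped.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bin start -> [sum of r2, count]
    for dist, r2 in pairs:
        if not math.isfinite(dist):
            continue
        start = int(dist // bin_size_bp) * bin_size_bp
        sums[start][0] += r2
        sums[start][1] += 1
    return {start: total / n for start, (total, n) in sorted(sums.items())}
```

Note that Python's `float("inf")` parses the string `inf` directly, so distances read from a text file can be converted with `float()` before being passed in.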

Thanks a lot!

Marta