What is the format of the LDfile used in ?

Question

complexgenome opened this issue 4 years ago · 4 comments

I installed the lassosum library under R v4.0. The directory ~/libraries_R/R_LIB4.0/lassosum/data contains the following files:


list.files()
 [1] "Berisa.AFR.hg19.bed"  "Berisa.AFR.hg38.bed"  "Berisa.ASN.hg19.bed"
 [4] "Berisa.ASN.hg38.bed"  "Berisa.EUR.hg19.bed"  "Berisa.EUR.hg38.bed"
 [7] "Berisa.R"             "Berisa.README"        "GenerateData.R"
[10] "refpanel.bed"         "refpanel.bim"         "refpanel.fam"
[13] "summarystats.txt"     "testsample.bed"       "testsample.bim"
[16] "testsample.covar.txt" "testsample.fam"       "testsample.pheno.txt"

I have data for an admixed population for which none of the widely used population-specific LD panels is suitable.
What format should the LD block definition passed to the lassosum.pipeline function have?

### Read LD region file ###
LDblocks <- "EUR.hg19" # This will use LD regions as defined in Berisa and Pickrell (2015) for the European population and the hg19 genome.
# Other alternatives available. Type ?lassosum.pipeline for more details.

From ?lassosum.pipeline, I think the LD block definition is a data.frame of the following format:

chr	start	stop
chr1	1961168	3666172
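A minimal base-R sketch of building such a table for an in-house population. It assumes lassosum accepts a data.frame in the three-column chr/start/stop layout shown above, as the help page suggests. The coordinates and the file name `my_blocks.bed` are hypothetical placeholders, not real block boundaries for any population.

```r
# Hypothetical block boundaries for illustration only; replace them with
# boundaries derived for your own population.
my_blocks <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(0, 1000000, 0),
  stop  = c(1000000, 2500000, 800000),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header line, mirroring the
# chr/start/stop layout quoted from ?lassosum.pipeline.
out_file <- file.path(tempdir(), "my_blocks.bed")
write.table(my_blocks, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# Reading the file back yields a data.frame that could then be supplied
# as the LDblocks argument.
LDblocks <- read.table(out_file, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)
str(LDblocks)
```

Keeping the blocks in a file rather than only in memory makes it easy to reuse the same definition across runs and to compare it side by side with the shipped Berisa files.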

I found the repository https://bitbucket.org/nygcresearch/ldetect/src/master/ via https://academic.oup.com/bioinformatics/article/32/2/283/1743626.
Can it be used to create in-house LD blocks in a format that lassosum accepts?

Alternatively, can plink output be used (https://zzz.bwh.harvard.edu/plink/ld.shtml)? For example:
plink --file mydata --r
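Note that `plink --r` reports pairwise correlations between SNPs (written to a `.ld` file), not block boundaries, so a block-calling step would still be needed on top of it. The sketch below is a deliberately crude heuristic, not the ldetect / Berisa and Pickrell algorithm: it keeps same-chromosome pairs whose |R| reaches a threshold, treats each pair as an interval, and merges overlapping intervals per chromosome. The helper name `merge_ld_pairs`, the threshold of 0.3, the `chr` prefix, and the assumption that the `.ld` columns are named `CHR_A`, `BP_A`, `CHR_B`, `BP_B` and `R` are all hypothetical; check the header of your own file.

```r
# Hypothetical helper: crude LD "blocks" from pairwise LD output, built by
# merging the overlapping intervals spanned by strongly correlated pairs.
# This is NOT the Berisa & Pickrell (ldetect) method.
merge_ld_pairs <- function(ld, r_min = 0.3) {
  # Keep same-chromosome pairs whose |R| reaches the threshold.
  keep <- ld$CHR_A == ld$CHR_B & abs(ld$R) >= r_min
  ld <- ld[keep, , drop = FALSE]
  if (nrow(ld) == 0) {
    return(data.frame(chr = character(0), start = numeric(0),
                      stop = numeric(0)))
  }
  lo <- pmin(ld$BP_A, ld$BP_B)
  hi <- pmax(ld$BP_A, ld$BP_B)
  out <- list()
  add_block <- function(ch, a, b) {
    # BP positions are 1-based; BED intervals are 0-based and half-open,
    # so the block covering positions a..b becomes [a - 1, b).
    data.frame(chr = paste0("chr", ch), start = a - 1, stop = b)
  }
  for (ch in unique(ld$CHR_A)) {
    idx <- which(ld$CHR_A == ch)
    ord <- idx[order(lo[idx])]
    cur_lo <- lo[ord[1]]
    cur_hi <- hi[ord[1]]
    for (i in ord[-1]) {
      if (lo[i] <= cur_hi) {
        cur_hi <- max(cur_hi, hi[i])       # overlaps: extend current block
      } else {
        out[[length(out) + 1]] <- add_block(ch, cur_lo, cur_hi)
        cur_lo <- lo[i]                    # gap: start a new block
        cur_hi <- hi[i]
      }
    }
    out[[length(out) + 1]] <- add_block(ch, cur_lo, cur_hi)
  }
  do.call(rbind, out)
}

# Toy input shaped like a plink .ld table (values are made up).
ld <- data.frame(CHR_A = c(1, 1, 1, 2), BP_A = c(100, 150, 500, 10),
                 CHR_B = c(1, 1, 1, 2), BP_B = c(200, 300, 600, 40),
                 R = c(0.8, -0.5, 0.9, 0.1))
res <- merge_ld_pairs(ld)
res
```

SNPs that are not part of any retained pair fall outside every block with this approach, which may or may not be acceptable for a given analysis.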

Thanks,

Answer 1 · 2020-09-14T04:25:27.000Z

It's the BED format, as detailed here. You can have a look at the file "Berisa.EUR.hg19.bed" for reference.
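In BED, each line gives a chromosome, a 0-based start and an exclusive end. A minimal sketch of sanity-checking a block table before passing it to lassosum follows. The helper `check_ld_blocks` is hypothetical and not part of lassosum, and the rules it applies (start < stop, blocks sorted and non-overlapping within each chromosome) are assumptions about what a sensible block file looks like. Reading the shipped file with `header = TRUE` assumes it carries a chr/start/stop header line like the one quoted in the question.

```r
# Hypothetical helper (not part of lassosum): basic checks on a BED-style
# LD block table whose first three columns are chr, start, stop.
check_ld_blocks <- function(blocks) {
  if (ncol(blocks) < 3) {
    return("fewer than 3 columns (expected chr, start, stop)")
  }
  problems <- character(0)
  chr   <- as.character(blocks[[1]])
  start <- suppressWarnings(as.numeric(blocks[[2]]))
  stop  <- suppressWarnings(as.numeric(blocks[[3]]))
  if (any(is.na(start) | is.na(stop))) {
    problems <- c(problems, "missing or non-numeric start/stop values")
  }
  if (any(start >= stop, na.rm = TRUE)) {
    problems <- c(problems, "some blocks have start >= stop")
  }
  # Within a chromosome, blocks should be sorted by start and must not
  # overlap (half-open intervals may touch: next start == previous stop).
  for (ch in unique(chr)) {
    idx <- which(chr == ch)
    s <- start[idx]
    e <- stop[idx]
    if (is.unsorted(s, na.rm = TRUE)) {
      problems <- c(problems, paste0(ch, ": blocks not sorted by start"))
    } else if (length(idx) > 1 &&
               any(s[-1] < e[-length(e)], na.rm = TRUE)) {
      problems <- c(problems, paste0(ch, ": overlapping blocks"))
    }
  }
  problems
}

good <- data.frame(chr = c("chr1", "chr1"), start = c(0, 1000000),
                   stop = c(1000000, 2500000))
bad  <- data.frame(chr = c("chr1", "chr1"), start = c(0, 900000),
                   stop = c(1000000, 2500000))
check_ld_blocks(good)  # character(0): no problems found
check_ld_blocks(bad)   # "chr1: overlapping blocks"

# If lassosum is installed with its data files, inspect a shipped block file.
f <- system.file("data", "Berisa.EUR.hg19.bed", package = "lassosum")
if (nzchar(f)) {
  eur <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
  print(head(eur))
  print(check_ld_blocks(eur))
}
```

Running the same checks on a shipped Berisa file and on an in-house file is a quick way to confirm the two share the same layout.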

Answer 2 · 2020-09-14T13:33:44.000Z

Thank you. I have an admixed population and would like to create the BED regions for the population to which I want to apply lassosum. I was unable to find any guidelines on working with in-house population data, specifically for LDblocks:

out <- lassosum.pipeline(cor = cor, chr = ss$Chr, pos = ss$Position,
                         A1 = ss$A1, A2 = ss$A2,
                         ref.bfile = ref.bfile, test.bfile = test.bfile,
                         LDblocks = LDblocks)

Answer 3 · 2020-09-15T03:26:28.000Z

Deriving an appropriate set of LD blocks is beyond the scope of lassosum. You could refer to the Berisa and Pickrell paper or their code if you want to run their algorithm on your data, or you can try other software. My experience with Berisa and Pickrell's software was that it was not easy to run.

However, if a polygenic score (PGS) is your aim, my simulations suggested that these LD block boundaries don't actually matter all that much. Just use, say, the EUR LD blocks, and I think you'll get results that are close to optimal.

Answer 4 · 2020-09-15T12:58:42.000Z

Thanks for your reply. Based on my limited understanding (and I could be wrong), LD patterns and blocks are population-specific. In the population I'm interested in, LD blocks are smaller and SNP correlations are stronger than in EUR. It's a three-way admixed population.