Using SLALOM With Custom LD Panel
Closed this issue · 3 comments
Hello!
Based on the documentation of this software, I am aware that it is possible to run this QC pipeline with a user-defined LD reference. Looking through the code, it looks like this argument requires that the reference be saved as a Hail BlockMatrix object. However, looking further into the code, it looks as if the software also requires the additional custom-ld-variant-index-path and custom-ld-label parameters. The nature of these is a bit more vague.
I have a few questions regarding all of this:
- I would like to build a sparse BlockMatrix object from our LD panel. Is this software compatible with both sparse and dense BlockMatrix objects?
- What would be a good way of building a sparse BlockMatrix from a key-value R matrix pairing? I'd rather not build a dense matrix and sparsify it, due to memory restrictions.
- What is the nature of the custom-ld-variant-index-path and custom-ld-label parameters? How would one go about building them given an LD BlockMatrix?
Cheers,
Tosin
Hi Tosin,
Thanks so much for your inquiry. In terms of the sparsity of Hail's BlockMatrix, please refer to the documentation here.
Briefly,
- Yes, our Hail BlockMatrix is indeed sparse via BlockMatrix.sparsify_row_intervals (window around variants) and BlockMatrix.sparsify_triangle (only the upper triangular matrix is kept).
- Please note that Hail BlockMatrix is a Hail-specific format and has a limited interface with existing data formats. Although there are BlockMatrix.from_numpy (in-memory numpy object) and BlockMatrix.fromfile (a binary file), I'd recommend recomputing an LD matrix in Hail. If you have a VCF file, you can use hl.import_vcf to make a Hail MatrixTable, convert it to a Hail BlockMatrix, and do linear algebra to compute LD.
- Apologies for the limited documentation for these parameters; they were originally intended for internal use. custom-ld-variant-index-path represents a path to a Hail Table that records the indices of variants in the Hail BlockMatrix (required fields are shown below). custom-ld-label is just a label for the output. For example, if you specify custom, the output contains a column custom_lead_r[2]
----------------------------------------
Global fields:
None
----------------------------------------
Row fields:
'locus': locus<GRCh38>
'alleles': array<str>
'idx': int64
----------------------------------------
Key: ['locus', 'alleles']
----------------------------------------
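For anyone following the steps above, here is a minimal sketch of how the BlockMatrix and the matching variant index Table might be produced from a VCF. It is not taken from the SLALOM code base: the function names build_custom_ld and output_paths, the file suffixes, and the 1 Mb default radius are all hypothetical choices made for illustration. It also assumes that storing Pearson r (what hl.row_correlation returns) is what the pipeline expects, which is worth checking against the SLALOM source before relying on it.

```python
def output_paths(prefix):
    """Hypothetical naming scheme for the two outputs (not a SLALOM convention)."""
    return {
        "block_matrix": f"{prefix}.ld.bm",
        "variant_index": f"{prefix}.variant_index.ht",
    }


def build_custom_ld(vcf_path, prefix, radius=1_000_000, reference_genome="GRCh38"):
    """Sketch: write a sparse LD BlockMatrix plus a (locus, alleles) -> idx Table."""
    # Imported here so output_paths() stays usable without a Hail installation.
    import hail as hl
    from hail.linalg.utils import locus_windows

    paths = output_paths(prefix)
    mt = hl.import_vcf(vcf_path, reference_genome=reference_genome)

    # Drop variants with no variation in alt-allele count, since a
    # correlation is undefined for a constant row.
    mt = mt.annotate_rows(gt_stats=hl.agg.stats(mt.GT.n_alt_alleles()))
    mt = mt.filter_rows(mt.gt_stats.stdev > 0)

    # Number the remaining variants; row i of the LD matrix matches idx == i.
    mt = mt.add_row_index("idx")

    # Variant index Table keyed by (locus, alleles) with a single int64 field idx,
    # matching the schema shown above.
    ht = mt.rows().select("idx")
    ht.write(paths["variant_index"], overwrite=True)

    # Pearson correlation between variants (rows) of alt-allele counts.
    bm = hl.row_correlation(mt.GT.n_alt_alleles())

    # Keep only entries within `radius` base pairs of each variant,
    # then keep only the upper triangle.
    starts, stops = locus_windows(mt.locus, radius)
    bm = bm.sparsify_row_intervals(starts, stops)
    bm = bm.sparsify_triangle()
    bm.write(paths["block_matrix"], overwrite=True)
    return paths
```

Calling build_custom_ld("panel.vcf.bgz", "panel") would, under these assumptions, write panel.ld.bm and panel.variant_index.ht; the second path is the kind of Table that custom-ld-variant-index-path points to. Filtering happens before add_row_index on purpose, so that idx stays aligned with the rows of the matrix.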
Hope this helps!
Best,
Masa
Yes, thank you! This has been pretty helpful. I'll let you know if I have any more questions.
I was able to run the software successfully with some modifications. This issue can be closed now. Thank you!