What does correctKin do?
Hello! It looks like correctKin is only meant to be used when the sample size is small. What does it do, and what does setting "small sample size" to True or False lead to? Specifically, I'm looking at this #54.
As mentioned in a previous issue, I'm running a single query against all the other samples, so I have two blocks, one with one sample and one with many. I want to make sure correctKin is something I can safely skip, and that I'm setting the small sample size option correctly.
Hi! That is accurate: correctKin is only used when the option small.samp.correct = TRUE. This small sample correction is an adjustment made after the initial kinship estimates are calculated; it attempts to keep a small number of samples with unique ancestry from having too much leverage in the PC-based ancestry adjustment. When small.samp.correct = TRUE, the returned kinship estimates are the values after this adjustment. I've only observed this issue of high-leverage samples causing over-adjustment in small samples. It's hard to put an exact number on it, but I would typically recommend using small.samp.correct = TRUE with fewer than 5000 samples (if over-adjustment isn't an issue in the sample, the small sample adjustment just won't alter the estimates significantly). We recently made small.samp.correct = TRUE the default, because a lot of users seem to run PC-Relate on small numbers of individuals, and the small sample adjustment can be beneficial to them.
Note that if you have more than one sample block, you cannot use this adjustment. The code will automatically set the small.samp.correct parameter to FALSE (with a message printed to the console) when the number of sample blocks is more than 1.
Since you are running a single query against all other samples and have two blocks, you cannot use small.samp.correct (which calls the correctKin function). You can safely set this parameter to FALSE for your particular analyses.
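To make the block rule concrete, here is a minimal sketch of the behavior described above, written in Python purely for illustration. It is not the package's R code: the function name, argument names, and message text are hypothetical, and only the rule itself (the correction is switched off when there is more than one sample block) comes from this thread.

```python
def effective_small_samp_correct(requested: bool, n_sample_blocks: int) -> bool:
    """Return the small-sample setting that would actually take effect.

    Hypothetical helper: mirrors the rule described in the thread, not the
    real implementation.
    """
    if requested and n_sample_blocks > 1:
        # The thread says a message is printed to the console in this case;
        # the wording below is made up for the sketch.
        print("small.samp.correct set to FALSE: more than one sample block")
        return False
    return requested


# A single query against all other samples gives two blocks, so the
# correction is turned off even if it was requested.
two_block_setting = effective_small_samp_correct(True, n_sample_blocks=2)
one_block_setting = effective_small_samp_correct(True, n_sample_blocks=1)
```

Under this reading, passing FALSE explicitly in the two-block case changes nothing about the result; it only avoids the console message.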
Got it! That makes sense, and thanks for your detailed reply as always.
Regarding correctK2 and correctK0, are those also there to correct for small sample sizes?
No, correctK2 and correctK0 should always be used if you are computing IBD probability estimates.
correctK2 adjusts for deviations from expected heterozygosity (as measured by the inbreeding coefficient) for each individual in the pair. (There's also an additional small sample size adjustment built into the function, but it won't run when small.samp.correct = FALSE.)
correctK0 is used to choose the "better" k0 estimator for each sample pair. In testing PC-Relate when it was written, we found that one estimator gave better results for 1st degree relatives, while another estimator (a function of the estimated kinship and k2 values) gave better results for more distant relatives. PC-Relate calculates the first estimator for all pairs initially (since we don't know the relatedness a priori), and correctK0 then replaces the k0 values for pairs with a kinship estimate < 2^(-5/2) with (1 - 4*kin + k2).
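The replacement rule can be sketched as follows. This is an illustrative Python sketch, not the package's R implementation: the function and variable names are hypothetical, and only the threshold 2^(-5/2) and the expression (1 - 4*kin + k2) come from the reply above. The expression is consistent with the standard identities kin = k1/4 + k2/2 and k0 + k1 + k2 = 1, which together give k0 = 1 - 4*kin + k2.

```python
# Kinship threshold quoted in the thread: 2^(-5/2), roughly 0.1768.
KIN_THRESHOLD = 2 ** (-5 / 2)


def choose_k0(k0_initial: float, kin: float, k2: float) -> float:
    """Pick the k0 estimate for one sample pair (hypothetical helper).

    k0_initial is the first estimator, computed for every pair up front.
    For pairs whose kinship estimate falls below the threshold, it is
    replaced by the estimator based on kinship and k2.
    """
    if kin < KIN_THRESHOLD:
        return 1 - 4 * kin + k2
    return k0_initial


# Kinship 0.25 is above the threshold, so the initial estimate is kept.
close_pair = choose_k0(k0_initial=0.02, kin=0.25, k2=0.0)

# Kinship 0.125 is below the threshold, so k0 becomes 1 - 4*0.125 + 0.0.
distant_pair = choose_k0(k0_initial=0.40, kin=0.125, k2=0.0)
```

The input values here are made up to show each branch; they are not PC-Relate output.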
That is super helpful! Thank you!