Detection of outliers before implementing binomial test for continuous response variable
akhileshtayade opened this issue · 2 comments
Hello Florian!
I have been trying to understand how outliers are detected when a continuous response variable is under consideration. I am having a hard time understanding the following line of code from `testOutliers`, which, I think, is written specifically for the detection of outliers in a continuous response variable:
Line 195 in ea2f76e
(For now, it would be great if we could consider `margin` to be only `lower`, since that would help me immensely to understand the rationale behind the current implementation.)
From the discussion at #182, I infer that if we simulate a model `nSim` times, the probability of an observed value being the minimum of the `nSim + 1` values is `1/(nSim + 1)`.
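The probability that one particular value is the smallest of `nSim + 1` IID draws from a continuous distribution, `1/(nSim + 1)`, can be checked with a small Monte Carlo experiment. This is a minimal sketch in base R that does not use DHARMa; the names `nRep`, `draws`, `observedIsMin` and `expected` are made up for illustration, and the normal distribution is an arbitrary choice.

```r
set.seed(42)
nSim <- 99      # simulated values per observation
nRep <- 20000   # Monte Carlo repetitions (hypothetical setting)

# Each row holds one "observed" value (column 1) followed by nSim simulated
# values, all drawn IID from the same continuous distribution.
draws <- matrix(rnorm(nRep * (nSim + 1)), nrow = nRep)

# Share of repetitions in which the observed value is the smallest of all.
observedIsMin <- mean(apply(draws, 1, which.min) == 1)

# Value expected by symmetry: each of the nSim + 1 draws is equally likely
# to be the minimum.
expected <- 1 / (nSim + 1)

print(c(simulated = observedIsMin, expected = expected))
```

The two printed numbers should agree up to Monte Carlo error.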
However, I cannot understand why the value of a DHARMa residual is compared to the probability of the residual being the minimum of an IID sample of `nSim + 1` values. Since an outlier is an observation with a DHARMa residual equal to 0 (or 1), can't we directly use `outliers = sum(simulationOutput$scaledResiduals == 0)` for the `lower` margin? Are there any special cases where the proposed method would fail?
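For a continuous response, the two ways of counting lower outliers can be compared directly. The following is a minimal sketch in base R, assuming the scaled residual is simply the share of simulated values below the observed value; this is a hand-rolled stand-in, not DHARMa's internal code, and `nObs`, `observed` and `simulated` are hypothetical names.

```r
set.seed(1)
nObs <- 500   # number of observations (hypothetical)
nSim <- 250   # simulations per observation

observed  <- rnorm(nObs)
simulated <- matrix(rnorm(nObs * nSim), nrow = nObs, ncol = nSim)

# Assumed residual definition: share of simulations below the observed value.
# The vector `observed` is recycled down the columns, so row i is compared
# with observed[i].
scaledResiduals <- rowMeans(simulated < observed)

# Proposed count: residuals that are exactly zero.
outliersExactZero <- sum(scaledResiduals == 0)

# Count based on the threshold 1 / (nSim + 1).
outliersThreshold <- sum(scaledResiduals < 1 / (nSim + 1))

print(c(exactZero = outliersExactZero, threshold = outliersThreshold))
```

Under this assumption the residuals only take values on the grid 0, 1/nSim, ..., 1, and the smallest non-zero value, 1/nSim, is larger than 1/(nSim + 1), so both conditions flag the same observations. Whether that also holds for other residual definitions is not covered by this sketch.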
Maybe I am overthinking this, and the implementation is written the way it is because that is an appropriate way to detect outliers in DHARMa versions before 0.3.1, where `DHARMA.ecdf` is used to calculate residuals with the `traditional` method, and it works for the standard eCDF as well?
The current implementation works without any problem, since any observation with a DHARMa residual of 0 makes `simulationOutput$scaledResiduals < (1/(simulationOutput$nSim+1))` evaluate to `TRUE`, so the observation is counted as an outlier. Still, I was curious whether there are other reasons behind this implementation that I did not mention above.
Thank you!
Hello Akhilesh,
thanks for the question! I have to admit that I'm also a bit puzzled as to why this was programmed as it is. Your conjecture that this was introduced because of the old residual definition where outliers were distributed evenly seems plausible to me though.
I will leave this ticket open so that I can check this more thoroughly later.
Best,
Florian
Thank you for your response, Florian!
Sincerely,
Akhilesh