How to use ggpredict to get the corresponding standard errors of the 25th and 75th percentile predicted values

Question

How to use ggpredict to get the corresponding standard errors of the 25th and 75th percentile predicted values

chenchen-HSPH opened this issue 4 months ago · 15 comments

I am using the ggpredict function to get the predicted values of height with x, predicted, std.error, conf.low, and conf.high (shown in the screenshot below)

pred_height<-as.data.frame(ggpredict(modfit, terms="height", typical="mean", back.transform=FALSE))

I used quantile(pred_height$predicted) to obtain the 75th percentile and the 25th percentile predicted values. However, I can get the corresponding standard errors of the 25th and the 75th percentile predicted by using quantile(std.error). I'd like to know how to get the corresponding SEs? Many thanks!

Answer 1 · 2024-07-08T12:04:37.000Z

If you want predictions for a specific range, you can specify shortcuts, functions, values etc. inside the terms argument. See this vignette.

In your specific case, you could use the percentile tag:

pred_height<-as.data.frame(ggpredict(
  modfit,
  terms = "height [percentile75]",
  typical="mean",
  back_transform = FALSE
))

See all options for shortcuts in ?values_at, or this vignette.

Answer 2 · 2024-07-16T19:18:05.000Z

Hi Daniel,

Thank you for your advice. It could help me a lot. I can get the results if I use "height[quart]" or "height[fivenum]"However, I always get an error message (shown below), if I use percentile as the code "height[percentile75]" shown above.

Answer 3 · 2024-07-16T20:04:23.000Z

Do you have a reproducible example? For me, it works fine:

library(ggeffects)
data(iris)
m <- lm(Sepal.Width ~ Sepal.Length, data = iris)

predict_response(m, "Sepal.Length [quart]")
#> # Predicted values of Sepal.Width
#> 
#> Sepal.Length | Predicted |     95% CI
#> -------------------------------------
#>         4.30 |      3.15 | 3.00, 3.30
#>         5.10 |      3.10 | 3.01, 3.20
#>         5.80 |      3.06 | 2.99, 3.13
#>         6.40 |      3.02 | 2.94, 3.11
#>         7.90 |      2.93 | 2.74, 3.12
predict_response(m, "Sepal.Length [percentile75]")
#> # Predicted values of Sepal.Width
#> 
#> Sepal.Length | Predicted |     95% CI
#> -------------------------------------
#>         4.90 |      3.12 | 3.01, 3.22
#>         5.10 |      3.10 | 3.01, 3.20
#>         5.40 |      3.08 | 3.01, 3.16
#>         5.60 |      3.07 | 3.00, 3.15
#>         5.70 |      3.07 | 3.00, 3.14
#>         6.00 |      3.05 | 2.98, 3.12
#>         6.30 |      3.03 | 2.95, 3.11
#>         6.80 |      3.00 | 2.89, 3.11
#> 
#> Not all rows are shown in the output. Use `print(..., n = Inf)` to show
#>   all rows.

^{Created on 2024-07-16 with reprex v2.1.1}

Answer 4 · 2024-07-16T20:17:13.000Z

Thank you for getting back to me. My data may not be as simple as the Sepal data. Now, I got the predicted values of 75th percentile first and then used this [the corresponding x value] instead of [percentile75] to the code again and it works. I got this method from the link below
https://rpubs.com/timothyfraser/visualizing_predictions_in_r

Answer 5 · 2024-07-16T22:35:57.000Z

Also, I am sorry if I did not make my original question clear. I want to get the corresponding SE of the 75th percentile of the predicted value NOT the 75th percentile of x. Based on my understanding, the [quart] or [percentile75] we taked about are related to x values NOT the predicted values?

Answer 6 · 2024-07-17T06:46:21.000Z

I'm not quite sure I understand. You predict expected values of your response variable (outcome), at meaningful values of your predictors of interest. These meaningful values can be certain levels (for factors) or values (for numeric). In the latter case, the question is at which values you would like to predict your outcome.

The predictions (expected values of your outcome) are point estimates. You cannot get a percentile of that value. It's like asking: "my predicted outcome score is 10 - what is the 75% percentile of 10?".

Answer 7 · 2024-07-17T06:47:35.000Z

I got this method from the link below https://rpubs.com/timothyfraser/visualizing_predictions_in_r

Can you copy the explicit code here? I'm not sure to which example you are referring.

Answer 8 · 2024-07-17T06:51:11.000Z

If you are referring to section 3. Using the ggpredict() function, where the 20% and 80% percentile are mentioned, than it is exactly what I have described above. In the example, the meaningful values for the predictor (not the predictions) are limited to the 20% to 80% percentile: terms = "voter_turnout [0.7455777,0.7534772,0.7613766,0.7692761,0.7771756]". This is what you get from terms = "voter_turnout [percentile80]" (or maybe "voter_turnout [percentile60]", so 20% is cut off at each end, you must try).

Answer 9 · 2024-07-17T14:26:18.000Z

I'm not quite sure I understand. You predict expected values of your response variable (outcome), at meaningful values of your predictors of interest. These meaningful values can be certain levels (for factors) or values (for numeric). In the latter case, the question is at which values you would like to predict your outcome.

The predictions (expected values of your outcome) are point estimates. You cannot get a percentile of that value. It's like asking: "my predicted outcome score is 10 - what is the 75% percentile of 10?".

Here I mean the point estimate (predicted value) corresponds to the 75th percentile. I can use quantile(ggpredict(modfit, "height")$predicted) to get the predicted value corresponding to the 75th percentile. Again, I cannot get the SE of this predicted value.

If I use ggpredict(modfit, "height[quart]"), these values are the quantiles of the x, not the predicted values?

I got an error message if I used ggpredict(modfit, "height[percentile75])...

Answer 10 · 2024-07-17T16:44:34.000Z

I'm not sure quantile(ggpredict()) makes any sense. You are making predictions for your outcome for a range of values of height. Then you just remove the lower and upper 12.5% (=25%, i.e. 75% percentile) from that results (i.e. from the predicted values). That's the same as limiting the range of height to the 75% percentile. See following example:

library(ggeffects)
data(iris)
m <- lm(Sepal.Width ~ Petal.Length, data = iris)

# we want predictions for the range 1 to 7 for Petal.Length.
p <- predict_response(m, "Petal.Length [1:7]")
p$predicted
#> [1] 3.349089 3.243304 3.137519 3.031733 2.925948 2.820163 2.714377
# 75% percentiles
quantile(p$predicted, probs = c(0.125, 0.875))
#>    12.5%    87.5% 
#> 2.793716 3.269750

# using range based on quantiles: we calculate the 75% percentile of the range 1 to 7
quantile(1:7, probs = c(0.125, 0.875))
#> 12.5% 87.5% 
#>  1.75  6.25
# we use these numbers as fixed values for Petal.Length
p <- predict_response(m, "Petal.Length [1.75, 6.25]")
# Predictions are identical
rev(p$predicted)
#> [1] 2.793716 3.269750

For the latter ("correct") approach, you get your standard errors of course.

Answer 11 · 2024-07-17T16:52:24.000Z

So you mean if I used the 75th percentile of x, I should get the 7th percentile of the predicted values too based on the example you mentioned above? So I can use ggpredict(modfit, "height[percentile75]") to get the 75th percentile of the predicted values?

If it is clear, then my issue is that I cannot use the function ggpredict(modfit, "height[percentile75]") due to some issues in my data? But I can try to use the numbers as you indicated in your example.

I will keep you updated. Many thanks!

Answer 12 · 2024-07-17T17:12:51.000Z

Have you tried updating the package? Maybe you're not using the latest package version, since the "percentile" tag was added just recently.

Answer 13 · 2024-07-31T02:13:35.000Z

Hi,
I have a similar issue - and I don't think it can be solved on the x side.
In my case, I have a parametric survival model (estimated using 'survreg' from the survival package), and what I want is the marginal predicted median survival time, not the marginal predicted mean survival time. In Stata, the margins command allows this fairly easily, with different functions for the predictions. From the Stata manual:

margins, predict(median time) gives predictions based on:
$median = \{t:\hat{S}_{j}(t) = 1/2\}$
margins, predict(mean time) gives predictions based on:
$mean = \int_{0}^{\infty} \hat{S}_{j}(t) dt$

I can also get median predictions using the 'predict.survreg' function:
predict(fit,type="quantile",p=0.5).
But that won't easily give me marginal (counterfactual) SEs.

And it doesn't look like I can pass the 'p' argument through when using ggeffects. I can pass the type="quantile" argument in ggaverage, but it estimates the 10th and 90th percentiles (the default). When I try to pass p=0.5 through ggeffects, it gives the error:

Error in sanity_equivalence_p_adjust(equivalence, p_adjust) :
Assertion on 'p_adjust' failed: Must be element of set {'holm','hochberg','hommel','bonferroni','BH','BY','fdr','none'}, but types do not match (numeric != character).

Is there any way to pass that 'p=0.5' through?

Answer 14 · 2024-07-31T09:27:36.000Z

Do you have a reproducible example? Not quite sure what you would like to achieve. Predicted values when time is split at the median?

Answer 15 · 2024-07-31T09:31:19.000Z

Moving to #557 and closing this issue as resolved.