StatMixedML/LightGBMLSS

Prediction of quantiles for parametric distributions

ninist opened this issue · 5 comments

ninist commented

For parametric distributions, I understand the model learns the mapping from data to parameters, so the model predicts the parameters for each observation.

When quantiles are predicted, the model first samples from the parametric distribution with those parameters, and then constructs quantiles from those drawn samples:
https://github.com/StatMixedML/LightGBMLSS/blob/master/lightgbmlss/distributions/distribution_utils.py#L415-L421
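
For reference, a minimal sketch of my understanding of the sampling-based approach (the helper name and the use of numpy's quantile are mine, not the library's code):

import numpy as np
import torch

def quantiles_by_sampling(torch_dist, q, n_samples=1000):
    # Draw n_samples from each observation's predicted distribution, then take the
    # empirical q-quantile of the draws, observation by observation.
    samples = torch_dist.sample((n_samples,)).numpy()
    return np.quantile(samples, q, axis=0)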

Is it infeasible, or is there a reason not to instead use the theoretical quantiles implied by the fitted parameters directly, without sampling?

Essentially using torch_dist.icdf:

1) Fit the distribution with lightgbmlss.

2) Predict parameters for some dataset

    pred_params = distribution.predict(
        test_X,
        pred_type="parameters",
    )

3) Take the fitted parameters and instantiate a torch distribution object.

import pandas as pd
import torch


def instantiate_torch_distribution(distribution, pred_params: pd.DataFrame):
    """Prepare a torch distribution object for the observations in the test set.

    distribution
        An object of type lightgbmlss.model.LightGBMLSS, for which a mapping from
        features to distributional parameters was learned.
    pred_params
        A dataframe with one column per predicted distributional parameter, and one
        row per test observation.
    """

    # A list of named parameters, e.g. ['loc', 'scale'], that the torch distribution
    # expects for instantiation.
    dist_arg_names = distribution.dist.distribution_arg_names

    preds_transformed = []
    for dist_arg_name in dist_arg_names:
        # predict() above already returns the values on the transformed scale
        # (e.g. the model predicts log_variance internally and predict() returns
        # exp(log_variance)), so the columns can be used as-is.
        pred_tensor = torch.tensor(
            pred_params[dist_arg_name].values,
            dtype=torch.float64,
        ).reshape(-1, 1)
        preds_transformed.append(pred_tensor)

    dist_kwargs = dict(zip(dist_arg_names, preds_transformed))
    torch_dist_fit = distribution.dist.distribution(**dist_kwargs)
    return torch_dist_fit

4) Use the torch distribution object to get the theoretical quantiles for the given parameters

def get_quantiles(q, torch_dist_fit, test_y):
    """Get the qth theoretical quantile of each observation's predicted distribution.

    q in [0.0, 1.0]; test_y is only used to determine the number of observations.
    """
    qt = torch.tensor([q] * test_y.size, dtype=torch.float64).reshape(-1, 1)
    quants = torch_dist_fit.icdf(qt)
    return quants
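
Putting steps 3) and 4) together, usage would look roughly like this (pred_params and test_y as above):

torch_dist_fit = instantiate_torch_distribution(distribution, pred_params)
# Theoretical 5% and 95% quantiles for each test observation.
q05 = get_quantiles(0.05, torch_dist_fit, test_y)
q95 = get_quantiles(0.95, torch_dist_fit, test_y)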

Misc

Separately from the above, it also looks like samples are drawn whenever predict is called, even when they are not used in the returned result, specifically for pred_type == "parameters" and pred_type == "expectiles":
https://github.com/StatMixedML/LightGBMLSS/blob/master/lightgbmlss/distributions/distribution_utils.py#L402-L410

I can see how it keeps the code cleaner not to litter it with conditionals on whether to sample, and perhaps the extra computation from sampling without returning the samples is insignificant.

ninist commented

Same question for sampling to find the predicted density, when the parametric form has been assumed and the parameters of this distribution have been estimated for each observation.

If the distribution is entirely defined by the parameters, is there a reason to not use this form directly?
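
As a sketch of what I mean (the helper name and grid are mine; it assumes a torch distribution instantiated as in the earlier snippet):

import torch

def get_density(torch_dist_fit, y_grid):
    # Evaluate the theoretical pdf of each observation's predicted distribution
    # on a grid of y-values, instead of estimating it from drawn samples.
    y = torch.tensor(y_grid, dtype=torch.float64).reshape(1, -1)
    return torch_dist_fit.log_prob(y).exp()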

ninist commented

I see now that quite a few torch distributions do not have an implementation of (or a numerical approximation of) the icdf method. Probably that is the motivation behind sampling to estimate the quantiles?
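
A sketch of the kind of fallback I had in mind (helper name and the n_samples default are mine): use icdf where it exists, and only sample when it raises NotImplementedError.

import torch

def quantile_icdf_or_sample(torch_dist, q, n_samples=1000):
    # Use the closed-form icdf when the distribution provides one, otherwise
    # fall back to estimating the quantile from drawn samples.
    try:
        return torch_dist.icdf(torch.tensor(q, dtype=torch.float64))
    except NotImplementedError:
        samples = torch_dist.sample((n_samples,))
        return torch.quantile(samples, q, dim=0)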

A part which confused me was that the plots of the predicted densities in the Gaussian example looked asymmetric and "weirdly" shaped: https://github.com/StatMixedML/LightGBMLSS/blob/master/docs/examples/Gaussian_Regression.ipynb

Using larger sample sizes mitigates this somewhat, and this might all have been obvious to most readers/users, though the terseness was a hurdle for me.
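
E.g., requesting more draws per observation (I'm assuming n_samples is the predict() argument that controls this; the value here is arbitrary):

pred_samples = distribution.predict(
    test_X,
    pred_type="samples",
    n_samples=10000,
)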

@ninist Thanks for your comments.

I see now that quite a few torch distributions do not have an implementation of (or a numerical approximation of) the icdf method. Probably that is the motivation behind sampling to estimate the quantiles?

Yes, that is exactly right. Some distributions have no built-in icdf, so I decided to go with the sampling approach for all of them.

Using larger sample sizes mitigates this somewhat, and this might all have been obvious to most readers/users, though the terseness was a hurdle for me.

Indeed, increasing the sample size makes the densities more "Gaussian".

@ninist Can I close this?

I think so, yes.