tidylab/usethat

predict distributions over point estimates

harell opened this issue 4 years ago · 1 comments

harell commented 4 years ago

Outline

Why should I care
Working with distribution objects
Generating a distribution from a point estimate
Generating a distribution from a point estimate and standard deviation
Generating a distribution from bootstrap sampling
Conclusion

Motivation

Applications, quantile point estimates, scaling
Keeping important information in a succinct form. For example, the normal distribution requires only two parameters: the mean and the standard deviation.
Imposing business rules
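To make the "two parameters" point concrete, here is a minimal sketch using Python's standard-library `statistics.NormalDist` as a stand-in for a distribution object. The variable names and the numbers are made up for illustration; they are not part of this package.

```python
from statistics import NormalDist

# Hypothetical prediction for a car's weight in kg, stored as two parameters.
weight_kg = NormalDist(mu=1500.0, sigma=100.0)

# A point estimate is one attribute access away.
point_estimate = weight_kg.mean

# Quantile point estimates come from the same object.
median = weight_kg.inv_cdf(0.5)
p90 = weight_kg.inv_cdf(0.9)

# Scaling: multiplying by a constant rescales both the mean and the s.d.
weight_lb = weight_kg * 2.20462
```

The same object answers "what is the expected value?" and "what value is exceeded only 10% of the time?" without refitting anything.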

Working with distribution objects

What is a distribution object?
What operations can we perform on a distribution object?
How can we include distribution objects in our current workflow?
- Distribution objects can be stored in a single column of a data.frame
- purrr/dplyr operations can then be applied to that column (e.g. taking the mean as a point estimate)
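As a rough illustration of the column idea, the sketch below uses a plain Python list of records in place of a data.frame and list comprehensions in place of purrr/dplyr steps. The records and column names are hypothetical.

```python
from statistics import NormalDist

# Each record holds one distribution object in its "weight" column.
rows = [
    {"car": "a", "weight": NormalDist(1200.0, 80.0)},
    {"car": "b", "weight": NormalDist(1650.0, 120.0)},
]

# Map over the distribution column to derive ordinary numeric columns.
point_estimates = [row["weight"].mean for row in rows]
upper_95 = [row["weight"].inv_cdf(0.95) for row in rows]
```

The distribution column stays intact, so other summaries (a different quantile, an interval) can be derived later from the same table.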

Generating a distribution object

Calculating a distribution empirically
Incorporating prior knowledge, e.g. cars cannot have negative weight, and the number of gears is a positive integer
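One possible reading of these two points, sketched with the Python standard library: bootstrap resampling gives an empirical distribution of a statistic, and a simple clamp enforces the "no negative weight" rule. The helper names, the sample values, and the choice of clamping (rather than, say, truncating or modelling on a log scale) are all assumptions made for illustration.

```python
import random
from statistics import fmean, quantiles

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Empirical distribution of the mean via resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def apply_weight_rule(draws):
    """Prior knowledge: a weight cannot be negative, so floor draws at zero."""
    return [max(0.0, d) for d in draws]

# Made-up car weights in tonnes.
weights = [1.2, 0.9, 1.5, 1.1, 1.8, 1.3, 1.0, 1.6]

boot = apply_weight_rule(bootstrap_means(weights))

# Nine cut points dividing the empirical distribution into ten equal parts.
deciles = quantiles(boot, n=10)
```

For a count such as the number of gears, the analogous rule would round each draw to an integer and floor it at 1.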

Conclusion

Committing to point estimates alone early in a project is a premature decision. Moving from a distribution to a point estimate is one operation away, while going in the opposite direction requires substantial changes to the project structure.
By default, most learning algorithms (fable being an exception) return point estimates. Changing that default is an opt-in action.
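The asymmetry can be sketched as follows. This is a minimal example, assuming a least-squares line whose residual standard deviation is used as the predictive s.d. (it ignores parameter uncertainty); `fit`, `predict_dist`, `predict_point`, and the data are hypothetical names and values, not an existing API.

```python
from math import sqrt
from statistics import NormalDist, fmean

def fit(xs, ys):
    """Least-squares line plus the residual standard deviation."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = sqrt(sum(r * r for r in residuals) / (len(xs) - 2))
    return intercept, slope, sigma

def predict_dist(model, x):
    """Distribution-first prediction: point estimate plus standard deviation."""
    intercept, slope, sigma = model
    return NormalDist(intercept + slope * x, sigma)

def predict_point(model, x):
    """Collapsing to a point estimate is a single operation on the distribution."""
    return predict_dist(model, x).mean

# Made-up training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

model = fit(xs, ys)
dist = predict_dist(model, 6.0)
point = predict_point(model, 6.0)
```

Had the pipeline been built around `predict_point` alone, recovering `dist` later would mean revisiting the fitting step and every downstream consumer of the prediction.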

harell commented 4 years ago

predict distributions, not point estimates