tidylab/usethat

Predict distributions over point estimates

harell opened this issue

Outline

  • Why should I care?
  • Working with distribution objects
  • Generating a distribution from a point estimate
  • Generating a distribution from a point estimate and a standard deviation
  • Generating a distribution from bootstrap samples
  • Conclusion

Motivation

  • Applications: quantile point estimates (e.g. a median or 90th-percentile forecast instead of only the mean) and scaling
  • Keeping important information in a succinct form: for example, a normal distribution is fully described by just two parameters, the mean and the standard deviation
  • Imposing business rules
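The claims above can be illustrated with a minimal Python sketch using only the standard library's `statistics.NormalDist`. The issue itself targets R tooling such as fable, so this shows the idea rather than any R API. The forecast (mean 100, standard deviation 15), the 90% stocking rule and the 120-unit threshold are hypothetical.

```python
from statistics import NormalDist

# Hypothetical forecast: mean 100 units, standard deviation 15.
# These two numbers fully describe the normal distribution.
forecast = NormalDist(mu=100, sigma=15)

# Quantile point estimates: any percentile comes from the same object.
median = forecast.inv_cdf(0.5)  # equals the mean for a normal distribution
p90 = forecast.inv_cdf(0.9)     # hypothetical rule: stock enough for 90% of outcomes

# Scaling: dividing by a constant divides both the mean and the sd.
forecast_dozens = forecast / 12  # units -> dozens

# Probability-based business rule: chance that demand exceeds 120 units.
p_exceed_120 = 1 - forecast.cdf(120)

print(f"median={median:.1f}, p90={p90:.1f}, P(demand>120)={p_exceed_120:.3f}")
print(f"in dozens: mean={forecast_dozens.mean:.3f}, sd={forecast_dozens.stdev:.3f}")
```

A bare point forecast of 100 could not answer the 90th-percentile or exceedance questions; the two-parameter distribution answers all of them.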

Working with distribution objects

  • What is a distribution object?
  • What operations can we perform on a distribution object?
  • How can we include distribution objects in our current workflow?
    • A distribution object can be stored in a single column of a data.frame
    • purrr/dplyr operations can then be applied to that column (e.g. taking the mean as a point estimate)
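In R, packages such as distributional let a data.frame column hold distribution objects. A rough Python analogue is sketched below, with plain dicts standing in for data.frame rows and a loop standing in for `dplyr::mutate()` with `purrr::map_dbl()`. The car models and their weight distributions are made up; weight is in 1000 lbs, as in R's `mtcars$wt`.

```python
from statistics import NormalDist

# Hypothetical table: one row per car model, with a distribution-valued
# "weight" column (1000 lbs) instead of a single number per row.
rows = [
    {"model": "A", "weight": NormalDist(mu=1.5, sigma=0.1)},
    {"model": "B", "weight": NormalDist(mu=2.2, sigma=0.3)},
    {"model": "C", "weight": NormalDist(mu=3.1, sigma=0.2)},
]

# Column-wise operations: collapse each distribution to a point estimate
# only when one is actually needed (mean, or a 95th percentile).
for row in rows:
    row["weight_mean"] = row["weight"].mean
    row["weight_p95"] = row["weight"].inv_cdf(0.95)

# Arithmetic on the column stays distribution-valued: converting
# 1000 lbs -> lbs rescales both the mean and the standard deviation.
weights_lbs = [row["weight"] * 1000 for row in rows]

for row, w in zip(rows, weights_lbs):
    print(row["model"], row["weight_mean"], round(row["weight_p95"], 3), w)
```

The distribution column is the source of truth; point-estimate columns are derived from it on demand and can be recomputed with a different summary at any time.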

Generating a distribution object

  • Calculating the distribution empirically
  • Incorporating prior knowledge, e.g. cars cannot have negative weight, and the number of gears is a positive integer
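Both ideas can be sketched with Python's standard library: a percentile bootstrap gives an empirical distribution of the mean, and two simple tricks encode the car examples' prior knowledge (rejecting negative draws to truncate weight at zero, and rounding draws to whole gears). The observed weights are made-up values, and the model parameters and the choice of rejection sampling are illustrative assumptions, not a prescribed method.

```python
import random
from collections import Counter
from statistics import NormalDist, fmean

random.seed(42)  # reproducible resampling and sampling

# Made-up observed car weights (1000 lbs); illustrative, not real data.
observed = [2.62, 2.88, 3.21, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44]

# Empirical distribution via the bootstrap: resample with replacement
# and recompute the statistic (here the mean) many times.
def bootstrap_means(data, n_boot=2000):
    return [fmean(random.choices(data, k=len(data))) for _ in range(n_boot)]

boot = sorted(bootstrap_means(observed))
ci_low = boot[int(0.025 * len(boot))]       # 2.5th percentile
ci_high = boot[int(0.975 * len(boot)) - 1]  # 97.5th percentile

# Prior knowledge 1: weight cannot be negative. A normal model near zero
# puts mass below it, so truncate by rejecting negative draws.
def truncated_draws(dist, n, lower=0.0):
    draws = []
    while len(draws) < n:
        x = dist.samples(1)[0]
        if x >= lower:
            draws.append(x)
    return draws

light_car = NormalDist(mu=0.5, sigma=1.0)  # hypothetical wide prediction
weights = truncated_draws(light_car, 1000)

# Prior knowledge 2: gears are a positive integer. Round continuous draws,
# drop values below 1, and keep the result as a discrete distribution.
gear_model = NormalDist(mu=4.0, sigma=0.8)  # hypothetical prediction
gears = [g for g in (round(x) for x in gear_model.samples(5000)) if g >= 1]
gear_pmf = {g: c / len(gears) for g, c in sorted(Counter(gears).items())}

print(f"bootstrap 95% interval for mean weight: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"min truncated weight: {min(weights):.3f}")
print("gear pmf:", {g: round(p, 3) for g, p in gear_pmf.items()})
```

Rejection sampling is the simplest way to truncate; it wastes draws when most of the mass lies below the bound, in which case a proper truncated distribution would be the better choice.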

Conclusion

  • Committing to point estimates early in a project's life is a premature decision. Moving from a distribution to a point estimate takes a single operation, while going in the opposite direction requires substantial changes to the project structure.
  • By default, most learning algorithms (fable being an exception) return point estimates. Getting a distribution instead is an opt-in action.