Add guidance text for methods/data
Starting this to keep track of bits of guidance text that I can't write myself:
- Why do we need an aggregation raster, and what does it do?
- When would it be appropriate to use a uniform raster?
- What is the mesh and why is it needed?
- What do the mesh settings change, and when would you want to change them?
- What is the difference between the model families and how should you decide which one to use?
- What is the difference between the model link functions and how should you decide which one to use?
- What is a spatial field and when should it be used / not used?
- What is an IID (independent and identically distributed) effect and when should it be used / not used?
- What are priors and what difference will setting them make?
- Explanation of the options for `predict_uncertainty`
Why do we need an aggregation raster, and what does it do?
We want to model rates: a 1-degree increase in temperature increases the rate by 0.1 per person, not the raw count by 0.1.
In this sense it works as an "offset" in a Poisson GLM.
It can also be thought of as a weight, but I'm not sure whether that is confusing.
Aggregation is essentially a weighting of the pixels in the model, but it is included in the model during fitting rather than just being used to convert incidence to a rate.
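A minimal sketch of the offset idea, assuming a Poisson-style count model with a log link: each pixel's linear predictor gives a per-person rate, the aggregation raster (here, population per pixel) scales it to an expected count, and the pixel counts are summed over the polygon. `polygon_count` is a hypothetical helper for illustration, not a function from any package.

```python
import numpy as np


def polygon_count(linear_predictor, aggregation):
    """Expected polygon-level count from pixel-level rates (log link assumed).

    linear_predictor: pixel values of the linear predictor (log rate per person)
    aggregation: pixel values of the aggregation raster (e.g. population)
    """
    rates = np.exp(np.asarray(linear_predictor, dtype=float))  # per-person rate in each pixel
    weights = np.asarray(aggregation, dtype=float)
    # Rate x population gives each pixel's expected count; sum over the polygon.
    return float(np.sum(rates * weights))


# Two pixels with rates 0.1 and 0.2 per person and populations 100 and 50:
# expected polygon count = 0.1 * 100 + 0.2 * 50 = 20
print(polygon_count(np.log([0.1, 0.2]), [100, 50]))
```

For a single pixel this is exactly the GLM offset: log(expected count) = log(population) + linear predictor, so the covariates act on the rate rather than on the count.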
When would it be appropriate to use a uniform raster?
When each pixel could contribute equally to the response. For example, average air pollution over the polygon, where there is no way of knowing where in the polygon the sources are; or averages of surveys whose locations were completely random, e.g. the diversity of pollen in 10 cores taken at random.
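A minimal sketch, assuming the polygon-level response is an average quantity (such as air pollution) computed as a weighted mean of pixel values with the aggregation raster as the weights. With a uniform raster every pixel gets the same weight, so the polygon value reduces to a plain mean. `polygon_mean` is a hypothetical helper for illustration.

```python
import numpy as np


def polygon_mean(pixel_values, aggregation):
    """Weighted mean of pixel values, using the aggregation raster as weights."""
    values = np.asarray(pixel_values, dtype=float)
    weights = np.asarray(aggregation, dtype=float)
    return float(np.sum(values * weights) / np.sum(weights))


pollution = [10.0, 20.0, 30.0]

# Uniform raster: every pixel contributes equally, giving the simple mean (20.0).
print(polygon_mean(pollution, np.ones(3)))

# Non-uniform raster (e.g. population): the polygon value leans towards the
# heavily weighted pixels instead.
print(polygon_mean(pollution, [1, 1, 8]))
```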
What is the mesh and why is it needed?
https://sites.stat.washington.edu/peter/591/Lindgren.pdf (Section 1.1, "Continuous domain spatial Markov random fields")
https://sites.google.com/a/r-inla.org/www/spde-book?authuser=0
SPDE (stochastic partial differential equation) is the keyword to search for.
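In the SPDE approach described in the links above, the mesh is a triangulation of the study area: the spatial field is represented by values at the mesh vertices, and inside each triangle it is a linear interpolation of the three vertex values. The sketch below shows that interpolation for a single triangle using plain NumPy. It illustrates the idea only; it is not how R-INLA builds its projection from mesh to data locations, and the function names are hypothetical.

```python
import numpy as np


def barycentric_weights(point, triangle):
    """Weights of a point with respect to the three vertices of a mesh triangle."""
    a, b, c = (np.asarray(v, dtype=float) for v in triangle)
    # Solve point = a + l2 * (b - a) + l3 * (c - a) for (l2, l3).
    edges = np.column_stack([b - a, c - a])
    l2, l3 = np.linalg.solve(edges, np.asarray(point, dtype=float) - a)
    return np.array([1.0 - l2 - l3, l2, l3])


def field_at_point(point, triangle, vertex_values):
    """Piecewise-linear field value at a point inside a triangle."""
    weights = barycentric_weights(point, triangle)
    return float(weights @ np.asarray(vertex_values, dtype=float))


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Field values stored at the three mesh vertices.
vertex_values = [0.0, 4.0, 8.0]
print(barycentric_weights((0.25, 0.25), triangle))  # weights sum to 1
print(field_at_point((0.25, 0.25), triangle, vertex_values))
```

Because the field only has free values at the vertices, a finer mesh can represent finer spatial detail, at the cost of more unknowns to estimate.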
What is a spatial field and when should it be used / not used?
The spatial field is a completely flexible (non-parametric) 2D contribution to the linear predictor. It is a way of modelling missing covariates that are spatially structured. In other geospatial modelling tasks it is often used to account for biased data collection, e.g. due to oversampling.
Normally use it; only leave it out if you want the model to run faster or you know you have all the important covariates.
It is continuous across the area and spatially structured, cf. the IID effect, which acts at the polygon level.
Authoritative reference: https://link.springer.com/book/10.1007/978-0-387-48536-2
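To show what "spatially structured" means, here is a minimal sketch that simulates a Gaussian random field over pixel locations. It uses an exponential covariance as a simple stand-in (an assumption: this is not the covariance the SPDE/R-INLA model uses by default), so nearby pixels get similar field values and distant pixels are close to independent. The function names and the `sigma`/`range_` parameters are hypothetical.

```python
import numpy as np


def exponential_covariance(coords, sigma=1.0, range_=1.0):
    """Covariance matrix of a stationary field with exponential covariance."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between pixel centres.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma**2 * np.exp(-dist / range_)


def simulate_field(coords, sigma=1.0, range_=1.0, seed=0):
    """Draw one realisation of the spatial field at the given pixel centres."""
    cov = exponential_covariance(coords, sigma, range_)
    # Small jitter on the diagonal keeps the Cholesky factorisation stable.
    chol = np.linalg.cholesky(cov + 1e-9 * np.eye(len(cov)))
    rng = np.random.default_rng(seed)
    return chol @ rng.standard_normal(len(cov))


# Pixel centres along a transect; neighbours are more strongly correlated
# than pixels far apart.
pixels = [[x, 0.0] for x in range(6)]
print(np.round(exponential_covariance(pixels)[0], 3))
print(simulate_field(pixels, seed=1))
```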
An IID effect is more likely to be needed when polygon-level variation is caused by data collection, or when outbreaks spread rapidly once they occur, leading to high values in a particular polygon without the underlying risk necessarily being much higher. It is non-spatial, meaning that one polygon's effect makes no difference to its neighbouring polygons.
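A minimal sketch of how the pieces could combine at pixel level, assuming the linear predictor is intercept + covariate effect + spatial field + a per-polygon IID effect. Every pixel in a polygon receives the same IID shift, and that shift says nothing about neighbouring polygons. `linear_predictor` and its arguments are hypothetical names for illustration.

```python
import numpy as np


def linear_predictor(intercept, covariate_effect, spatial_field, polygon_id, iid_effect):
    """Pixel-level linear predictor with a polygon-level IID effect.

    polygon_id[i] gives the polygon that pixel i belongs to;
    iid_effect[k] is the independent effect of polygon k.
    """
    covariate_effect = np.asarray(covariate_effect, dtype=float)
    spatial_field = np.asarray(spatial_field, dtype=float)
    # Look up each pixel's polygon effect: same value for all pixels in a polygon.
    iid_per_pixel = np.asarray(iid_effect, dtype=float)[np.asarray(polygon_id)]
    return intercept + covariate_effect + spatial_field + iid_per_pixel


# IID effects are independent draws, one per polygon (no spatial correlation).
rng = np.random.default_rng(0)
iid = rng.normal(0.0, 0.5, size=2)

eta = linear_predictor(
    intercept=1.0,
    covariate_effect=[0.2, 0.1, 0.0],
    spatial_field=[0.05, 0.0, -0.05],
    polygon_id=[0, 0, 1],  # first two pixels in polygon 0, last pixel in polygon 1
    iid_effect=iid,
)
print(eta)
```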