lulcc

lulcc provides a framework for spatially explicit land use change modelling in r. The long term goal of lulcc is to provide a smart and tidy interface to running the standard land use change modelling in 4 steps: raster data prepping, probability surface generation, allocation and validation, in one tidy package.

Installation

You can install the released version of lulcc from CRAN with:

install.packages("lulcc")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("simonmoulds/lulcc")

The lulcc workflow

Adapted from https://www.geosci-model-dev.net/8/3215/2015/

The package includes two example datasets: one for Sibuyan Island in the Phillipines and one for the Plum Island Ecosystem in Massachusetts, United States. Here we present a complete working example for the Plum Island Ecosystem dataset.

1. Raster data preparation

Land use change modelling requires a large amount of input data. The most important input is at least one map of observed land use. In lulcc, this data is represented by the ObsLulcRasterStack class:

library(lulcc)
#> Loading required package: raster
#> Loading required package: sp
data(pie)
obs <- ObsLulcRasterStack(x=pie,
                          pattern="lu",
                          categories=c(1,2,3),
                          labels=c("Forest","Built","Other"),
                          t=c(0,6,14))

A useful starting point in land use change modelling is to obtain a transition matrix for two observed land use maps to identify the main transitions. This can be achieved with the crossTabulate function:

# obtain a transition matrix from land use maps for 1985 and 1991
crossTabulate(obs, times=c(0,6))
#>        Forest Built Other
#> Forest  46672  1926   415
#> Built       0 37085    37
#> Other     359  1339 25730

For the Plum Island Ecosystem site this reveals that the main transition was from forest to built areas.

The next stage is to relate observed land use or observed land use transitions to spatially explicit biophysical or socioeconomic explanatory variables. These are loaded as follows:

ef <- ExpVarRasterList(x=pie, pattern="ef")

2. Probability surface modelling

To fit predictive models we first divide the study region into training and testing partitions. The partition function returns a list with cell numbers for each partition:

part <- partition(x=obs[[1]],
                  size=0.1, spatial=TRUE)

We then extract cell values for the training and testing partitions.

# extract training data
train.data <- getPredictiveModelInputData(obs=obs,
                                          ef=ef,
                                          cells=part[["train"]],
                                          t=0)

test.data <- getPredictiveModelInputData(obs=obs,
                                         ef=ef,
                                         cells=part[["test"]])

Predictive models are represented by the PredictiveModelList class. For comparison, we create a PredictiveModelList object for each type of predictive model:

# fit models (note that a predictive model is required for each land use category)
forms <- list(Built~ef_001+ef_002+ef_003,
              Forest~ef_001+ef_002,
              Other~ef_001+ef_002)

# generalized linear model models
glm.models <- glmModels(formula=forms,
                        family=binomial,
                        data=train.data,
                        obs=obs)
#> Warning in if (is.na(categories) | is.na(labels)) {: the condition has length >
#> 1 and only the first element will be used
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# recursive partitioning and regression tree models
rpart.models <- rpartModels(formula=forms,
                            data=train.data,
                            obs=obs)
#> Warning in if (is.na(categories) | is.na(labels)) {: the condition has length >
#> 1 and only the first element will be used
# random forest models (WARNING: takes a long time!)
rf.models <- randomForestModels(formula=forms,
                                data=train.data,
                                obs=obs)
#> Warning in if (is.na(categories) | is.na(labels)) {: the condition has length >
#> 1 and only the first element will be used
#> Warning in randomForest.default(m, y, ...): The response has five or fewer
#> unique values. Are you sure you want to do regression?

#> Warning in randomForest.default(m, y, ...): The response has five or fewer
#> unique values. Are you sure you want to do regression?

#> Warning in randomForest.default(m, y, ...): The response has five or fewer
#> unique values. Are you sure you want to do regression?

We can then use the fitted models to predict over the full data set and produce the probability surfaces for each fitted model:

all.data <- as.data.frame(x=ef, obs=obs, cells=part[["all"]])

# GLM
probmaps <- predict(object=glm.models,
                    newdata=all.data,
                    data.frame=TRUE)
points <- rasterToPoints(obs[[1]], spatial=TRUE)
probmaps <- SpatialPointsDataFrame(points, probmaps)
probmaps <- rasterize(x=probmaps, y=obs[[1]],
                      field=names(probmaps))
rasterVis::levelplot(probmaps)

Model performance is assessed using the receiver operator characteristic provided by the ROCR package. lulcc includes classes Prediction and Performance which extend the native ROCR classes to contain multiple prediction and performance objects. The procedure to obtain these objects and assess performance is as follows:

glm.pred <- PredictionList(models=glm.models,
                           newdata=test.data)
glm.perf <- PerformanceList(pred=glm.pred,
                            measure="rch")
rpart.pred <- PredictionList(models=rpart.models,
                             newdata=test.data)
rpart.perf <- PerformanceList(pred=rpart.pred,
                              measure="rch")
rf.pred <- PredictionList(models=rf.models,
                          newdata=test.data)
rf.perf <- PerformanceList(pred=rf.pred,
                           measure="rch")
plot(list(glm=glm.perf,
          rpart=rpart.perf,
          rf=rf.perf))

Another use of ROC analysis is to assess how well the models predict the cells in which gain occurs between two time points. This is only possible if a second observed land use map is available for a subsequent time point. Here we perform this type of analysis for the gain of built between 1985 and 1991. First, we create a data partition in which cells not candidate for gain (cells belonging to built in 1985) are eliminated. We then assess the ability of the various predictive models to predict the gain of built in this partition:

part <- rasterToPoints (obs[[1]],
                        fun=function(x) x != 2,
                        spatial=TRUE)
test.data<- getPredictiveModelInputData(obs=obs,
                                        ef=ef,
                                        cells=part,
                                        t=6)
glm.pred <- PredictionList(models=glm.models[[2]],
                           newdata=test.data)
glm.perf <- PerformanceList(pred=glm.pred,
                            measure="rch")
plot(list(glm=glm.perf))

3. Allocation

Spatially explicit land use change models are usually driven by non-spatial estimates of land use area for each timestep in the simulation. While many complex methods have been devised, in lulcc we simply provide a method for linear extrapolation of land use change, which relies on there being at least two observed land use maps:

# obtain demand scenario
dmd <- approxExtrapDemand(obs=obs, tout=0:14)
#> Warning in total(x = obs): missing argument 'categories': getting categories
#> from 'x'

We then use a filter defined as a matrix within the NeighbRasterStack function to gather neighbor data from the land use change data.

w <- matrix(data=1, nrow=3, ncol=3)
nb <- NeighbRasterStack(x=obs[[1]], weights=w,
                        categories=c(1,2,3))

The culmination of the modelling process is to simulate the location of land use change. lulcc provides a routine based on the CLUE-S model (Verburg et al., 2002) and a novel stochastic allocation procedure (with option for using the ordered method). The first step is to combine the various model inputs to ensure they are compatible:

clues.rules <- matrix(data=1, nrow=3, ncol=3)
clues.parms <- list(jitter.f=0.0002,
                    scale.f=0.000001,
                    max.iter=1000,
                    max.diff=50,
                    ave.diff=50)
clues.model <- CluesModel(obs=obs,
                          ef=ef,
                          models=glm.models,
                          time=0:14,
                          demand=dmd,
                          elas=c(0.2,0.2,0.2),
                          rules=clues.rules,
                          params=clues.parms)
ordered.model <- OrderedModel(obs=obs,
                              ef=ef,
                              models=glm.models,
                              time=0:14,
                              demand=dmd,
                              order=c(2,1,3))

Then, finally, we can perform allocation:

clues.model <- allocate(clues.model)
ordered.model <- allocate(ordered.model, stochastic=TRUE)

4. Validation

An important yet frequently overlooked aspect of land use change modelling is model validation. lulcc provides a recent validation method developed by Pontius et al. (2011), which simultaneously compares a reference (observed) map for time 1, a reference map for time 2 and a simulated map for time 2. The first step in this method is to calculate three dimensional contingency tables:

# evaluate CLUE-S model output
clues.tabs <- ThreeMapComparison(x=clues.model,
                                   factors=2^(1:8),
                                   timestep=14)

From these tables we can easily extract information about different types of agreement and disagreement as well as compute summary statistics such as the figure of merit:

clues.agr <- AgreementBudget(x=clues.tabs)
plot(clues.agr, from=1, to=2)

clues.fom <- FigureOfMerit(x=clues.agr)
plot(clues.fom, from=1, to=2)