butteR

butteR can be used to smooth out the analysis and visualization of spatial survey data collected using mobile data collection systems (ODK/XLSForm). butteR mainly consists of convenient wrappers and pipelines for the survey, srvyr, sf, and rtree packages.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("zackarno/butteR")

Example

Example using the stratified sampler function

The stratified sampler function is useful when you want to draw random samples from spatial point data. It has been most useful for me when I have shelter footprint data that I want to sample. For now, the function only reads point data, so footprint data stored as polygons should first be converted to points (centroids).
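
The polygon-to-centroid step can be sketched with sf as follows. This is a minimal sketch and not part of butteR: the two square footprints, the helper make_square, and the object names are made up for illustration, and the output columns (lon, lat, strata) simply mirror the simulated point data used later in this README.

```r
library(sf)

# Hypothetical helper: build a small square polygon from its lower-left corner
make_square <- function(x, y, d = 0.001) {
  st_polygon(list(rbind(
    c(x, y), c(x + d, y), c(x + d, y + d), c(x, y + d), c(x, y)
  )))
}

# Two made-up shelter footprints in WGS84 (EPSG:4326)
footprints <- st_sf(
  strata = c("A", "B"),
  geometry = st_sfc(make_square(90, 23), make_square(91, 24), crs = 4326)
)

# Convert polygons to centroids (sf may warn about lon/lat data or attributes)
centroids <- suppressWarnings(st_centroid(footprints))

# Flatten to a plain data frame of coordinates plus the strata column
xy <- st_coordinates(centroids)
footprint_pts <- data.frame(
  lon = xy[, "X"],
  lat = xy[, "Y"],
  strata = centroids$strata
)
footprint_pts
```

A data frame shaped like footprint_pts could then play the role of pt_data in the example below.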

I believe the most useful and powerful aspect of this function is its ability to write out well-labelled kml/kmz files that can be loaded onto a phone and opened with maps.me or other applications. To use the function properly, first familiarize yourself with some of the theory behind random sampling and learn how “seeds” can be set in R to make random sampling reproducible. The function generates a random seed and stores it as an attribute field of the spatial sample. There is also an option to write the seed to the working directory as a text file. Understanding how to use seeds becomes important if you want to reproduce your results, or if you need to run subsequent rounds of sampling that exclude the previous sample without having to read the previous samples back in.
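
The role of the seed can be illustrated in base R. This is a generic sketch of seeded sampling, not a description of how stratified_sampler works internally; the ID vector, the sample sizes, and the object names are assumptions chosen for illustration.

```r
# Hypothetical population of 1000 point IDs
ids <- 1:1000

# Any integer works as a seed; this one is only an example value
seed <- 828005

# Round 1: set the seed, then draw the sample
set.seed(seed)
round1 <- sample(ids, 50)

# Later session: re-create round 1 from the stored seed alone,
# without reading the saved sample back in
set.seed(seed)
round1_again <- sample(ids, 50)
identical(round1, round1_again)

# Round 2: exclude the regenerated round 1 sample, then draw again
remaining <- setdiff(ids, round1_again)
round2 <- sample(remaining, 50)
length(intersect(round1, round2))
```

The same seed only reproduces the same sample if the input data and the order of the sampling calls are unchanged.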

To show how the function can be used, I will first simulate a spatial data set and a sample frame:

library(butteR)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.6.1
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
lon<-runif(min=88.00863,max=92.68031, n=1000)
lat<-runif(min=20.59061,max=26.63451, n=1000)
strata_options<-LETTERS[1:8]

#simulate datasets
pt_data<-data.frame(lon=lon, lat=lat, strata=sample(strata_options,1000, replace=TRUE))
sample_frame<-data.frame(strata=strata_options,sample_size=round(runif(n=8,min=10,max=100),0))

Here are the first six rows of the data set and the sample frame:

pt_data %>% head() %>% knitr::kable()

lon	lat	strata
90.14262	26.06148	D
91.21273	23.59155	C
90.19238	26.24277	E
90.02332	25.27046	H
89.53342	20.90264	G
88.85128	20.98232	G

sample_frame %>% head() %>% knitr::kable()

strata	sample_size
A	33
B	69
C	39
D	85
E	30
F	16
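
Before sampling, it can be worth confirming that every stratum contains at least as many points as the sample frame requests. The following is a minimal base R sketch that re-creates simulated data like the above with an arbitrary seed; the names available and frame_check are introduced here for illustration, and whether stratified_sampler performs a similar check itself is not covered here.

```r
set.seed(1)  # arbitrary seed so the simulation is repeatable
strata_options <- LETTERS[1:8]

# Simulated point data and sample frame, as in the example above
pt_data <- data.frame(
  lon = runif(1000, min = 88.00863, max = 92.68031),
  lat = runif(1000, min = 20.59061, max = 26.63451),
  strata = sample(strata_options, 1000, replace = TRUE)
)
sample_frame <- data.frame(
  strata = strata_options,
  sample_size = round(runif(8, min = 10, max = 100))
)

# Count the points available in each stratum
available <- as.data.frame(
  table(strata = pt_data$strata),
  responseName = "n_available",
  stringsAsFactors = FALSE
)

# Join the counts to the sample frame and flag strata with too few points
frame_check <- merge(sample_frame, available, by = "strata", all.x = TRUE)
frame_check$n_available[is.na(frame_check$n_available)] <- 0
frame_check$enough_points <- frame_check$n_available >= frame_check$sample_size
frame_check
```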

Next we will run the stratified_sampler function using the two simulated data sets as input.

You can check the function help file by typing ?stratified_sampler. There are quite a few parameters to set, particularly if you want to write out the kml file, so it is important to read the function's documentation (it will be worth it).

sampler_ouput<-butteR::stratified_sampler(
  sample.target.frame = sample_frame,
  sample.target.frame.strata = "strata",
  sample.target.frame.samp.size = "sample_size",
  pt.data = pt_data,
  pt.data.strata = "strata",
  pt.data.labels = "strata",
  write_kml = FALSE
)

The output is stored in a list. Below are the first 6 rows of each stratified sample. The samples can be viewed collectively or one stratum at a time.

sampler_ouput$results %>% purrr::map(head) %>% knitr::kable()

Description	rnd_seed	uuid
1_A	828005	27
2_A	828005	68
3_A	828005	83
4_A	828005	100
5_A	828005	101
6_A	828005	124

Description	rnd_seed	uuid
1_B	828005	10
2_B	828005	41
3_B	828005	44
4_B	828005	62
5_B	828005	69
6_B	828005	92

Description	rnd_seed	uuid
1_C	828005	2
2_C	828005	32
3_C	828005	36
4_C	828005	45
5_C	828005	110
6_C	828005	138

Description	rnd_seed	uuid
1_D	828005	1
2_D	828005	12
3_D	828005	13
4_D	828005	17
5_D	828005	28
6_D	828005	51

Description	rnd_seed	uuid
1_E	828005	33
2_E	828005	50
3_E	828005	66
4_E	828005	87
5_E	828005	109
6_E	828005	146

Description	rnd_seed	uuid
1_F	828005	135
2_F	828005	153
3_F	828005	317
4_F	828005	381
5_F	828005	402
6_F	828005	462

Description	rnd_seed	uuid
1_G	828005	5
2_G	828005	6
3_G	828005	14
4_G	828005	19
5_G	828005	20
6_G	828005	25

Description	rnd_seed	uuid
1_H	828005	23
2_H	828005	24
3_H	828005	30
4_H	828005	49
5_H	828005	75
6_H	828005	85

sampler_ouput$results$D %>% head()
#>   Description rnd_seed uuid
#> 1         1_D   828005    1
#> 2         2_D   828005   12
#> 3         3_D   828005   13
#> 4         4_D   828005   17
#> 5         5_D   828005   28
#> 6         6_D   828005   51

The random seed is saved in the list (as random_seed) and also as an attribute (rnd_seed) of each stratified sample. The random seed is very important for reproducibility, which is quite useful for subsequent rounds of data collection.

sampler_ouput$random_seed 
#> [1] 828005

You can also view all of the remaining points that were not randomly sampled. You can choose to have these written to a shapefile. It is generally good backup practice to write these out as well.

sampler_ouput$samp_remaining %>% head() %>% knitr::kable()

	lon	lat	strata	uuid	rnd_seed
3	90.19238	26.24277	E	3	828005
4	90.02332	25.27046	H	4	828005
7	90.77956	25.45381	E	7	828005
8	90.88944	22.56836	G	8	828005
9	90.76433	21.99042	A	9	828005
11	90.83148	25.57179	E	11	828005

Example using the check_distances_from_target function

First I will generate 2 fake point data sets. The sf package is great!

library(sf)

set.seed(799)
lon1<-runif(min=88.00863,max=92.68031, n=1000)
lat1<-runif(min=20.59061,max=26.63451, n=1000)
lon2<-runif(min=88.00863,max=92.68031, n=1000)
lat2<-runif(min=20.59061,max=26.63451, n=1000)
strata_options<-LETTERS[1:8]

#make a simulated dataset
pt_data1<-data.frame(lon=lon1, lat=lat1, strata=sample(strata_options,1000, replace=TRUE))
pt_data2<-data.frame(lon=lon2, lat=lat2, strata=sample(strata_options,1000, replace=TRUE))

# convert to simple feature object
coords<- c("lon", "lat")
pt_sf1<- sf::st_as_sf(x = pt_data1, coords=coords, crs=4326)
pt_sf2<- sf::st_as_sf(x = pt_data2, coords=coords, crs=4326)

Next I will show two spatial verification functions. The first one finds, for each point in the first data set, the distance to the closest point in the second. It uses rtree spatial indexing, so it works quickly on fairly large data sets.

closest_pts<- butteR::closest_distance_rtree(pt_sf1, pt_sf2)
#> Warning in rtree::knn.RTree(rTree = sf2_tree, st_coordinates(sf1)[,
#> c("X", : k was cast to integer, this may lead to unexpected results.

closest_pts %>% head() %>% knitr::kable()

	strata	geometry	strata.1	geometry.1	dist_m
755	C	c(88.5246591396806, 26.0766159565661)	H	c(88.542828683707, 25.8766529368377)	22228.020
798	C	c(91.3460825806255, 22.3494960887145)	F	c(91.3754625593381, 22.3643193468922)	3442.702
464	C	c(91.6884048353551, 26.0950136747809)	B	c(91.6959527733822, 26.0490176807472)	5151.514
902	B	c(88.782772209299, 22.2289078448025)	C	c(88.812609722456, 22.2312796777867)	3087.283
199	B	c(91.9385484030803, 22.9929798167442)	A	c(92.0439420932042, 22.9314622797974)	12776.161
419	D	c(88.6396377435045, 22.2862520419468)	C	c(88.7253538271838, 22.3836231110146)	13936.767
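
For comparison, a similar nearest-point distance can be computed with sf alone. This is a swapped-in technique, not the rtree-based method that butteR uses, and it may be slower on large data sets. The point counts, coordinate ranges, and object names below are assumptions for illustration.

```r
library(sf)

set.seed(799)

# Two small made-up point data sets in WGS84 (EPSG:4326)
pts_a <- st_as_sf(
  data.frame(lon = runif(20, 88, 92), lat = runif(20, 21, 26)),
  coords = c("lon", "lat"), crs = 4326
)
pts_b <- st_as_sf(
  data.frame(lon = runif(20, 88, 92), lat = runif(20, 21, 26)),
  coords = c("lon", "lat"), crs = 4326
)

# Index of the nearest pts_b feature for every pts_a feature
# (older sf versions may warn about lon/lat data here)
nearest_idx <- suppressWarnings(st_nearest_feature(pts_a, pts_b))

# Element-wise distance between each point and its nearest neighbour;
# for lon/lat data sf reports this in metres
pts_a$dist_m <- as.numeric(
  st_distance(pts_a, pts_b[nearest_idx, ], by_element = TRUE)
)
head(pts_a)
```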

You could easily filter the “closest_pts” output by a distance threshold of your choice. However, to make it simpler I have wrapped this function in the function “check_distances_from_target” (I need to come up with a better name for this function). It returns all of the points in “dataset” that are further than the set threshold from any point in “target_points”, along with the distance to the closest target point. Obviously this is fake data, so a ton of points are returned (I will just display the first 6 rows). In your assessment data there should obviously be far fewer.

set.seed(799)
pts_further_than_50m_threshold_from_target<-
  butteR::check_distances_from_target(dataset = pt_sf1,target_points =pt_sf2,dataset_coordinates = coords,
                                      cols_to_report = "strata", distance_threshold = 50)
#> Warning in rtree::knn.RTree(rTree = sf2_tree, st_coordinates(sf1)[,
#> c("X", : k was cast to integer, this may lead to unexpected results.


pts_further_than_50m_threshold_from_target %>% head() %>% knitr::kable()

strata	dist_m
C	22228.020
C	3442.702
C	5151.514
B	3087.283
B	12776.161
D	13936.767
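
The manual alternative mentioned above, filtering on dist_m directly, might look like the sketch below. The data frame is a stand-in built from the six dist_m values displayed above, and the 5000 m threshold is an arbitrary value chosen so that some rows are dropped.

```r
# Stand-in for the closest_pts output: strata and distance columns only
closest_pts_example <- data.frame(
  strata = c("C", "C", "C", "B", "B", "D"),
  dist_m = c(22228.020, 3442.702, 5151.514, 3087.283, 12776.161, 13936.767)
)

# Hypothetical threshold in metres
threshold_m <- 5000

# Keep only the points further than the threshold from their closest target
too_far <- closest_pts_example[closest_pts_example$dist_m > threshold_m, ]
too_far
```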
