bcgov/bcmaps

Download LiDAR tiles from LidarBC

Opened this issue ยท 36 comments

There are three types of products:

  1. Raw Point Cloud: .laz files
  2. Digital Elevation Model (DEM): 1m and 10m as .tif files
  3. Digital Surface Model (DSM): .laz files

A first step to parse the files and plot things up

Outstanding issues:

  • The geotifs have a CRS but the LAZ files do not, LAZ are in UTM, so need to know the tile to be able to know the correct CRS (EDIT: UTM Zone is in the LAZ filename...)
  • Need to make some search functions using the tiles
  • Would be nice to be able to query the available tiles (like a tile index sf object)
library(tidyverse)
library(RCurl)
#> 
#> Attaching package: 'RCurl'
#> The following object is masked from 'package:tidyr':
#> 
#>     complete
library(httr)
library(XML)
library(lidR)
#> Loading required package: raster
#> Loading required package: sp
#> 
#> Attaching package: 'raster'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
#> The following object is masked from 'package:tidyr':
#> 
#>     extract
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1

# XML Location 

url <- "https://nrs.objectstore.gov.bc.ca/gdwuts/"

# Parse XML 

my_xml <- XML::xmlToList(XML::xmlParse(RCurl::getURL(url)))

# Format XML 

my_xml_list <- my_xml %>% 
  unlist(recursive = FALSE) %>% 
  enframe() %>% 
  filter(name == "Contents.Key") %>% 
  mutate(value = unlist(value)) 

# File list 
tif_list <- my_xml_list[str_detect(my_xml_list$value, ".tif"),]
laz_list <- my_xml_list[str_detect(my_xml_list$value, ".laz"),]

#### TIF ####

file <- tif_list[200,2]$value
fileurl <- paste0(url, file)
local_file <- paste0(gsub("/","_",file))
download.file(url = fileurl, destfile = local_file, mode="wb")
tif <- stars::read_stars(local_file) 
tif %>% plot()
#> downsample set to c(2,2)

#### LAZ ####

file <- laz_list[200,2]$value
fileurl <- paste0(url, file)
local_file <- paste0(gsub("/","_",file))
download.file(url = fileurl, destfile = local_file, mode="wb")

las = lidR::readLAS(local_file)
projection(las) <- sf::st_crs(tif)
dtm <- grid_terrain(las, res = 1, knnidw(k=10,p=2))
#> Warning: There were 13 degenerated ground points. Some X Y Z coordinates were
#> repeated. They were removed.
#> Warning: There were 5 degenerated ground points. Some X Y coordinates were
#> repeated but with different Z coordinates. min Z were retained.
plot(dtm)

Amazing @bevingtona!

Hmm XML is only reading the first 1000 lines... somthing wrong in:

my_xml <- XML::xmlToList(XML::xmlParse(RCurl::getURL(url)))

image

These are stored in an S3 bucket, so can use the {aws.s3} package:

library(aws.s3)
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1
#> Registered S3 methods overwritten by 'stars':
#>   method             from
#>   st_bbox.SpatRaster sf  
#>   st_crs.SpatRaster  sf

bucket_092bdem <- get_bucket("gdwuts", 
                   prefix = "092/092b/2019/dem", 
                   base_url = "nrs.objectstore.gov.bc.ca", 
                   region = "")

bucket_df <- as.data.frame(bucket_092bdem)

file <- save_object(grep("092b053", bucket_df$Key, value = TRUE), 
                   bucket = "gdwuts", 
                   base_url = "nrs.objectstore.gov.bc.ca", 
                   region = "")

plot(read_stars(file))
#> downsample set to c(27)

Created on 2021-11-18 by the reprex package (v2.0.1)

Very nice @ateucher !
Do you know of a way to query the available files via aws.s3?

As far as I can see there is get_bucket where you can list objects starting with a prefix... The S3 documentation is here: https://docs.aws.amazon.com/s3/index.html, and it looks like {aws.s3} wraps most of that functionality.

It looks like there are many tens of thousands of files in that bucket, so listing them all isn't really feasible - I think searching/prefixing is the right way to go

Feels like we can approach this the same way as the CDED data (with the exception of local storage) by just giving a function some spatial data and then returning (using prefixes and maptiles) to return the lidar data. @bevingtona would you use a vrt to similarly stitch together a few lidar files?

@boshek
For the existing .tif files available, yes the .vrt approach would work well

But the thing about lidar is that there are a great many data products possible from the .las file

So I suppose we need to decide what bcmaps wants to "offer"

  1. Download and mosaic geotiffs for an AOI
  2. Download las/laz files
  3. Process las/laz files into DTM, DSM, CHM, hillshade, etc.?

Is this just leading to a new package? bclidar? where one can plot the index files and download the tif and or the las and then even have some vignettes for processing using the lidR and rlas packages?

+1

I think so too

@ateucher @boshek let me know when it's done ;)

jk, should we close the issue and start fresh in a new repo? I think cded can stay in bcmaps.. it plays well as a "wall to wall" dataset for BC..

You can transfer an issue from one repo to another as well, that might be helpful

+1 for {bclidar}

Totally! This package can definitely has bcdata as a dependency making retrieval pretty robust to backend changes (๐Ÿคž )

Some more experimentation

library(esri2sf)
library(mapview)
library(bcmaps)
library(dplyr)
library(sf)

# AOI
aoi <- bcmaps::bc_cities() %>% filter(NAME == "Vancouver") %>% 
  st_buffer(5000) %>% 
  st_transform(4326) # NEEDS TO BE EPSG:4326 

mapview(aoi)

# LIST OF DATASETS
lidar_extent     <- "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/0"
lidar_dsm_02500k <- "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/1"
lidar_dsm_20000k <- "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/2"
lidar_pointcloud <- "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/3"
lidar_dem_02500k <- "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/4"
lidar_dem_20000k <- "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/5"

# QUERY DATASET
tiles <- esri2sf(lidar_dem_20000k, bbox = aoi %>% st_bbox())
#> Layer Type: Feature Layer
#> Geometry Type: esriGeometryPolygon
#> Service Coordinate Reference System: PROJCS["PCS_Albers",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Albers"],PARAMETER["False_Easting",1000000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",-126.0],PARAMETER["Standard_Parallel_1",50.0],PARAMETER["Standard_Parallel_2",58.5],PARAMETER["Latitude_Of_Origin",45.0],UNIT["Meter",1.0]]
#> Output Coordinate Reference System: 4326

mapview(aoi) + mapview(tiles)

# DOWNLOAD
fol <- tempdir()

for(i in 1:nrow(tiles)){
  print(i)
  tile <- tiles[i,]
  file <- tile$filename
  url <- tile$s3Url
  httr::GET(url, 
            httr::write_disk(paste(fol,file,sep="/"), 
                             overwrite=TRUE))}
#> [1] 1
#> [1] 2

l <- list.files(fol, pattern = "bc_", full.names = T)

sf::gdal_utils(util = "buildvrt", 
               source = l, 
               destination = "my.vrt")

stars::read_stars("my.vrt", proxy = T) %>% 
  mapview(maxpixels = 10000) + mapview(tiles)
#> Number of pixels is above 10000.Only about 10000 pixels will be shown.
#> You can increase the value of `maxpixels` to 323986185 to avoid this.

Created on 2021-11-26 by the reprex package (v2.0.0)

Searching for a list of available lidar tiles brought me to this thread before I could find the index.

+1 for {bclidar} as well. Would love to see a command line interface to it as well if possible.

Just wondering what the status of this functionality is - is this going to be written into the bcmaps package, or is bclidar being actively developed? I tried searching for that package name on Google and GitHub but being outside of government I wasn't able to find anything. Thanks all!

No progress from me! Just the prototype above.. anyone else made any strides? @ateucher @stephhazlitt @boshek @HunterGleason @SashaNasonova

I'm afraid not over here either...

It is an open issue here too. smnorris/bcdata#107 - has not been a priority so far.

Not sure a 'bulk download' function is the best approach. Still ends up being quite a bit of processing locally. Wouldn't the better approach be to host the data as Cloud Optimized GeoTIFF and as Cloud Optimized Point Clouds?

I'm sure you're right @bevingtona, unfortunately the hosting question is out of scope for bcmaps!

I think the DEM TIFFs should be valid COGs already? I wasn't involved with the final publication/conversion but evaluated possible formats / compression and we pushed for COG, and I think with DEFLATE.

$ rio cogeo validate bc_092b091_xl1m_utm10_2019.tif
The following warnings were found:
- The file is greater than 512xH or 512xW, it is recommended to include internal overviews

/Users/snorris/Downloads/bc_092b091_xl1m_utm10_2019.tif is a valid cloud optimized GeoTIFF

@ateucher i guess this might be out of scope for bcmaps, but I think if the storage is solved, then bcmaps or bcdata would have associated functions to access the data. So in my mind, not sure where else to bring this conversation.

@smnorris good point! So either a SpatioTemporal Asset Catalog (STAC) of the COG tiles is needed, or stitch the lidar tiles into mosaics per project and serve them as COGs, then the user can query the bounding box etc in the HTTP request... or simply add the COG path into QGIS for example.

Not sure the best path forward.. but cool that these are COGs! I didn't realize!

image

Thanks for the discussion! I know this isn't being prioritized, but I'll throw it out there and say that there is interest in a simple function that can accomplish downloading available LiDAR of a defined area in a similar manner to the cded functions when downloading TRIM elevation. Even if it starts out with @bevingtona's prototype of bulk downloading tiles, as a user I would be okay with that for now, and then perhaps later on changing the function to accomplish what you hope for using STAC's (that is beyond my understanding but hopefully it's not too difficult for you to figure out!).

Thanks all for the wonderful work you all do :)

Here is an example of what this could look like in bcmaps.
I am not the maintainer of the package, and not sure if this should be it's own bclidar package.
And I don't have a great grasp of package architecture... so that is why I'm not making a pull request...

Here is what the final product could look like (functions are below this example):

aoi <- bcmaps::bc_cities() %>% 
  filter(NAME == "Vancouver") %>% 
  st_buffer(5000)

# Show how many tiles of each layer are in our AOI
bclidar_search_aoi(aoi = aoi)
#>       layer format n_tiles
#> 1    extent    shp       1
#> 2  dsm_2500    laz      58
#> 3 dsm_10000    laz       0
#> 4  laz_2500    laz      58
#> 5  dem_2500    tif       0
#> 6 dem_20000    tif       2

# Get an `sf` object of the tiles in your AOI for a specific layer
my_tiles <- bclidar_tif_get_tiles(aoi = aoi, layer = "dem_20000")
#> [1] "FOUND 2 TILES"

# Download and plot! 
r <- bclidar_tif_get_data(tiles = my_tiles, 
                     out_dir = tempdir(), 
                     overwrite = F, 
                     read_r = T, 
                     read_r_format = "stars")
#> [1] "Downloading 1 of 2"
#> [1] "Downloading 2 of 2"
#> [1] "Downloads complete"
#> [1] "Process complete"

plot(r)
#> downsample set to 30

Created on 2022-11-09 by the reprex package (v2.0.0)

Here are the functions that are used above:

library(esri2sf)
library(sf)
library(mapview)
library(dplyr)
library(httr)
library(stars)
library(terra)
bclidar_get_layers <- function(){
  data.frame(
    name = c("extent",
             "dsm_2500",
             "dsm_10000",
             "laz_2500",
             "dem_2500",
             "dem_20000"),
    format = c("shp",
               "laz",
               "laz",
               "laz",
               "tif",
               "tif"),
    url = c("https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/0",
            "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/1",
            "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/2",
            "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/3",
            "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/4",
            "https://services6.arcgis.com/ubm4tcTYICKBpist/arcgis/rest/services/LiDAR_BC_S3_Public/FeatureServer/5"))}
bclidar_search_aoi <- function(aoi){
  
  bclidar_layers <- bclidar_get_layers()
  
  do.call(bind_rows, lapply(1:nrow(bclidar_layers), function(i){
    name <- bclidar_layers[i,"name"]
    format <- bclidar_layers[i,"format"]
    url <- bclidar_layers[i,"url"]
    tiles <- esri2sf(url, bbox = st_bbox(aoi), )
    tilesize = nrow(tiles)
    data.frame(layer=name, 
               format=format, 
               n_tiles=tilesize)
    }))}
bclidar_tif_get_tiles <- function(aoi = aoi, 
                                layer = "dem_20000"){

  bclidar_layers <- bclidar_get_layers()
  
  url <- bclidar_layers[bclidar_layers$name == layer,"url"]
  
  tiles <- esri2sf(url, bbox = st_bbox(aoi))
    
  if(nrow(tiles)==0){
    print("NO TILES")}else{
    print(paste("FOUND",nrow(tiles),"TILES"))
      tiles}
}
bclidar_tif_get_data <- function(tiles = my_tiles, 
                                 out_dir = tempdir(),
                                 overwrite = F,
                                 read_r = T,
                                 read_r_format = "terra"){
  
  if(is.null(tiles[1,]$filename)){
    stop("No filename in layer")}
  
  if(length(grep("tif",tiles[1,]$filename))==0){
    stop("This tile does not contain a tif.")}
  
  lapply(1:nrow(tiles), function(i){
    
    print(paste("Downloading", i, "of", nrow(tiles)))
    tile <- tiles[i,]
    file <- tile$filename
    url <- tile$s3Url
    
    if(length(list.files(out_dir, pattern = file))==1&overwrite==F){
      print("File exists, overwrite set to FALSE, skipping to next")}else{
      httr::GET(url, 
                httr::write_disk(paste(out_dir,file,sep="/"), 
                                 overwrite=overwrite))
      if(i == nrow(tiles)){print("Downloads complete")}
        }
    
    })
  
  
  if(read_r == T){  
  vrt_name=paste0(out_dir,"/bclidar_download_",gsub(" ","_",gsub("-|:","",Sys.time())),".vrt")
    sf::gdal_utils(util = "buildvrt", 
                   source = paste0(out_dir,"/",tiles$file), 
                   destination = vrt_name)
    
    if(read_r_format == "stars"){
      out_img <- stars::read_stars(vrt_name, proxy = T)}
    
    if(read_r_format == "terra"){
      out_img <- terra::rast(vrt_name)}
    
   print("Process complete") 
   out_img
  }
  }
  

@bevingtona That looks amazing. I would be happy to push the skeleton of an R package to a new bcgov repo (bclidar) to support you getting going with the above code, assuming this would be another bcgov R package. I think you or @ateucher would need to open the bare repo and assign me as a collaborator though. You could be the creator and maintainer and decide at a later date how the package matures (e.g. destined for CRAN etc).

@bevingtona any plan re: advancing this in a stand-alone {bclidar} package? I would be happy to contribute to get things set up, and you can plunk your script in for a great start to a package.

@stephhazlitt Thanks for the reminder! I haven't made any progress on this.. The functions above work for batch downloading, but don't fully exploit the "cloud optimized geotiff" functionality.. We can certainly make the above functions into a package and see where it goes.

I think the fact that there is a Lidar Open Data portal (distinct from the BC Data Catalogue) is perhaps another rationale for a standalone R package: https://governmentofbc.maps.arcgis.com/apps/MapSeries/index.html?appid=d06b37979b0c4709b7fcf2a1ed458e03.

And see this news re: bcgov "investing more than $38 million in a new program over the next six years to collect light distance and ranging (LiDAR) elevation data" https://news.gov.bc.ca/releases/2023WLRS0010-000512

That program is great news.

I still haven't worked with a STAC - but do wonder if existing libraries would be good enough if the data were to be indexed as a STAC. Poking GeoBC (presuming they continue to do the publishing) about that might be worth it before building something from scratch.

https://rdrr.io/cran/rstac/man/rstac.html
https://r-spatial.org/r/2021/04/23/cloud-based-cubes.html
https://pystac-client.readthedocs.io/en/stable/quickstart.html#cli