lidar data with missing labels
Hi Ben,
I investigated the lidar data with regard to the labels and found some point clouds without any labels and some that are missing just a few annotations. The affected plots are listed at the end of the reprex below:
library(tidyverse)

plot_names <- NeonTreeEvaluation::list_annotations()

# Loop over plot names and investigate the point cloud labels
lidar_label_data <- map_dfr(plot_names, function(plot_name) {
  # cat("Plot [", which(plot_name == plot_names), "/", length(plot_names), "] \"",
  #     plot_name, "\"... ", sep = "")

  # Get file path of possible annotation data
  annotation_file_path <- system.file(
    "extdata", "NeonTreeEvaluation", "annotations",
    package = "NeonTreeEvaluation"
  ) %>%
    file.path(plot_name) %>%
    paste0(".xml")

  # Get file paths of possible lidar data
  lidar_file_paths <- system.file(
    "extdata", "NeonTreeEvaluation", "evaluation", "LiDAR",
    package = "NeonTreeEvaluation"
  ) %>%
    file.path(plot_name) %>%
    paste0(c(".las", ".laz"))

  if (
    !file.exists(annotation_file_path) ||
      !any(file.exists(lidar_file_paths))
  ) {
    # cat("skipped.\n", sep = "")
    return()
  }

  # Get the lidar data
  lidar_file_path <- lidar_file_paths[file.exists(lidar_file_paths)][[1]]
  point_cloud <- suppressWarnings(lidR::readLAS(lidar_file_path))

  # create a one-row table with some data
  if ("label" %in% colnames(point_cloud@data)) {
    res <- tibble(
      plot_name,
      has_labels = TRUE,
      num_unique_labels = point_cloud@data %>%
        filter(!is.na(label), label != 0) %>%
        pull(label) %>%
        unique() %>%
        length(),
      num_annotations = NeonTreeEvaluation::get_data(plot_name, "annotations") %>%
        NeonTreeEvaluation::xml_parse() %>%
        nrow()
    )
  } else {
    res <- tibble(
      plot_name,
      has_labels = FALSE,
      num_annotations = NeonTreeEvaluation::get_data(plot_name, "annotations") %>%
        NeonTreeEvaluation::xml_parse() %>%
        nrow()
    )
  }

  # cat("done.\n")
  return(res)
})

# List the plots that don't have a label attribute in the first place
lidar_label_data %>% filter(!has_labels)
#> # A tibble: 12 x 4
#>    plot_name     has_labels num_unique_labels num_annotations
#>    <chr>         <lgl>                  <int>           <int>
#>  1 NIWO_001_2018 FALSE                     NA             176
#>  2 NIWO_002_2018 FALSE                     NA             292
#>  3 NIWO_004_2018 FALSE                     NA             107
#>  4 NIWO_005_2018 FALSE                     NA             146
#>  5 NIWO_010_2018 FALSE                     NA             148
#>  6 NIWO_012_2018 FALSE                     NA             136
#>  7 NIWO_014_2018 FALSE                     NA             179
#>  8 NIWO_015_2018 FALSE                     NA             150
#>  9 NIWO_016_2018 FALSE                     NA             134
#> 10 NIWO_017_2018 FALSE                     NA             151
#> 11 NIWO_042_2018 FALSE                     NA               5
#> 12 SJER_046_2018 FALSE                     NA              14

# List the plots where the number of annotations and the number of unique labels
# in the lidar data don't match
lidar_label_data %>%
  filter(has_labels) %>%
  mutate(num_unique_label_diff = num_unique_labels - num_annotations) %>%
  filter(num_unique_label_diff != 0)
#> # A tibble: 4 x 5
#>   plot_name     has_labels num_unique_labe… num_annotations num_unique_label_di…
#>   <chr>         <lgl>                 <int>           <int>                <int>
#> 1 BLAN_005_2019 TRUE                     33              34                   -1
#> 2 TEAK_051_2018 TRUE                     51              52                   -1
#> 3 TEAK_055_2018 TRUE                     18              19                   -1
#> 4 TEAK_059_2018 TRUE                     69              72                   -3

Created on 2021-05-21 by the reprex package (v2.0.0)
Cheers,
Leon
I'll need to think about this more, but I believe this is intended. I'll add a note to the top of the README, but not all data in evaluation/lidar (or evaluation/rgb) have annotations. Many plots are unannotated, in case people want to do unsupervised learning or annotate more. There are about 1,200 plots; guessing an average of 60 trees each, that's ~70,000 trees to annotate.
The real question is whether there are plots that are annotated in the RGB but that we haven't draped onto the LiDAR (meaning this script needs to be rerun: https://github.com/weecology/NeonTreeEvaluation/blob/master/utilities/create_lidar_annotations.py). That is possible and worth checking to make sure it doesn't cause the mismatch in the number of annotations. More likely, the 2nd point is inevitable: given the sparse density of the cloud, many trees that can be seen in the RGB have no points in the LiDAR, so nothing gets draped. I will rerun create_lidar_annotations.py tomorrow and check the 2nd script, but I don't expect the result to change.
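To make the sparse-cloud argument concrete, here is a minimal toy sketch in base R of what draping boxes onto points could look like. It is not the logic of create_lidar_annotations.py, which I have not reproduced here. The data frames `points` and `boxes`, the helper `drape_labels()`, and all coordinates are made up for illustration, and it assumes each annotation is an axis-aligned box and that label 0 means "unlabelled".

```r
# Toy sketch of "draping" 2D boxes onto a point cloud (base R only).
# All names and numbers here are hypothetical, not taken from the repository.

# A sparse point cloud: x/y coordinates of five lidar returns
points <- data.frame(
  x = c(1.0, 1.5, 6.2, 6.8, 9.5),
  y = c(1.0, 1.8, 6.1, 6.9, 0.5)
)

# Three annotated tree crowns as axis-aligned bounding boxes
boxes <- data.frame(
  id   = 1:3,
  xmin = c(0, 5, 3),
  xmax = c(2, 7, 4),
  ymin = c(0, 5, 3),
  ymax = c(2, 7, 4)
)

# Give each point the id of the first box that contains it, or 0 if none does
drape_labels <- function(points, boxes) {
  label <- integer(nrow(points))  # initialised to 0 = unlabelled
  for (i in seq_len(nrow(boxes))) {
    inside <- points$x >= boxes$xmin[i] & points$x <= boxes$xmax[i] &
      points$y >= boxes$ymin[i] & points$y <= boxes$ymax[i]
    label[inside & label == 0L] <- boxes$id[i]
  }
  label
}

points$label <- drape_labels(points, boxes)

# Ids that actually ended up in the point cloud (ignoring the 0 label)
draped_ids <- setdiff(unique(points$label), 0L)

# Boxes that contain no points and therefore never appear as a label
empty_boxes <- setdiff(boxes$id, draped_ids)

length(draped_ids)  # number of unique labels in the cloud
nrow(boxes)         # number of annotations
empty_boxes         # ids of boxes with no points
```

In this toy case the third box contains no points, so the cloud carries one fewer unique label than there are annotations, which is the same kind of small negative difference as in the second table of the reprex.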
I understand that there are unannotated images and point clouds, but I thought that once annotations are created for an RGB image, they are draped onto the point cloud as well? If this is correct, my analysis lists plots for which annotations exist but have not been draped onto the corresponding point clouds.
Of course, the point clouds that are missing just a few annotations may simply have no points at those annotations.
In any case, none of this is a big problem, since the point cloud labels are not vital to any analysis (at least not any that I can think of) other than maybe visualization.