`date_time_start` changing when importing `form_pscis_2023.gpkg`
lucy-schick opened this issue · 1 comments
Issue:
The date:time in the `date_time_start` column changes when reading in `form_pscis_2023.gpkg` from Q. Using site 8478 as an example, in Q `date_time_start = 2023-09-19 13:30:02 (UTC)`, but when you read it in it changes to `2023-09-19 06:30:02`, i.e. the time is shifted by 7 hr. The issue seems to occur when reading in the gpkg: I can burn the gpkg to Q and the dates stay the same, but when reading it back in the times change. I am aware that in Q the timezone is set to UTC, which is not correct; it should be PDT (as seen in the 2022 skeena data).
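One possible explanation, not confirmed in this issue, is that the value stored as UTC is being displayed in the machine's local time zone after reading. The sketch below uses only base R and assumes the machine is on Pacific time. It shows that the same instant prints as 13:30:02 on a UTC clock and 06:30:02 on a Vancouver clock, which matches the site 8478 example.

```r
# The value shown in Q for site 8478, labelled as UTC
x <- as.POSIXct("2023-09-19 13:30:02", tz = "UTC")

# Same instant, printed on a UTC clock and on a Vancouver clock
shown_utc <- format(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")
shown_van <- format(x, "%Y-%m-%d %H:%M:%S", tz = "America/Vancouver")

shown_utc
shown_van
```

If this is what is happening, the stored instant has not changed; only the clock used to print it has.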
How to reproduce issue:
- read in the backed up `form_pscis_2023.csv` from `data/backup/`, which has the correct date:time
- read in the `form_pscis_2023.gpkg`
- compare date:times. The times in `form_pscis_2023.gpkg` differ from the csv by 7 hr.
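The comparison step could be sketched roughly as follows. The two tibbles are placeholders standing in for the csv and gpkg contents (the site ids and times are illustrative, not taken from the real files), and `offset_hrs` is a hypothetical column name. The sign of `offset_hrs` shows which direction the shift goes.

```r
library(dplyr)

# Placeholder rows standing in for the two files; values are illustrative only
from_csv <- tibble(
  site_id = c(1, 2),
  date_time_start = as.POSIXct(c("2023-09-19 13:30:02", "2023-09-20 09:15:00"), tz = "UTC")
)
from_gpkg <- tibble(
  site_id = c(1, 2),
  date_time_start = as.POSIXct(c("2023-09-19 06:30:02", "2023-09-20 02:15:00"), tz = "UTC")
)

# Join on site_id and compute the gpkg-minus-csv difference in hours
time_check <- from_csv %>%
  inner_join(from_gpkg, by = "site_id", suffix = c("_csv", "_gpkg")) %>%
  mutate(offset_hrs = as.numeric(difftime(date_time_start_gpkg,
                                          date_time_start_csv,
                                          units = "hours")))

time_check
```

With the real data, `from_csv` and `from_gpkg` would be replaced by the `site_id` and `date_time_start` columns of `form_redo` and `form_pscis` from the reprex.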
What I've tried:
- Setting the timezone to "America/Vancouver" in the `lubridate` call: `dplyr::mutate(date_time_start = lubridate::ymd_hms(date_time_start, tz = "America/Vancouver"))`. This changed the date:time incorrectly when I burn to Q.
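For reference, `lubridate` has two ways to change a time zone, and they behave differently. The sketch below is not a confirmed fix for this issue; it only illustrates the difference using the site 8478 value. `with_tz()` keeps the instant and changes the clock reading, while `force_tz()` keeps the clock reading and changes the instant. The latter may be relevant if the UTC label in Q is wrong and the clock reading is really PDT.

```r
library(lubridate)

utc_label <- ymd_hms("2023-09-19 13:30:02", tz = "UTC")

# with_tz(): same instant, different clock reading
shifted <- with_tz(utc_label, "America/Vancouver")

# force_tz(): same clock reading, different instant
relabelled <- force_tz(utc_label, "America/Vancouver")

format(shifted, "%H:%M:%S")
format(relabelled, "%H:%M:%S")

# Hours between the relabelled instant and the original UTC instant
as.numeric(difftime(relabelled, utc_label, units = "hours"))
```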
Reprex:
library(tidyverse)
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(lubridate)
dir_project <- 'sern_skeena_2023'
form_pscis <- sf::st_read(dsn= paste0('~/Projects/gis/', dir_project, '/data_field/2023/form_pscis_2023.gpkg'))
#> Reading layer `form_pscis_2023' from data source
#> `/Users/lucyschick/Projects/gis/sern_skeena_2023/data_field/2023/form_pscis_2023.gpkg'
#> using driver `GPKG'
#> Simple feature collection with 59 features and 98 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 842859 ymin: 1008863 xmax: 958901.9 ymax: 1169499
#> Projected CRS: NAD83 / BC Albers
form_redo <- read_csv("~/Projects/repo/fish_passage_skeena_2023_reporting/data/backup/form_pscis_2023.csv") %>%
  arrange(site_id)
#> Rows: 59 Columns: 98
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (39): crew_members, my_priority, assessment_comment, condition_notes, c...
#> dbl (39): site_id, moti_chris_culvert_id, moti_chris_culvert_id2, pscis_cro...
#> lgl (18): moti_chris_culvert_id3, my_citation_key1, my_citation_key2, my_ci...
#> dttm (1): date_time_start
#> date (1): date
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# join correct date_time_start from the backup csv to form_pscis
redo <- form_pscis %>%
  dplyr::select(-date_time_start) %>% # Exclude the old column
  left_join(form_redo %>% dplyr::select(site_id, date_time_start), by = 'site_id') %>%
  relocate(date_time_start, .after = crew_members)
# this is the same code from pscis_tidy.R. I am running this because we edit the date_time_start column so this could be where the issue is coming from, but I don't think so.
redo <- redo %>%
  # split date time column into date and time
  dplyr::mutate(date_time_start = lubridate::ymd_hms(date_time_start),
                date = lubridate::date(date_time_start),
                time = hms::as_hms(date_time_start)) %>%
  # filter out to get only the records newly created
  filter(!is.na(date_time_start)) %>%
  mutate(
    site_id = case_when(is.na(pscis_crossing_id) ~ my_crossing_reference,
                        TRUE ~ pscis_crossing_id)
  ) %>%
  # remove the form making site
  filter(site_id != '12345') %>%
  arrange(site_id)
#testing time zone
tz(redo$date_time_start)
#> [1] "UTC"
# clean up data fields to make copy and paste to prov template easier
redo <- redo %>%
  # some columns that have yes/no answers have NA values in mergin, need to change to No
  # need to add 'No' as default values to mergin
  mutate(across(contains('yes_no'), ~replace_na(., 'No'))) %>%
  # some numeric fields for CBS have NA values when a user input 0
  mutate(across(c(outlet_drop_meters, outlet_pool_depth_0_01m, culvert_slope_percent, stream_slope),
                ~case_when(crossing_type == 'Closed Bottom Structure' ~ replace_na(., 0),
                           TRUE ~ .))) %>%
  # change 'Trib' to long version 'Tributary'
  mutate(stream_name = str_replace_all(stream_name, 'Trib ', 'Tributary ')) %>%
  # change 'Hwy' to 'Highway'
  mutate(road_name = str_replace_all(road_name, 'Hwy ', 'Highway '))
# add in which phase of assessment the site is in. The reassessment sites were done by hand (there were only 3-4), so that's not in here.
# This only works when only phase 1 sites have a priority ranking; could use some updating but works for now.
redo <- redo %>%
  mutate(source = case_when(
    my_priority == 'phase 2' ~ 'phase2',
    my_priority == 'high' ~ 'phase1',
    my_priority == 'medium' ~ 'phase1',
    my_priority == 'low' ~ 'phase1',
    is.na(my_priority) ~ 'phase1',
    TRUE ~ source))
# burn cleaned copy to QGIS project gpkg
redo %>%
  sf::st_write(paste0('~/Projects/gis/', dir_project, '/data_field/2023/form_pscis_2023.gpkg'), append = FALSE, delete_dsn = TRUE)
#> Warning in clean_columns(as.data.frame(obj), factorsAsCharacter): Dropping
#> column(s) time of class(es) hms;difftime
#> Deleting source `/Users/lucyschick/Projects/gis/sern_skeena_2023/data_field/2023/form_pscis_2023.gpkg' using driver `GPKG'
#> Writing layer `form_pscis_2023' to data source
#> `/Users/lucyschick/Projects/gis/sern_skeena_2023/data_field/2023/form_pscis_2023.gpkg' using driver `GPKG'
#> Writing 59 features with 98 fields and geometry type Point.
Created on 2024-03-25 with reprex v2.1.0
Closing in favour of NewGraphEnvironment/fish_passage_skeena_2023_reporting#49; this was opened in the wrong repo.