NewGraphEnvironment/fish_passage_skeena_2022_reporting

`date_time_start` changing when importing `form_pscis_2023.gpkg`

lucy-schick opened this issue · 1 comment

Issue:
The date:time in the date_time_start column changes when form_pscis_2023.gpkg is read in from Q. Using site 8478 as an example: in Q, date_time_start = 2023-09-19 13:30:02 (UTC), but after reading it in the value becomes 2023-09-19 06:30:02, a 7 hr shift. The issue seems to occur when reading in the gpkg: I can burn the gpkg to Q and the dates stay the same, but when I read it back in the times change. I am aware that the timezone in Q is set to UTC, which is not correct; it should be PDT (as seen in the 2022 Skeena data).

How to reproduce issue:

  1. read in the backed up form_pscis_2023.csv from data/backup/ which has the correct date:time
  2. read in the form_pscis_2023.gpkg
  3. compare date:times. The times in form_pscis_2023.gpkg have +7hrs added to them.
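
The comparison in step 3 could be sketched roughly as below. This is only an illustration: the two data frames are toy stand-ins rather than the real files, the second site ID (99999) and all clock times other than those quoted for site 8478 are hypothetical, and which source holds which value is assumed from the description above.

```r
# Toy stand-ins for the two sources (hypothetical values, not the real data).
from_csv <- data.frame(
  site_id = c(8478, 99999),
  date_time_start = as.POSIXct(c("2023-09-19 06:30:02", "2023-09-19 09:15:00"),
                               tz = "UTC")
)
from_gpkg <- data.frame(
  site_id = c(8478, 99999),
  date_time_start = as.POSIXct(c("2023-09-19 13:30:02", "2023-09-19 16:15:00"),
                               tz = "UTC")
)

# Join on site_id; the shared column name gets the _csv / _gpkg suffixes.
cmp <- merge(from_csv, from_gpkg, by = "site_id", suffixes = c("_csv", "_gpkg"))

# Per-site offset in hours (gpkg minus csv).
cmp$offset_hr <- as.numeric(difftime(cmp$date_time_start_gpkg,
                                     cmp$date_time_start_csv,
                                     units = "hours"))
cmp[, c("site_id", "offset_hr")]
```

If the offset is the same for every site, that would point to a time zone interpretation difference rather than to edits of individual records.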

What I've tried:

  • Setting the timezone to "America/Vancouver" in the lubridate call: dplyr::mutate(date_time_start = lubridate::ymd_hms(date_time_start, tz = "America/Vancouver")). This changed the date:time incorrectly when I burned to Q.
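
As a minimal sketch (not a diagnosis of what sf or Q is doing here), the base R snippet below shows how the same stored value can produce a 7 hr difference depending on whether the instant is kept and only displayed in another zone, or the clock reading is kept and relabelled with another zone. The value is the one quoted for site 8478; the comparison to `lubridate::with_tz()` and `lubridate::force_tz()` in the comments is for orientation only.

```r
# Value quoted for site 8478, treated as UTC as it is in Q.
utc <- as.POSIXct("2023-09-19 13:30:02", tz = "UTC")

# Same instant, displayed in Pacific time (what lubridate::with_tz() does):
# the clock reading moves back 7 hr.
format(utc, tz = "America/Vancouver", usetz = TRUE)
#> [1] "2023-09-19 06:30:02 PDT"

# Same clock reading, relabelled as Pacific time (what lubridate::force_tz()
# does): this is a different instant.
local <- as.POSIXct(format(utc, "%Y-%m-%d %H:%M:%S"), tz = "America/Vancouver")
format(local, usetz = TRUE)
#> [1] "2023-09-19 13:30:02 PDT"

# The two interpretations are 7 hours apart.
as.numeric(difftime(local, utc, units = "hours"))
#> [1] 7
```

Checking which of these two behaviours happens at each read and write step may help narrow down where the shift is introduced.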

Reprex:

library(tidyverse)
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(lubridate)

dir_project <- 'sern_skeena_2023'

form_pscis <- sf::st_read(dsn= paste0('~/Projects/gis/', dir_project, '/data_field/2023/form_pscis_2023.gpkg'))
#> Reading layer `form_pscis_2023' from data source 
#>   `/Users/lucyschick/Projects/gis/sern_skeena_2023/data_field/2023/form_pscis_2023.gpkg' 
#>   using driver `GPKG'
#> Simple feature collection with 59 features and 98 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 842859 ymin: 1008863 xmax: 958901.9 ymax: 1169499
#> Projected CRS: NAD83 / BC Albers

form_redo <- read_csv("~/Projects/repo/fish_passage_skeena_2023_reporting/data/backup/form_pscis_2023.csv") %>%
  arrange(site_id)
#> Rows: 59 Columns: 98
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (39): crew_members, my_priority, assessment_comment, condition_notes, c...
#> dbl  (39): site_id, moti_chris_culvert_id, moti_chris_culvert_id2, pscis_cro...
#> lgl  (18): moti_chris_culvert_id3, my_citation_key1, my_citation_key2, my_ci...
#> dttm  (1): date_time_start
#> date  (1): date
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#join correct date_time_start from the backup csv to form_pscis
redo <- form_pscis %>%
  dplyr::select(-date_time_start) %>% # Exclude the old column
  left_join(form_redo %>% dplyr::select(site_id, date_time_start), by = 'site_id') %>%
  relocate(date_time_start, .after = crew_members)

# this is the same code from pscis_tidy.R. I am running this because we edit the date_time_start column so this could be where the issue is coming from, but I don't think so.
redo <- redo %>%
  #split date time column into date and time
  dplyr::mutate(date_time_start = lubridate::ymd_hms(date_time_start),
                date = lubridate::date(date_time_start),
                time = hms::as_hms(date_time_start)) %>%
  # filter out to get only the records newly created
  filter(!is.na(date_time_start)) %>%
  mutate(
    site_id = case_when(is.na(pscis_crossing_id) ~ my_crossing_reference,
                        TRUE ~ pscis_crossing_id)
  ) %>%
  # remove the form making site
  filter(site_id != '12345') %>%
  arrange(site_id)

#testing time zone
tz(redo$date_time_start)
#> [1] "UTC"


# clean up data fields to make copy and paste to prov template easier
redo <- redo %>%
  # some columns that have yes/no answers have NA values in mergin, need to change to No
  # need to add 'No' as default values to mergin
  mutate(across(contains('yes_no'), ~replace_na(.,'No'))) %>%
  # some numeric fields for CBS have NA values when a user input 0
  mutate(across(c(outlet_drop_meters, outlet_pool_depth_0_01m, culvert_slope_percent, stream_slope),
                ~case_when(crossing_type == 'Closed Bottom Structure' ~replace_na(.,0),
                           TRUE ~ .
                ))) %>%
  # change 'trib' to long version 'Tributary'
  mutate(stream_name = str_replace_all(stream_name, 'Trib ', 'Tributary ')) %>%
  # change 'Hwy' to 'Highway'
  mutate(road_name = str_replace_all(road_name, 'Hwy ', 'Highway '))

# add in which phase of assessment the site is in. The reassessment sites were done by hand (because there were only 3-4), so that's not in here.
# This only works when only phase 1 sites have a priority ranking; it could use some updating but works for now.
redo <- redo %>%
  mutate(source = case_when(
    my_priority == 'phase 2' ~ 'phase2',
    my_priority == 'high' ~ 'phase1',
    my_priority == 'medium' ~ 'phase1',
    my_priority == 'low' ~ 'phase1',
    is.na(my_priority) ~ 'phase1',
    TRUE ~ source))

# burn cleaned copy to QGIS project gpkg
redo %>%
  sf::st_write(paste0('~/Projects/gis/', dir_project, '/data_field/2023/form_pscis_2023.gpkg'), append = FALSE, delete_dsn = TRUE)
#> Warning in clean_columns(as.data.frame(obj), factorsAsCharacter): Dropping
#> column(s) time of class(es) hms;difftime
#> Deleting source `/Users/lucyschick/Projects/gis/sern_skeena_2023/data_field/2023/form_pscis_2023.gpkg' using driver `GPKG'
#> Writing layer `form_pscis_2023' to data source 
#>   `/Users/lucyschick/Projects/gis/sern_skeena_2023/data_field/2023/form_pscis_2023.gpkg' using driver `GPKG'
#> Writing 59 features with 98 fields and geometry type Point.

Created on 2024-03-25 with reprex v2.1.0

Closing in favour of NewGraphEnvironment/fish_passage_skeena_2023_reporting#49; this issue was opened in the wrong repo.