ropensci/GSODR

Unexpected `NA`s in longitude and latitude using `reformat_GSOD`

meixilin opened this issue · 4 comments

Hi,

thanks for making this package available. I was trying to use the reformat_GSOD function but noticed that the latitude and longitude values for some stations were unexpectedly converted to NA.

Session Info
devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       CentOS Linux 7 (Core)       
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Los_Angeles         
 date     2022-11-20

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source        
 cachem        1.0.6   2021-08-19 [2] CRAN (R 3.6.2)
 callr         3.7.0   2021-04-20 [2] CRAN (R 3.6.2)
 class         7.3-16  2020-03-25 [2] CRAN (R 3.6.2)
 classInt      0.4-3   2020-04-07 [2] CRAN (R 3.6.2)
 cli           3.1.0   2021-10-27 [2] CRAN (R 3.6.2)
 crayon        1.3.4   2017-09-16 [2] CRAN (R 3.6.2)
 data.table    1.14.2  2021-09-27 [2] CRAN (R 3.6.2)
 DBI           1.1.1   2021-01-15 [2] CRAN (R 3.6.2)
 desc          1.4.0   2021-09-28 [2] CRAN (R 3.6.2)
 devtools      2.2.2   2020-02-17 [2] CRAN (R 3.6.2)
 dplyr       * 1.0.7   2021-06-18 [2] CRAN (R 3.6.2)
 e1071         1.7-3   2019-11-26 [2] CRAN (R 3.6.2)
 ellipsis      0.3.2   2021-04-29 [2] CRAN (R 3.6.2)
 fansi         0.4.1   2020-01-08 [2] CRAN (R 3.6.2)
 fastmap       1.1.0   2021-01-25 [2] CRAN (R 3.6.2)
 fs            1.5.2   2021-12-08 [2] CRAN (R 3.6.2)
 generics      0.1.1   2021-10-25 [2] CRAN (R 3.6.2)
 glue          1.5.1   2021-11-30 [2] CRAN (R 3.6.2)
 GSODR       * 3.1.6   2022-08-13 [1] CRAN (R 3.6.2)
 KernSmooth    2.23-16 2019-10-15 [2] CRAN (R 3.6.2)
 lifecycle     1.0.1   2021-09-24 [2] CRAN (R 3.6.2)
 magrittr      2.0.1   2020-11-17 [2] CRAN (R 3.6.2)
 memoise       2.0.1   2021-11-26 [2] CRAN (R 3.6.2)
 pillar        1.6.4   2021-10-18 [2] CRAN (R 3.6.2)
 pkgbuild      1.0.6   2019-10-09 [2] CRAN (R 3.6.2)
 pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 3.6.2)
 pkgload       1.2.4   2021-11-30 [2] CRAN (R 3.6.2)
 prettyunits   1.1.1   2020-01-24 [2] CRAN (R 3.6.2)
 processx      3.5.2   2021-04-30 [2] CRAN (R 3.6.2)
 ps            1.6.0   2021-02-28 [2] CRAN (R 3.6.2)
 purrr         0.3.4   2020-04-17 [2] CRAN (R 3.6.2)
 R6            2.4.1   2019-11-12 [2] CRAN (R 3.6.2)
 Rcpp          1.0.7   2021-07-07 [2] CRAN (R 3.6.2)
 remotes       2.1.1   2020-02-15 [2] CRAN (R 3.6.2)
 rlang         0.4.12  2021-10-18 [2] CRAN (R 3.6.2)
 rprojroot     2.0.2   2020-11-15 [2] CRAN (R 3.6.2)
 rstudioapi    0.13    2020-11-12 [2] CRAN (R 3.6.2)
 sessioninfo   1.1.1   2018-11-05 [2] CRAN (R 3.6.2)
 sf            1.0-4   2021-11-14 [2] CRAN (R 3.6.2)
 stringi       1.4.6   2020-02-17 [2] CRAN (R 3.6.2)
 stringr     * 1.4.0   2019-02-10 [2] CRAN (R 3.6.2)
 testthat      3.1.1   2021-12-03 [2] CRAN (R 3.6.2)
 tibble        3.1.6   2021-11-07 [2] CRAN (R 3.6.2)
 tidyr         1.1.4   2021-09-27 [2] CRAN (R 3.6.2)
 tidyselect    1.1.1   2021-04-30 [2] CRAN (R 3.6.2)
 units         0.6-6   2020-03-16 [2] CRAN (R 3.6.2)
 usethis       2.1.5   2021-12-09 [2] CRAN (R 3.6.2)
 utf8          1.1.4   2018-05-24 [2] CRAN (R 3.6.2)
 vctrs         0.3.8   2021-04-29 [2] CRAN (R 3.6.2)
 withr         2.4.3   2021-11-30 [2] CRAN (R 3.6.2)

To reproduce this problem:

wget https://www.ncei.noaa.gov/data/global-summary-of-the-day/access/2017/72057600174.csv
head -2 72057600174.csv
"STATION","DATE","LATITUDE","LONGITUDE","ELEVATION","NAME","TEMP","TEMP_ATTRIBUTES","DEWP","DEWP_ATTRIBUTES","SLP","SLP_ATTRIBUTES","STP","STP_ATTRIBUTES","VISIB","VISIB_ATTRIBUTES","WDSP","WDSP_ATTRIBUTES","MXSPD","GUST","MAX","MAX_ATTRIBUTES","MIN","MIN_ATTRIBUTES","PRCP","PRCP_ATTRIBUTES","SNDP","FRSHTT"
"72057600174","2017-01-01","38.533","-121.783","21.0","UNIVERSITY AIRPORT, CA US","  42.9","24","  39.2","24","9999.9"," 0","011.7","16","  9.8","24","  5.8","24"," 12.0"," 15.0","  51.8","*","  39.2","*"," 0.00","I","999.9","000000"
dt = GSODR::reformat_GSOD(file_list = '72057600174.csv')
head(dt)
          STNID NAME CTRY COUNTRY_NAME ISO2C ISO3C STATE LATITUDE LONGITUDE ELEVATION BEGIN END   YEARMODA YEAR MONTH DAY YDAY TEMP
1: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-01 2017     1   1    1  6.1
2: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-02 2017     1   2    2  6.7
3: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-03 2017     1   3    3  7.5
4: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-04 2017     1   4    4  9.8
5: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-05 2017     1   5    5  7.3
6: 720576-00174 <NA> <NA>         <NA>  <NA>  <NA>  <NA>       NA        NA        21    NA  NA 2017-01-06 2017     1   6    6  3.7

I dug around and I think the problem is in the isd_history file that reformat_GSOD looks up. It does not contain the station ID 72057600174 (720576-00174), but it does contain another ID at the same location, 720576-99999.
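To make the suspected mechanism concrete, here is a minimal base R sketch of how a lookup by station ID would produce these NAs. It is not GSODR's actual code: `isd_history_toy` is a hypothetical one-row stand-in for the real isd_history table, and the `merge()` call is only an assumption about how the metadata gets attached.

```r
# Hypothetical stand-in for the station metadata table; the real
# isd_history has many more rows and columns.
isd_history_toy <- data.frame(
  STNID = "720576-99999",
  LATITUDE = 38.533,
  LONGITUDE = -121.783,
  stringsAsFactors = FALSE
)

# One record as it appears in the raw CSV, which carries its own coordinates.
raw <- data.frame(
  STATION = "72057600174",
  DATE = "2017-01-01",
  LATITUDE = 38.533,
  LONGITUDE = -121.783,
  stringsAsFactors = FALSE
)

# Build the hyphenated ID seen in the output: first six characters,
# a hyphen, then the last five characters.
raw$STNID <- paste(substr(raw$STATION, 1, 6),
                   substr(raw$STATION, 7, 11),
                   sep = "-")

# Left join that takes coordinates only from the metadata table.
# The ID has no match there, so the coordinates come back as NA.
joined <- merge(raw[, c("STNID", "DATE")], isd_history_toy,
                by = "STNID", all.x = TRUE)
print(joined)
```

In this toy case the row is kept but LATITUDE and LONGITUDE are NA, even though the raw record had valid values, which matches what I see above.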

I am a bit confused about why this might be happening. Any input would be greatly appreciated.

Best

Hi, sorry about this issue. It is indeed unexpected behaviour. Unfortunately, right now I’m on vacation without a laptop to investigate. I’ll get back to this as soon as I can, sometime next month.

Is the behaviour the same when using get_GSOD() for the same data set?

No worries! I ended up rewriting reformat_GSOD a little so that it does not query isd_history, and that fixed my problem. I haven't tried get_GSOD() yet.
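A minimal sketch of that kind of workaround, for anyone hitting the same thing before a fix lands. It is not the exact rewrite and not GSODR code: it reads the coordinates straight from the CSV instead of looking them up. The CSV lines are a trimmed subset of the columns shown above. The Fahrenheit to Celsius conversion rounded to one decimal is an assumption, chosen because it reproduces the TEMP value in the output.

```r
# Trimmed copy of the header and first data row from the report,
# written to a temporary file so the example runs offline.
csv_lines <- c(
  '"STATION","DATE","LATITUDE","LONGITUDE","ELEVATION","NAME","TEMP"',
  '"72057600174","2017-01-01","38.533","-121.783","21.0","UNIVERSITY AIRPORT, CA US","  42.9"'
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Read everything as character so the station ID keeps its digits,
# then convert the numeric columns explicitly.
raw <- read.csv(tmp, colClasses = "character")

out <- data.frame(
  STNID = paste(substr(raw$STATION, 1, 6),
                substr(raw$STATION, 7, 11),
                sep = "-"),
  NAME = raw$NAME,
  LATITUDE = as.numeric(raw$LATITUDE),    # taken from the file, no lookup
  LONGITUDE = as.numeric(raw$LONGITUDE),  # taken from the file, no lookup
  ELEVATION = as.numeric(raw$ELEVATION),
  YEARMODA = as.Date(raw$DATE),
  # Assumed conversion: degrees F to degrees C, one decimal place.
  TEMP = round((as.numeric(trimws(raw$TEMP)) - 32) * 5 / 9, 1),
  stringsAsFactors = FALSE
)
print(out)
```

The trade-off is that fields which exist only in the metadata table, such as the country codes, are not filled in this way.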

Enjoy your holidays!

Thank you for reporting this. It was indeed a bug that went deeper than I expected. I've fixed everything up in the devel branch now and will submit a new release to CRAN in 2023.

Thanks, this has been fixed in the latest version available from CRAN now, v3.1.7