Unexpected `NA`s in longitude and latitude using `reformat_GSOD`
meixilin opened this issue · 4 comments
Hi,
thanks for making this package available. I was trying to use the `reformat_GSOD` function but noticed that some latitude and longitude values were unexpectedly converted to `NA`.
Session Info
devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.2 (2019-12-12)
os CentOS Linux 7 (Core)
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Los_Angeles
date 2022-11-20
─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
cachem 1.0.6 2021-08-19 [2] CRAN (R 3.6.2)
callr 3.7.0 2021-04-20 [2] CRAN (R 3.6.2)
class 7.3-16 2020-03-25 [2] CRAN (R 3.6.2)
classInt 0.4-3 2020-04-07 [2] CRAN (R 3.6.2)
cli 3.1.0 2021-10-27 [2] CRAN (R 3.6.2)
crayon 1.3.4 2017-09-16 [2] CRAN (R 3.6.2)
data.table 1.14.2 2021-09-27 [2] CRAN (R 3.6.2)
DBI 1.1.1 2021-01-15 [2] CRAN (R 3.6.2)
desc 1.4.0 2021-09-28 [2] CRAN (R 3.6.2)
devtools 2.2.2 2020-02-17 [2] CRAN (R 3.6.2)
dplyr * 1.0.7 2021-06-18 [2] CRAN (R 3.6.2)
e1071 1.7-3 2019-11-26 [2] CRAN (R 3.6.2)
ellipsis 0.3.2 2021-04-29 [2] CRAN (R 3.6.2)
fansi 0.4.1 2020-01-08 [2] CRAN (R 3.6.2)
fastmap 1.1.0 2021-01-25 [2] CRAN (R 3.6.2)
fs 1.5.2 2021-12-08 [2] CRAN (R 3.6.2)
generics 0.1.1 2021-10-25 [2] CRAN (R 3.6.2)
glue 1.5.1 2021-11-30 [2] CRAN (R 3.6.2)
GSODR * 3.1.6 2022-08-13 [1] CRAN (R 3.6.2)
KernSmooth 2.23-16 2019-10-15 [2] CRAN (R 3.6.2)
lifecycle 1.0.1 2021-09-24 [2] CRAN (R 3.6.2)
magrittr 2.0.1 2020-11-17 [2] CRAN (R 3.6.2)
memoise 2.0.1 2021-11-26 [2] CRAN (R 3.6.2)
pillar 1.6.4 2021-10-18 [2] CRAN (R 3.6.2)
pkgbuild 1.0.6 2019-10-09 [2] CRAN (R 3.6.2)
pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 3.6.2)
pkgload 1.2.4 2021-11-30 [2] CRAN (R 3.6.2)
prettyunits 1.1.1 2020-01-24 [2] CRAN (R 3.6.2)
processx 3.5.2 2021-04-30 [2] CRAN (R 3.6.2)
ps 1.6.0 2021-02-28 [2] CRAN (R 3.6.2)
purrr 0.3.4 2020-04-17 [2] CRAN (R 3.6.2)
R6 2.4.1 2019-11-12 [2] CRAN (R 3.6.2)
Rcpp 1.0.7 2021-07-07 [2] CRAN (R 3.6.2)
remotes 2.1.1 2020-02-15 [2] CRAN (R 3.6.2)
rlang 0.4.12 2021-10-18 [2] CRAN (R 3.6.2)
rprojroot 2.0.2 2020-11-15 [2] CRAN (R 3.6.2)
rstudioapi 0.13 2020-11-12 [2] CRAN (R 3.6.2)
sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 3.6.2)
sf 1.0-4 2021-11-14 [2] CRAN (R 3.6.2)
stringi 1.4.6 2020-02-17 [2] CRAN (R 3.6.2)
stringr * 1.4.0 2019-02-10 [2] CRAN (R 3.6.2)
testthat 3.1.1 2021-12-03 [2] CRAN (R 3.6.2)
tibble 3.1.6 2021-11-07 [2] CRAN (R 3.6.2)
tidyr 1.1.4 2021-09-27 [2] CRAN (R 3.6.2)
tidyselect 1.1.1 2021-04-30 [2] CRAN (R 3.6.2)
units 0.6-6 2020-03-16 [2] CRAN (R 3.6.2)
usethis 2.1.5 2021-12-09 [2] CRAN (R 3.6.2)
utf8 1.1.4 2018-05-24 [2] CRAN (R 3.6.2)
vctrs 0.3.8 2021-04-29 [2] CRAN (R 3.6.2)
withr 2.4.3 2021-11-30 [2] CRAN (R 3.6.2)
To reproduce this problem:
wget https://www.ncei.noaa.gov/data/global-summary-of-the-day/access/2017/72057600174.csv
head -2 72057600174.csv
"STATION","DATE","LATITUDE","LONGITUDE","ELEVATION","NAME","TEMP","TEMP_ATTRIBUTES","DEWP","DEWP_ATTRIBUTES","SLP","SLP_ATTRIBUTES","STP","STP_ATTRIBUTES","VISIB","VISIB_ATTRIBUTES","WDSP","WDSP_ATTRIBUTES","MXSPD","GUST","MAX","MAX_ATTRIBUTES","MIN","MIN_ATTRIBUTES","PRCP","PRCP_ATTRIBUTES","SNDP","FRSHTT"
"72057600174","2017-01-01","38.533","-121.783","21.0","UNIVERSITY AIRPORT, CA US"," 42.9","24"," 39.2","24","9999.9"," 0","011.7","16"," 9.8","24"," 5.8","24"," 12.0"," 15.0"," 51.8","*"," 39.2","*"," 0.00","I","999.9","000000"
dt = GSODR::reformat_GSOD(file_list = '72057600174.csv')
head(dt)
STNID NAME CTRY COUNTRY_NAME ISO2C ISO3C STATE LATITUDE LONGITUDE ELEVATION BEGIN END YEARMODA YEAR MONTH DAY YDAY TEMP
1: 720576-00174 <NA> <NA> <NA> <NA> <NA> <NA> NA NA 21 NA NA 2017-01-01 2017 1 1 1 6.1
2: 720576-00174 <NA> <NA> <NA> <NA> <NA> <NA> NA NA 21 NA NA 2017-01-02 2017 1 2 2 6.7
3: 720576-00174 <NA> <NA> <NA> <NA> <NA> <NA> NA NA 21 NA NA 2017-01-03 2017 1 3 3 7.5
4: 720576-00174 <NA> <NA> <NA> <NA> <NA> <NA> NA NA 21 NA NA 2017-01-04 2017 1 4 4 9.8
5: 720576-00174 <NA> <NA> <NA> <NA> <NA> <NA> NA NA 21 NA NA 2017-01-05 2017 1 5 5 7.3
6: 720576-00174 <NA> <NA> <NA> <NA> <NA> <NA> NA NA 21 NA NA 2017-01-06 2017 1 6 6 3.7
I dug around and I think the problem is in the `isd_history` file used by `reformat_GSOD`, which does not contain the station id `72057600174` but does contain another id at the same location, `720576-99999`.
I am a bit confused about why this might be happening. Any input would be greatly appreciated.
Best
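The mismatch described in this report can be reproduced in miniature with base R. This is only a sketch: `isd_history_toy` and `daily_toy` are made-up stand-ins, and it assumes (the thread does not confirm the internals) that station metadata is attached by a left join on the station id. A daily record whose id is absent from the metadata table then comes back with `NA` coordinates, even though another id at the same location is present.

```r
# Toy stand-in for the station metadata table (NOT the real isd_history).
# It holds 720576-99999 but not 720576-00174, mirroring the report.
isd_history_toy <- data.frame(
  STNID = c("720576-99999", "999999-00001"),
  LAT   = c(38.533, 10.0),
  LON   = c(-121.783, 20.0),
  stringsAsFactors = FALSE
)

# Toy stand-in for the parsed daily records of station 720576-00174.
daily_toy <- data.frame(
  STNID    = c("720576-00174", "720576-00174"),
  YEARMODA = as.Date(c("2017-01-01", "2017-01-02")),
  TEMP     = c(6.1, 6.7),
  stringsAsFactors = FALSE
)

# Left join: keep every daily record, attach metadata where the id matches.
# Assumption: this mimics how the metadata lookup behaves.
joined <- merge(daily_toy, isd_history_toy, by = "STNID", all.x = TRUE)
print(joined)

# List the station ids that have no metadata row at all.
missing_ids <- setdiff(unique(daily_toy$STNID), isd_history_toy$STNID)
print(missing_ids)
```

Running a check like `setdiff()` before the join makes it obvious which stations will lose their coordinates.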
Hi, sorry about this issue. It is indeed unexpected behaviour. Unfortunately, right now I’m on vacation without a laptop to investigate. I’ll get back to this as soon as I’m able to next month sometime.
Is the behaviour the same when using `get_GSOD()` for the same data set?
No worries! I ended up rewriting `reformat_GSOD` a little so that it no longer queries `isd_history`, and that fixed my problems. I haven't tried `get_GSOD()` yet.
Enjoy your holidays!
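The workaround mentioned above was not posted, so the following is only a guess at its general shape, not the reporter's code and not GSODR's implementation. `reformat_without_history` is a hypothetical helper that takes the coordinates straight from the CSV columns instead of looking them up in `isd_history`. The input is a cut-down copy of the first record from the report, written to a temporary file so the sketch runs without a download.

```r
# Write a cut-down copy of the first record from the report to a temp file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"STATION","DATE","LATITUDE","LONGITUDE","ELEVATION","NAME","TEMP"',
  '"72057600174","2017-01-01","38.533","-121.783","21.0","UNIVERSITY AIRPORT, CA US"," 42.9"'
), csv_path)

# Read every column as character so the 11-digit station id is kept as text.
raw <- read.csv(csv_path, colClasses = "character", stringsAsFactors = FALSE)

# Hypothetical helper (not part of GSODR): use the file's own coordinates.
reformat_without_history <- function(x) {
  data.frame(
    # "72057600174" -> "720576-00174", the STNID layout shown in the output above
    STNID     = paste(substr(x$STATION, 1, 6), substr(x$STATION, 7, 11), sep = "-"),
    NAME      = x$NAME,
    LATITUDE  = as.numeric(x$LATITUDE),
    LONGITUDE = as.numeric(x$LONGITUDE),
    ELEVATION = as.numeric(x$ELEVATION),
    YEARMODA  = as.Date(x$DATE),
    # Fahrenheit to Celsius; 42.9 F rounds to the 6.1 shown in the report
    TEMP      = round((as.numeric(x$TEMP) - 32) * 5 / 9, 1),
    stringsAsFactors = FALSE
  )
}

out <- reformat_without_history(raw)
print(out)
```

The trade-off is that fields that only exist in the metadata table (country codes, state, begin and end dates) stay unavailable with this approach.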
Thank you for reporting this. It was indeed a bug that went deeper than I expected. I've fixed everything up in the devel branch now and will submit a new release to CRAN in 2023.
Thanks, this has been fixed in the latest version now available from CRAN, v3.1.7.