jennybc/googlesheets

Non-unique column header behaviour

jmcurran opened this issue · 9 comments

As of 2022-02-03, the googlesheets package has been archived on CRAN.
I'm closing all issues and pull requests prior to archiving this GitHub repo.

Learn more at:

https://github.com/jennybc/googlesheets#readme

I just created a Sheet with duplicate column names and here's what I see:

> ss <- gs_url("https://docs.google.com/spreadsheets/d/1Q9JLd6F4yKOZqbm22-5v9JZzzCEPfBQWSAgwn5inJqo/edit?usp=sharing")
Sheet-identifying info appears to be a browser URL.
googlesheets will attempt to extract sheet key from the URL.
Putative key: 1Q9JLd6F4yKOZqbm22-5v9JZzzCEPfBQWSAgwn5inJqo
Worksheets feed constructed with public visibility

> gs_read(ss)
Accessing worksheet titled 'Sheet1'.
Downloading: 34 B     Warning: Duplicated column names deduplicated: 'a' => 'a_1' [2]
Parsed with column specification:
cols(
  a = col_double(),
  a_1 = col_double()
)
# A tibble: 1 x 2
      a   a_1
  <dbl> <dbl>
1     1     2

The scolding for duplicate column names is emanating from somewhere other than googlesheets, namely readr. What version of readr do you have? I have v1.2.0 which, despite appearances, is a development version, i.e. it's ahead of CRAN.

In any case, if you're in a pickle, you can pass arguments through to readr. In this example, I tell readr to ignore the first row and create its own column names:

> gs_read(ss, col_names = FALSE, skip = 1)
Accessing worksheet titled 'Sheet1'.
Downloading: 34 B     Parsed with column specification:
cols(
  X1 = col_double(),
  X2 = col_double()
)
# A tibble: 1 x 2
     X1    X2
  <dbl> <dbl>
1     1     2

That would get you unstuck. But I think you really should figure out why you get an error about this, whereas I just get a warning.

Hi Jenny,

many apologies for the long delay in responding. I did update readr to 1.2.0 but this did not help. A little sleuthing led me to tracking the error's occurrence. If I specify the range of cells to be read, then it triggers the error. With your example, this is:

ss %>% gs_read(ws = "Sheet1", range = "A1:B2")
Accessing worksheet titled 'Sheet1'.
Downloading: 4 kB     Error: Column `a` must have a unique name

I can fix it by specifying the column names of course:

ss %>% gs_read(ws = "Sheet1", range = "A1:B2", skip = 1, col_names = c('a_1','a_2'))

but that rather defeats the purpose.

What version of dplyr do you have?

I think a general devtools::session_info() would be interesting here.

0.7.6

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2     googlesheets_0.3.0 magrittr_1.5       glue_1.3.0         readxl_1.1.0       forcats_0.3.0      stringr_1.3.1     
 [8] dplyr_0.7.6        purrr_0.2.5        readr_1.2.0        tidyr_0.8.1        tibble_1.4.2       ggplot2_3.0.0      tidyverse_1.2.1   
[15] here_0.1           RSQLite_2.1.1     

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.4 haven_1.1.2      lattice_0.20-35  colorspace_1.3-2 yaml_2.2.0       utf8_1.1.4       blob_1.1.1       rlang_0.2.1     
 [9] pillar_1.3.0     withr_2.1.2      DBI_1.0.0        bit64_0.9-7      modelr_0.1.2     bindr_0.1.1      plyr_1.8.4       munsell_0.5.0   
[17] gtable_0.2.0     cellranger_1.1.0 rvest_0.3.2      memoise_1.1.0    curl_3.2         fansi_0.2.3      broom_0.5.0      Rcpp_0.12.18    
[25] openssl_1.0.2    scales_0.5.0     backports_1.1.2  jsonlite_1.5     bit_1.1-14       hms_0.4.2        digest_0.6.15    stringi_1.2.4   
[33] grid_3.5.1       rprojroot_1.3-2  cli_1.0.0        tools_3.5.1      lazyeval_0.2.1   crayon_1.3.4     pkgconfig_2.0.1  xml2_1.2.0      
[41] lubridate_1.7.4  assertthat_0.2.0 rematch_1.0.1    httr_1.3.1       rstudioapi_0.7   R6_2.2.2         nlme_3.1-137     compiler_3.5.1 

Sure :)

> devtools::session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 os       macOS High Sierra 10.13.6   
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_NZ.UTF-8                 
 ctype    en_NZ.UTF-8                 
 tz       Pacific/Auckland            
 date     2018-10-24                  

─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package      * version date       lib source                          
 assertthat     0.2.0   2017-04-11 [1] CRAN (R 3.5.0)                  
 backports      1.1.2   2017-12-13 [1] CRAN (R 3.5.0)                  
 bindr          0.1.1   2018-03-13 [1] CRAN (R 3.5.0)                  
 bindrcpp     * 0.2.2   2018-03-29 [1] CRAN (R 3.5.0)                  
 bit            1.1-14  2018-05-29 [1] CRAN (R 3.5.0)                  
 bit64          0.9-7   2017-05-08 [1] CRAN (R 3.5.0)                  
 blob           1.1.1   2018-03-25 [1] CRAN (R 3.5.0)                  
 broom          0.5.0   2018-07-17 [1] CRAN (R 3.5.0)                  
 callr          2.0.4   2018-05-15 [1] CRAN (R 3.5.0)                  
 cellranger     1.1.0   2016-07-27 [1] CRAN (R 3.5.0)                  
 cli            1.0.0   2017-11-05 [1] CRAN (R 3.5.0)                  
 colorspace     1.3-2   2016-12-14 [1] CRAN (R 3.5.0)                  
 crayon         1.3.4   2017-09-16 [1] CRAN (R 3.5.0)                  
 curl           3.2     2018-03-28 [1] CRAN (R 3.5.0)                  
 DBI            1.0.0   2018-05-02 [1] CRAN (R 3.5.0)                  
 desc           1.2.0   2018-05-01 [1] CRAN (R 3.5.0)                  
 devtools       2.0.0   2018-10-19 [1] CRAN (R 3.5.1)                  
 digest         0.6.15  2018-01-28 [1] CRAN (R 3.5.0)                  
 dplyr        * 0.7.6   2018-06-29 [1] CRAN (R 3.5.1)                  
 fansi          0.2.3   2018-05-06 [1] CRAN (R 3.5.0)                  
 forcats      * 0.3.0   2018-02-19 [1] CRAN (R 3.5.0)                  
 fs             1.2.6   2018-08-23 [1] CRAN (R 3.5.0)                  
 ggplot2      * 3.0.0   2018-07-03 [1] CRAN (R 3.5.0)                  
 glue         * 1.3.0   2018-07-17 [1] CRAN (R 3.5.0)                  
 googlesheets * 0.3.0   2018-06-29 [1] CRAN (R 3.5.0)                  
 gtable         0.2.0   2016-02-26 [1] CRAN (R 3.5.0)                  
 haven          1.1.2   2018-06-27 [1] CRAN (R 3.5.0)                  
 here         * 0.1     2017-05-28 [1] CRAN (R 3.5.0)                  
 hms            0.4.2   2018-03-10 [1] CRAN (R 3.5.0)                  
 httr           1.3.1   2017-08-20 [1] CRAN (R 3.5.0)                  
 jsonlite       1.5     2017-06-01 [1] CRAN (R 3.5.0)                  
 lattice        0.20-35 2017-03-25 [1] CRAN (R 3.5.1)                  
 lazyeval       0.2.1   2017-10-29 [1] CRAN (R 3.5.0)                  
 lubridate      1.7.4   2018-04-11 [1] CRAN (R 3.5.0)                  
 magrittr     * 1.5     2014-11-22 [1] CRAN (R 3.5.0)                  
 memoise        1.1.0   2017-04-21 [1] CRAN (R 3.5.0)                  
 modelr         0.1.2   2018-05-11 [1] CRAN (R 3.5.0)                  
 munsell        0.5.0   2018-06-12 [1] CRAN (R 3.5.0)                  
 nlme           3.1-137 2018-04-07 [1] CRAN (R 3.5.1)                  
 openssl        1.0.2   2018-07-30 [1] CRAN (R 3.5.0)                  
 pillar         1.3.0   2018-07-14 [1] CRAN (R 3.5.0)                  
 pkgbuild       1.0.2   2018-10-16 [1] CRAN (R 3.5.0)                  
 pkgconfig      2.0.1   2017-03-21 [1] CRAN (R 3.5.0)                  
 pkgload        1.0.1   2018-10-11 [1] CRAN (R 3.5.0)                  
 plyr           1.8.4   2016-06-08 [1] CRAN (R 3.5.0)                  
 prettyunits    1.0.2   2015-07-13 [1] CRAN (R 3.5.0)                  
 processx       3.1.0   2018-05-15 [1] CRAN (R 3.5.0)                  
 purrr        * 0.2.5   2018-05-29 [1] CRAN (R 3.5.0)                  
 R6             2.2.2   2017-06-17 [1] CRAN (R 3.5.0)                  
 Rcpp           0.12.18 2018-07-23 [1] CRAN (R 3.5.0)                  
 readr        * 1.2.0   2018-10-24 [1] Github (tidyverse/readr@69c9fd3)
 readxl       * 1.1.0   2018-04-20 [1] CRAN (R 3.5.0)                  
 rematch        1.0.1   2016-04-21 [1] CRAN (R 3.5.0)                  
 remotes        2.0.1   2018-10-19 [1] CRAN (R 3.5.0)                  
 rlang          0.2.1   2018-05-30 [1] CRAN (R 3.5.0)                  
 rprojroot      1.3-2   2018-01-03 [1] CRAN (R 3.5.0)                  
 RSQLite      * 2.1.1   2018-05-06 [1] CRAN (R 3.5.0)                  
 rstudioapi     0.7     2017-09-07 [1] CRAN (R 3.5.0)                  
 rvest          0.3.2   2016-06-17 [1] CRAN (R 3.5.0)                  
 scales         0.5.0   2017-08-24 [1] CRAN (R 3.5.0)                  
 sessioninfo    1.1.0   2018-09-25 [1] CRAN (R 3.5.0)                  
 stringi        1.2.4   2018-07-20 [1] CRAN (R 3.5.0)                  
 stringr      * 1.3.1   2018-05-10 [1] CRAN (R 3.5.0)                  
 testthat       2.0.0   2017-12-13 [1] CRAN (R 3.5.0)                  
 tibble       * 1.4.2   2018-01-22 [1] CRAN (R 3.5.0)                  
 tidyr        * 0.8.1   2018-05-18 [1] CRAN (R 3.5.0)                  
 tidyselect     0.2.4   2018-02-26 [1] CRAN (R 3.5.0)                  
 tidyverse    * 1.2.1   2017-11-14 [1] CRAN (R 3.5.0)                  
 usethis        1.4.0   2018-08-14 [1] CRAN (R 3.5.0)                  
 utf8           1.1.4   2018-05-24 [1] CRAN (R 3.5.0)                  
 withr          2.1.2   2018-03-15 [1] CRAN (R 3.5.0)                  
 xml2           1.2.0   2018-01-24 [1] CRAN (R 3.5.0)                  
 yaml           2.2.0   2018-07-25 [1] CRAN (R 3.5.0)                  

[1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library

This message actually appears to be coming from dplyr.

https://github.com/tidyverse/dplyr/blob/76f1d112b3e4445bb5aca707c744f64876a15bac/src/utils.cpp#L36

It's not coming from googlesheets.

I note that you are on dplyr v0.7.6, whereas current CRAN version is v0.7.7, which is also what I have. And yet I don't know of any specific change that would affect what we are seeing. But updating to current CRAN version of dplyr can't hurt.

You could also post the output of traceback() here.

No prob - this is from my actually application so probably around 10. upwards is the relevant part

 Error: Columns `Staff`, `Weight`, `Staff`, `Weight` must have unique names 
36.
stop(cnd) 
35.
abort(paste0(...)) 
34.
stopc(pluralise_msg("Column(s) ", vars), " ", pluralise(problem, 
    vars)) 
33.
invalid_df("must have [a] unique name(s)", x, dups) 
32.
check_tibble(x) 
31.
list_to_tibble(x, validate) 
30.
as_data_frame.data.frame(.) 
29.
dplyr::as_data_frame(.) 
28.
function_list[[k]](value) 
27.
withVisible(function_list[[k]](value)) 
26.
freduce(value, `_function_list`) 
25.
`_fseq`(`_lhs`) 
24.
eval(quote(`_fseq`(`_lhs`)), env, env) 
23.
eval(quote(`_fseq`(`_lhs`)), env, env) 
22.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
21.
dat %>% as.data.frame(stringsAsFactors = FALSE) %>% dplyr::as_data_frame() 
20.
gs_reshape_feed(x, ddd, verbose) 
19.
gs_reshape_cellfeed(., literal = literal, ..., verbose = verbose) 
18.
function_list[[k]](value) 
17.
withVisible(function_list[[k]](value)) 
16.
freduce(value, `_function_list`) 
15.
`_fseq`(`_lhs`) 
14.
eval(quote(`_fseq`(`_lhs`)), env, env) 
13.
eval(quote(`_fseq`(`_lhs`)), env, env) 
12.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
11.
gs_read_cellfeed(ss, ws = ws, range = range, ..., verbose = verbose) %>% 
    gs_reshape_cellfeed(literal = literal, ..., verbose = verbose) 
10.
gs_read(., ws = "Semester_1", range = "A1:J36", col_names = TRUE) 
9.
function_list[[k]](value) 
8.
withVisible(function_list[[k]](value)) 
7.
freduce(value, `_function_list`) 
6.
`_fseq`(`_lhs`) 
5.
eval(quote(`_fseq`(`_lhs`)), env, env) 
4.
eval(quote(`_fseq`(`_lhs`)), env, env) 
3.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
2.
mysheet %>% gs_read(ws = "Semester_1", range = "A1:J36", col_names = TRUE) 
1.
readData()