tidyverse/readxl

zip path is too long

brianmsm opened this issue · 5 comments

I am on Windows and my project sits in a deeply nested folder structure. When I try to import a data file with readxl::read_excel(), I get the following error:

Error in unz(zip_path, file_path, open = "rb") : 
  cannot open the connection
In addition: Warning message:
In unz(zip_path, file_path, open = "rb") : zip path is too long

I saved the same data in the same location in .sav and .dta format, and the haven package reads both without problems. I have also enabled long paths as suggested here (https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=powershell), but read_excel() still fails.

haven::read_sav("1. Data/Valence Depresion Domaradzka.sav")
#> # A tibble: 1,632 × 39
#>       Id sex         age VD02    VD03    VD04    VD05    VD06    VD07    VD08   
#>    <dbl> <dbl+lbl> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l>
#>  1     2 1 [Femal…    32 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  2     4 1 [Femal…    34 2 [I d… 1 [I a… 1 [I a… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#>  3    10 1 [Femal…    30 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#>  4    11 1 [Femal…    23 1 [I a… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#>  5    15 1 [Femal…    53 2 [I d… 1 [I a… 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#>  6    16 1 [Femal…    46 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  7    17 1 [Femal…    51 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d… 2 [I d… 2 [I d…
#>  8    19 1 [Femal…    62 1 [I a… 1 [I a… 2 [I d… 1 [I a… 1 [I a… 2 [I d… 1 [I a…
#>  9    22 1 [Femal…    34 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#> 10    24 1 [Femal…    43 2 [I d… 1 [I a… 2 [I d… 1 [I a… 1 [I a… 1 [I a… 1 [I a…
#> # … with 1,622 more rows, and 29 more variables: VD09 <dbl+lbl>,
#> #   VD10 <dbl+lbl>, VD11 <dbl+lbl>, VD12 <dbl+lbl>, VD14 <dbl+lbl>,
#> #   VD15 <dbl+lbl>, VD16 <dbl+lbl>, VD17 <dbl+lbl>, VD18 <dbl+lbl>,
#> #   VD19 <dbl+lbl>, VD20 <dbl+lbl>, VD21 <dbl+lbl>, VD22 <dbl+lbl>,
#> #   VD23 <dbl+lbl>, VD24 <dbl+lbl>, VD25 <dbl+lbl>, VD26 <dbl+lbl>,
#> #   VD27 <dbl+lbl>, VD28 <dbl+lbl>, VD29 <dbl+lbl>, VD30 <dbl+lbl>,
#> #   VD31 <dbl+lbl>, VD33 <dbl+lbl>, VD34 <dbl+lbl>, VD35 <dbl+lbl>, …
haven::read_dta("1. Data/Valence depresion Domaradzka.dta")
#> # A tibble: 1,632 × 39
#>       Id sex         age VD02    VD03    VD04    VD05    VD06    VD07    VD08   
#>    <dbl> <dbl+lbl> <dbl> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l>
#>  1     1 2 [Male]     31 1 [I a… 2 [I d… 2 [I d… 1 [I a… 1 [I a… 1 [I a… 1 [I a…
#>  2     2 1 [Femal…    32 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  3     3 2 [Male]     40 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  4     4 1 [Femal…    34 2 [I d… 1 [I a… 1 [I a… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#>  5     5 2 [Male]     40 2 [I d… 2 [I d… 1 [I a… 2 [I d… 1 [I a… 2 [I d… 2 [I d…
#>  6     6 2 [Male]     24 2 [I d… 1 [I a… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  7     7 2 [Male]     29 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d…
#>  8     8 2 [Male]     25 1 [I a… 1 [I a… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 1 [I a…
#>  9     9 2 [Male]     25 1 [I a… 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d… 1 [I a…
#> 10    10 1 [Femal…    30 2 [I d… 2 [I d… 2 [I d… 2 [I d… 2 [I d… 1 [I a… 2 [I d…
#> # … with 1,622 more rows, and 29 more variables: VD09 <dbl+lbl>,
#> #   VD10 <dbl+lbl>, VD11 <dbl+lbl>, VD12 <dbl+lbl>, VD14 <dbl+lbl>,
#> #   VD15 <dbl+lbl>, VD16 <dbl+lbl>, VD17 <dbl+lbl>, VD18 <dbl+lbl>,
#> #   VD19 <dbl+lbl>, VD20 <dbl+lbl>, VD21 <dbl+lbl>, VD22 <dbl+lbl>,
#> #   VD23 <dbl+lbl>, VD24 <dbl+lbl>, VD25 <dbl+lbl>, VD26 <dbl+lbl>,
#> #   VD27 <dbl+lbl>, VD28 <dbl+lbl>, VD29 <dbl+lbl>, VD30 <dbl+lbl>,
#> #   VD31 <dbl+lbl>, VD33 <dbl+lbl>, VD34 <dbl+lbl>, VD35 <dbl+lbl>, …
readxl::read_excel("1. Data/Valence depresion Domaradzka.xlsx")
#> Warning in unz(zip_path, file_path, open = "rb"): zip path is too long
#> Error in unz(zip_path, file_path, open = "rb"): cannot open the connection

fs::path_real("1. Data/Valence depresion Domaradzka.xlsx")
#> D:/Insync/brianmsm@gmail.com/Google Drive/Cursos de Brian Peña - Compartido/Mios/Cursos en la SPP/1. Curso Virtual. Análisis de datos con R para Psicólogos/Materiales/Cuarta Edición/Sesión 01/1. Data/Valence depresion Domaradzka.xlsx

Created on 2023-02-05 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31 ucrt)
#>  os       Windows 10 x64 (build 22621)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Spanish_Peru.utf8
#>  ctype    Spanish_Peru.utf8
#>  tz       America/Bogota
#>  date     2023-02-05
#>  pandoc   3.0.1 @ C:/Users/brian/AppData/Local/Pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.2.2)
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.2.2)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.2)
#>  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
#>  forcats       1.0.0   2023-01-29 [1] CRAN (R 4.2.2)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
#>  haven         2.5.1   2022-08-22 [1] CRAN (R 4.2.2)
#>  hms           1.1.2   2022-08-19 [1] CRAN (R 4.2.2)
#>  htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
#>  knitr         1.42    2023-01-25 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
#>  pillar        1.8.1   2022-08-19 [1] CRAN (R 4.2.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.2)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
#>  readr         2.1.3   2022-10-01 [1] CRAN (R 4.2.2)
#>  readxl        1.4.1   2022-08-17 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.2.2)
#>  rmarkdown     2.20    2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
#>  styler        1.9.0   2023-01-15 [1] CRAN (R 4.2.2)
#>  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.2)
#>  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.2)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.2)
#>  vctrs         0.5.1   2022-11-16 [1] CRAN (R 4.2.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
#>  xfun          0.36    2022-12-21 [1] CRAN (R 4.2.2)
#>  yaml          2.3.6   2022-10-18 [1] CRAN (R 4.2.2)
#> 
#>  [1] C:/Users/brian/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.2/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

readxl only uses base R facilities in the internal helper where this is coming from:

https://github.com/tidyverse/readxl/blob/main/R/xlsx-zip.R

So the answer for now is that this path truly is problematic for readxl, because there is no quick fix we can make in our code.
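
For anyone who needs a workaround in the meantime, one option is to copy the workbook to a short temporary path and read the copy. This is a minimal sketch, not something readxl provides: `read_via_short_path()` is a hypothetical helper name, and it assumes `file.copy()` can still reach the original file (haven could read its siblings in the same folder, but that is no guarantee).

```r
# Hypothetical helper (not part of readxl): copy a deeply nested file to a
# short temporary path, run a reader function on the copy, then delete it.
read_via_short_path <- function(path, reader, ...) {
  # Keep the original extension so the reader can still detect the format.
  ext <- tools::file_ext(path)
  tmp <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")

  if (!file.copy(path, tmp)) {
    stop("Could not copy '", path, "' to a temporary location.")
  }
  # Remove the temporary copy once the reader has returned.
  on.exit(unlink(tmp), add = TRUE)

  reader(tmp, ...)
}

# Usage (assumes readxl is installed):
# dat <- read_via_short_path(
#   "1. Data/Valence depresion Domaradzka.xlsx",
#   readxl::read_excel
# )
```

The reader is passed in as an argument so the same helper works with any function that takes a file path as its first argument.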

I know you say you have activated long paths, but here's someone reporting success with that method, pointing to exactly the same article:
https://stackoverflow.com/a/71621579
Have you definitely restarted your computer since making the change?

It looks like openxlsx uses a third-party library to access the files inside the .zip archive (which is what .xlsx files actually are), so you may want to try using that package instead.
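
A rough sketch of what trying openxlsx could look like. `read_with_openxlsx()` is a hypothetical wrapper name; it only calls `openxlsx::read.xlsx()`, and whether that actually copes with this particular path has not been verified here.

```r
# Hypothetical wrapper: read one sheet with openxlsx instead of readxl.
# Untested against the long path from this issue.
read_with_openxlsx <- function(path, sheet = 1) {
  if (!requireNamespace("openxlsx", quietly = TRUE)) {
    stop("Package 'openxlsx' is required: install.packages(\"openxlsx\")")
  }
  # read.xlsx() returns a plain data.frame, not a tibble.
  openxlsx::read.xlsx(path, sheet = sheet)
}

# Usage:
# dat <- read_with_openxlsx("1. Data/Valence depresion Domaradzka.xlsx")
```

Note that column type guessing in openxlsx differs from readxl, so the result may need some checking before it is used as a drop-in replacement.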

I have by no means digested all of the content in this post, but it gives me hope that perhaps the problem is going to be fixed at the source, i.e. in R itself, in the not-too-distant future:

https://blog.r-project.org/2023/03/07/path-length-limit-on-windows/
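
To see how close a given file is to the limit, something like the following may help. It is only a diagnostic sketch: `path_length_report()` is a hypothetical helper, and the default threshold of 260 is the classic Windows MAX_PATH value, used here as an assumption rather than as the exact limit `unz()` enforces.

```r
# Hypothetical diagnostic: report the length of the absolute path in
# characters and bytes, and flag it against an assumed limit.
path_length_report <- function(path, limit = 260L) {
  # mustWork = FALSE: do not error if the file does not exist; on some
  # platforms the path is then returned unchanged.
  full <- normalizePath(path, winslash = "/", mustWork = FALSE)
  n_bytes <- nchar(full, type = "bytes")
  data.frame(
    path = full,
    chars = nchar(full, type = "chars"),
    bytes = n_bytes,               # accented letters take more than 1 byte in UTF-8
    over_limit = n_bytes >= limit,
    stringsAsFactors = FALSE
  )
}

# Usage:
# path_length_report("1. Data/Valence depresion Domaradzka.xlsx")
```

Because readxl opens files inside the .xlsx archive, the name of the internal file may count toward the limit as well, so a path somewhat shorter than the threshold could still fail.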

I'm sorry, I had not seen the responses in this thread. I made the change in gpedit.msc and restarted as well, but the problem persists.

It is possible that the next version of R will handle long paths better and solve this for us.