tidyverse/readxl

Adding skip = 1 causes the last empty column to go missing

ahbon123 opened this issue · 1 comments

Given a sample Excel file from this link, I tried to read Sheet2. When I add skip = 1, the last empty column (acc_score) goes missing as well. How can we prevent this? The expected result is to skip the first row and keep the last column. Thanks.

> readxl::read_excel(xlsx_file,
+            sheet = 'Sheet2',
+            skip = 1,
+            # col_names=col_names
+            )
New names:
* `` -> ...2
* `` -> ...3
* `` -> ...4
# A tibble: 3 x 6
  agri  ...2  ...3  ...4  `6.32929944992065` right
  <chr> <lgl> <lgl> <lgl> <chr>              <chr>
1 indus NA    NA    NA    11.21162109375     right
2 oil   NA    NA    NA    1.45932525119925   right
3 metal NA    NA    NA    2.10280250811577   right

Related link:

https://stackoverflow.com/questions/71156178/add-skip-1-parameter-leads-to-last-empty-column-missing-while-using-readxlread

This is just readxl's documented behaviour.

https://readxl.tidyverse.org/articles/sheet-geometry.html

By default, read_excel() uses the smallest rectangle that contains the non-empty cells. It “shrink wraps” the data.

Once you direct readxl to ignore the first row (the column headers), in the absence of any other information about what range you want to read, it has no reason to bring in that column. If it did, why not the next empty column? And the next? Etc.
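
One way to give readxl that extra information is to pass an explicit `range`, which is interpreted strictly even when it forces trailing empty columns to be included. The sketch below is only illustrative: `read_fixed_width` is a hypothetical helper, and it assumes the data start in column A and that you know how many columns you want (7 is a guess based on the output above, not something confirmed by the file).

```r
library(readxl)

# Hypothetical helper: read a fixed block of columns starting at `first_row`,
# leaving the bottom edge open so that all remaining rows are read.
read_fixed_width <- function(path, sheet, first_row, n_cols, col_names = TRUE) {
  read_excel(
    path,
    sheet = sheet,
    # Upper-left cell is (first_row, column 1); the lower-right row is NA
    # (open-ended) and the lower-right column is n_cols.
    range = cell_limits(c(first_row, 1), c(NA, n_cols)),
    col_names = col_names
  )
}

# Assumed usage; the column count is a placeholder for your own sheet:
# read_fixed_width(xlsx_file, sheet = "Sheet2", first_row = 2, n_cols = 7,
#                  col_names = FALSE)
```

Since `range` takes precedence over `skip`, the row offset is expressed in the range itself (`first_row = 2`) rather than with `skip = 1`. With `col_names = TRUE`, the first row of the range is treated as the header, so pass `col_names = FALSE` or a character vector of names if row 2 already holds data.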