fhenz/SheetReader-r

Can you add some parameters?

Closed this issue · 3 comments

This package is very good. It recently solved my problem of reading large xlsx files, but I hope the author can add more custom parameters, such as:

  1. Specify the data type for each column.
  2. Specify the NA value.
  3. Do not introduce scientific notation when reading data. In particular, if a column contains both text and numbers, it is read as text by default, but the numbers are rendered in scientific notation.
  4. There seems to be an encoding issue? (test file)
> mm = SheetReader::read_xlsx(path = f2, sheet = 1)
> head(mm$Profit)
[1] "本期利润" "没有单位"
[3] NA                                 "1.72925e+08"                     
[5] NA                                 NA  
fhenz commented

Thank you.
I have pushed a fix for 4.; there was an issue with XML-escaped Unicode characters. If you have devtools, you can try installing via install_github("fhenz/SheetReader-r"). I will probably only upload a new CRAN version once I have also addressed some of your other points.

I think 1. and 2. are both good ideas; I will try to implement something similar to what readxl has.
3. is a bit tricky because Excel doesn't differentiate between integers and real numbers when storing values, but I should be able to solve this more elegantly once I implement 1. (it would then be solved by specifying string/text as the column data type; would that be sufficient?).
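
Until column types can be specified, one possible stopgap for 3. is to post-process the affected column in base R. The sketch below is only an illustration, not part of SheetReader: it takes the values from the `head(mm$Profit)` output above, finds the entries that parse as numbers, and rewrites them in fixed notation. It cannot restore digits that were already dropped when the value was rendered as "1.72925e+08".

```r
# Values copied from the head(mm$Profit) output in the issue.
profit <- c("本期利润", "没有单位", NA, "1.72925e+08", NA, NA)

# Try to parse every entry as a number; text entries become NA.
num <- suppressWarnings(as.numeric(profit))
is_num <- !is.na(num)

# Rewrite only the numeric entries in fixed (non-scientific) notation.
profit[is_num] <- format(num[is_num], scientific = FALSE, trim = TRUE)

print(profit)
```

Text entries and missing values are left untouched; only strings that `as.numeric()` accepts are reformatted.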

Thank you for your reply. Indeed, if 1 is resolved, then 3 can theoretically be resolved as well.

A new parameter, col_types, has been added that allows specifying the data types for columns via a named or unnamed character vector, e.g. read_xlsx([...], col_types=c("Profit"="text")).
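
For reference, a minimal sketch of how the call from the original report might look with the new parameter. The file name `report.xlsx` is a hypothetical placeholder (the original path `f2` is not shown in the thread), and the call is guarded so the snippet does nothing if the package or the file is missing.

```r
# Read the "Profit" column as text so numbers are not shown in scientific notation.
col_types <- c("Profit" = "text")

# Hypothetical path; replace with the actual xlsx file.
path <- "report.xlsx"

if (requireNamespace("SheetReader", quietly = TRUE) && file.exists(path)) {
  mm <- SheetReader::read_xlsx(path = path, sheet = 1, col_types = col_types)
  print(head(mm$Profit))
}
```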