spectral-cockpit/opusreader2

Error in strsplit(sample_name_long, split = ";") : non-character argument

Closed this issue · 15 comments

Hi,

I'm having a strange issue with reading OPUS files and requesting only the data. This used to work, about a year ago:

read_opus('e:/mir/MIR-training/HI-test/spectra/103283XS01.0', data_only = TRUE)

The error message looks like this:

Error in strsplit(sample_name_long, split = ";") : non-character argument
In addition: Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
  unknown timezone 'Etc/NA'
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S",  :
  unknown timezone 'Etc/NA'
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/NA'

Reading the file works as expected with data_only = FALSE.

I've tried again with an example OPUS file, same error.

> f <- opus_file()
> read_opus(f, data_only = TRUE)
Error in strsplit(sample_name_long, split = ";") : non-character argument
In addition: Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
  unknown timezone 'Etc/NA'
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S",  :
  unknown timezone 'Etc/NA'
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/NA'

I have the same issue:
data_list <- read_opus_single (dsn = "C:/Users/Documents/Data/POM characterization/FTIR/Data/DByDate/15_2_2024/L12_Y24.0")
Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
unknown timezone 'Etc/GMT+1 '
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", :
unknown timezone 'Etc/GMT+1 '
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/GMT+1 '

For the warning in strptime, we tracked the problem down to unexpected whitespace in the timezone; in get_meta_timestamp our timezone is coming out "(GMT+1)\t" with a spare tab character. We solved it for the moment by updating a regular expression to strip whitespace from the end of the timezone (line 29 in extract_metadata.R):

tz <- gsub(pattern = "\\(|\\)|\\s+$", "", x = time_hour_tz[3L])

I'm reluctant to post this in a PR because I'm unsure where this tab character is coming from, and whether this is better handled somewhere in the parsing functions.
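To make the effect of that change concrete, here is a minimal base-R sketch. The input string is the value reported above. The parentheses-only pattern is an assumption about what the line looked like before the patch, and the `Etc/` prefix is inferred from the warning text, so neither is taken from the package source. The variable names are illustrative.

```r
# Raw timezone field as reported above: parentheses plus a trailing tab
tz_raw <- "(GMT+1)\t"

# Assumed pre-patch behaviour: only the parentheses are stripped,
# so the trailing tab survives
tz_old <- gsub(pattern = "\\(|\\)", "", x = tz_raw)

# Patched pattern from the comment above: also strips trailing whitespace
tz_new <- gsub(pattern = "\\(|\\)|\\s+$", "", x = tz_raw)

# Timezone names as they would be passed to strptime() (prefix is inferred)
etc_old <- paste0("Etc/", tz_old)
etc_new <- paste0("Etc/", tz_new)

# Compare string lengths to see the stray tab in the unpatched value
print(nchar(etc_old))
print(nchar(etc_new))

# Check the cleaned name against the timezones known to this R installation
print(etc_new %in% OlsonNames())
```

If the last line prints `FALSE` on your system, the local timezone database may be the problem rather than the whitespace.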

Hi Dylan, thanks for bringing this issue up! I'll work on a solution that fixes the locale setups we know of, possibly this week.

Thanks for a first ad-hoc solution. I'll investigate a bit further and see whether we can use a fix that generalizes.

Great, we can provide a sample file if needed.

Please do, I'd be very happy if you did: baumann dash philipp at protonmail dot com, or here. It would also be great if we could add this file to the test suite to check for that datetime behavior (let's see how this is linked to OPUS/software/language/timezone settings).

Thanks everyone. I'll also send an example file!

This should be fixed now :-) It would be great if you could briefly check out the PR and test with your example file, too. It would also be good to have an extra test file. We should not push against the CRAN size limit, but it can't hurt to solidify testing of the reverse engineering ;-) In that case, we would also provide institution and device details in the package.

More information on what was done is in the PR.

@mtalluto, your regex for stripping the extra tab/space was useful, and it provides a stable way to get rid of the warning.

Overall, we were not parsing the "sample" and "history" blocks when data_only = TRUE. Therefore, the sample name and timezone information from the OPUS files weren't included. I think it is good to include them.
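The error and warnings in the original report are consistent with those metadata values simply being absent. Here is a base-R sketch of that failure mode, assuming the unparsed fields surface as `NA`. The variable names mirror the ones in the messages, but how the package actually constructs them is not shown in this thread.

```r
# Assumed stand-ins for metadata that was never parsed when data_only = TRUE
sample_name_long <- NA
tz <- NA

# strsplit() requires a character vector; a logical NA triggers the error
# seen in the report. tryCatch() captures the message instead of stopping.
res <- tryCatch(
  strsplit(sample_name_long, split = ";"),
  error = function(e) conditionMessage(e)
)
print(res)

# Pasting NA into a timezone name yields the invalid zone from the warnings
etc_tz <- paste0("Etc/", tz)
print(etc_tz)
```

Parsing the "sample" and "history" blocks in both modes means these inputs are real strings before they reach `strsplit()` and `strptime()`.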

cc @ThomasKnecht, it would be great if you could give the code a short review (if you have time) :-)

Thank you all for catching this bug 💯

@mtalluto, regarding the fix for the warning: we could only find the correct time info in the history block. It is a messy file (Windows origin plus proprietary Bruker), so that's the best we can do to parse the text lines.

Tested from my end, our file now loads with no warnings. I sent you the file via email, feel free to include with the tests.

Thanks!

You're welcome, and thanks also for the file. If you have any suggestions or comments in the future, I'm happy to discuss.