Error in strsplit(sample_name_long, split = ";") : non-character argument

Question

Closed this issue 10 months ago · 15 comments

Hi,

I'm having a strange issue with reading OPUS files and requesting only the data. This used to work, about a year ago:

read_opus('e:/mir/MIR-training/HI-test/spectra/103283XS01.0', data_only = TRUE)

The error message looks like this:

Error in strsplit(sample_name_long, split = ";") : non-character argument
In addition: Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
  unknown timezone 'Etc/NA'
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S",  :
  unknown timezone 'Etc/NA'
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/NA'

Reading the file works as expected with data_only = FALSE.

dylanbeaudette commented 10 months ago

Thanks!

Answer 1 · 2024-02-14T19:35:49.000Z

I've tried again with an example OPUS file, same error.

> f <- opus_file()
> read_opus(f, data_only = TRUE)
Error in strsplit(sample_name_long, split = ";") : non-character argument
In addition: Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
  unknown timezone 'Etc/NA'
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S",  :
  unknown timezone 'Etc/NA'
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/NA'

Answer 2 · 2024-02-27T11:10:46.000Z

I have the same issue:
data_list <- read_opus_single (dsn = "C:/Users/Documents/Data/POM characterization/FTIR/Data/DByDate/15_2_2024/L12_Y24.0")
Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
unknown timezone 'Etc/GMT+1 '
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", :
unknown timezone 'Etc/GMT+1 '
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/GMT+1 '

Answer 3 · 2024-02-27T13:01:40.000Z

For the warning in strptime, we tracked the problem down to unexpected whitespace in the timezone; in get_meta_timestamp our timezone is coming out "(GMT+1)\t" with a spare tab character. We solved it for the moment by updating a regular expression to strip whitespace from the end of the timezone (line 29 in extract_metadata.R):

tz <- gsub(pattern = "\\(|\\)|\\s+$", "", x = time_hour_tz[3L])

I'm reluctant to post this in a PR because I'm unsure where this tab character is coming from, and whether this is better handled somewhere in the parsing functions.
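
To illustrate the regular expression above, here is a minimal sketch applied to a stand-in string. The variable `tz_raw` is hypothetical and only mimics the "(GMT+1)\t" value reported here; prepending "Etc/" is an assumption based on the zone names shown in the warnings, not a statement about how the package builds them.

```r
# Hypothetical raw value, mimicking the "(GMT+1)\t" string reported above
tz_raw <- "(GMT+1)\t"

# Strip both parentheses and any trailing whitespace (tab, space, newline)
tz <- gsub(pattern = "\\(|\\)|\\s+$", replacement = "", x = tz_raw)

# Assumption: an "Etc/" prefix is prepended afterwards, as the warnings suggest
etc_tz <- paste0("Etc/", tz)
print(etc_tz)

# Check the result against the time zone names known to this R installation
print(etc_tz %in% OlsonNames())
```

Without the `\\s+$` alternative, the trailing tab would remain in the zone name, which is one way to end up with an "unknown timezone" warning.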

Answer 4 · 2024-02-27T22:31:06.000Z

Hi Dylan, thanks for bringing this issue up! I'll check out a fix that covers the locale setups we know of, possibly this week.

Answer 5 · 2024-02-27T22:37:08.000Z

Thanks for a first ad-hoc solution. I'll investigate a bit further and see whether we can use a fix that generalizes.

Answer 6 · 2024-02-28T09:10:24.000Z

Great, we can provide a sample file if needed.

Answer 7 · 2024-02-28T16:52:33.000Z

Please do, I'd be very happy if you could send one, either to baumann dash philipp at protonmail dot com or here. It would also be great if we could add this file to the test suite to check the datetime behavior (let's see how it is linked to OPUS/software/language/timezone settings).

Answer 8 · 2024-02-28T16:57:50.000Z

Thanks everyone. I'll also send an example file!

Answer 9 · 2024-02-29T13:42:14.000Z

This should be fixed now :-) It would be great if you could briefly check out the PR and test it with your example file, too. It would also be good to have an extra test file. We should stay clear of the CRAN size limit, but it can't hurt to solidify the tests, given that the parsing relies on reverse engineering ;-) In that case, we would also list institution and device details in the package.

More information on what was done is in the PR.

@mtalluto, your regex for stripping the extra tab/space was useful, and it provides a stable way to get rid of the warning.

Overall, we were not parsing the "sample" and "history" blocks when data_only = TRUE. Therefore, the sample name and timezone information from the OPUS files weren't included. I think it is good to include them.
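
As a rough illustration of how missing metadata could surface as the messages in this thread, here is a minimal sketch in base R. The values assigned to `sample_name_long` and `tz` are hypothetical stand-ins for fields that were never parsed; they are assumptions and are not taken from the package internals.

```r
# Hypothetical stand-ins for metadata that was never parsed
sample_name_long <- NULL  # assumed: "sample" block was not read
tz <- NA                  # assumed: time zone was not found

# strsplit() only accepts character input, so NULL triggers an error;
# tryCatch() captures the message instead of stopping the script
res <- tryCatch(
  strsplit(sample_name_long, split = ";"),
  error = function(e) conditionMessage(e)
)
print(res)

# Pasting NA onto the prefix produces the invalid zone name from the warnings
etc_tz <- paste0("Etc/", tz)
print(etc_tz)
```

Under these assumptions, the captured message is "non-character argument" and the zone name is "Etc/NA", matching the error and warnings reported at the top of the thread.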

Answer 10 · 2024-02-29T13:42:53.000Z

cc @ThomasKnecht, it would be great if you could give the code a short review (if you have time) :-)

Answer 11 · 2024-02-29T13:46:22.000Z

Thank you all for catching this bug 💯

Answer 12 · 2024-02-29T14:01:09.000Z

@mtalluto, regarding the fix for the warning: we could only find the correct time info in the history. It is a messy file (Windows origin plus proprietary Bruker), so that's the best we can do to parse the text lines.

Answer 13 · 2024-02-29T14:51:48.000Z

Tested from my end, our file now loads with no warnings. I sent you the file via email, feel free to include it with the tests.

Answer 14 · 2024-03-12T21:12:05.000Z

Thanks!

You're welcome, and thanks also for the file. If you have any suggestions or comments in the future, I'm happy to discuss them.