R-Lum/Luminescence

Risoe.BINfileData2RLum.Analysis: the x-axis in the data must be changed


Expected behaviour

a list of values from 0.16 to 40, by 0.16

Observed behaviour

a list of values from 0 to 40, by 0.1606426

Running mini example

library(Luminescence)

# 'bin': a Risoe.BINfileData object imported beforehand
rlum <- Risoe.BINfileData2RLum.Analysis(bin)

Hello,

For example, I have an OSL curve measured over 40 seconds with 250 data channels (the default setting on a Risø reader).

The RLum object created via Risoe.BINfileData2RLum.Analysis generates x-axis values for each OSL curve. This is great, but there is a problem.

The x-axis ranges from 0 to 40, by 0.1606426, which is 40/(250 - 1).

It does not make sense: the data are recorded and aggregated over a single data channel, and the corresponding x value is traditionally taken to be the end value of the channel.

So it should be: from 0.16 to 40, by 0.16, which is 40/250.

Alternative ways are possible:
0 to 39.84, by 0.16 (also 250 channels)
or 0.08 to 39.92, by 0.16

The first is more desirable (0.16 to 40), but the third (starting at 0.08) is also good. The second should be avoided, because 0 does not go well whenever you rely on a log scale.

The TL curves (simple ramp, no hold) show the same issue:
observed: from 0 to 240, by 1.004184
expected: from 1 to 240, by 1

I tried to track down the source in the code but lost the trail around Risoe.BINfileData2RLum.Data.Curve, so I do not know where the x-axis calculation actually happens. I wanted to help, but I do not have the time right now.

thanks!

Sebastien

Hi Sébastien,

This is an issue that already gave me a headache some time ago, and I guess whatever I write here will not be satisfying in your eyes since you are right: from the scientific point of view (and also from the data-processing perspective), the current solution is not perfect (but a reasonable compromise):

  1. Starting from 0 is indeed hardly justified if you consider that we work with channels and the PMT (or the controller later) sums up the counts after a preset time interval. So the first versions of 'Luminescence' did what is logical: they started with the first channel, in your case at 0.16. This was ok until I tried to understand the difference between published curve-deconvolution data and data I had analysed. Surprisingly, I realised: they start at 0 s. Wow. So I double-checked with the Analyst and, again, wow, all curves started at 0 s (which is still the case). After a chat with Geoff and the R Team back in summer 2015, I changed the implementation to 0 s as well. The reasoning is twofold: (A) users do not like diverging results from different tools (I know, a weak argument), and (B) our mathematical models do not account for discretisation effects; this does not matter for small channels, but it impacts the output when the channels become larger.

  2. The reason for the last channel is that the BIN/BINX-file format does not define the x-axis but leaves it to the user to decide what to do with it. The reader provides LOW, HIGH and NPOINTS (the byte naming according to the format definition). Consequently, from a programming point of view, I have to define, by definition, a sequence of numbers running from LOW to HIGH with length NPOINTS. Since LOW is set to 0 and HIGH to, in your example, 40, what you observe is what you get if you follow (HIGH - LOW)/(NPOINTS - 1) to include both LOW and HIGH. In other words, these values are not correct in the first place. The Analyst, however, stops at HIGH - channel_resolution, so in your case 39.92, which contradicts my argument above. Bottom line: this is a design flaw in the file format. It requires making an assumption about a recorded signal based on timestamp information from the LED (but not the detector). Hard to get it right this way.
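
To make the two conventions concrete, here is a minimal sketch in base R (the variable names simply mirror the byte fields; this is illustrative, not the package code):

LOW <- 0; HIGH <- 40; NPOINTS <- 250

# current behaviour: LOW and HIGH both included, NPOINTS values in total
x_current <- seq(LOW, HIGH, length.out = NPOINTS)   # 0.0000000 0.1606426 ... 40

# channel-end convention: step of (HIGH - LOW)/NPOINTS, starting one channel in
step <- (HIGH - LOW) / NPOINTS
x_channel_end <- seq(LOW + step, HIGH, by = step)   # 0.16 0.32 ... 40

Both sequences have length 250; only the anchoring of the first value differs.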

Perhaps we should follow the Analyst example? What do you think?

Besides, the code creating the x-axis is written in C++ and can be found here: https://github.com/R-Lum/Luminescence/blob/master/src/src_create_RLumDataCurve_matrix.cpp. Before, we had an R-only solution, but for importing large datasets (> 100,000 curves) the performance impact was too high.

The problem is not the physics but the maths. The equations we are using do not account for the discretisation effects. No problem for small channels, but it becomes significant for large channels, although this is more a philosophical than a real practical problem, at least for CW-OSL.

Dividing the first channel in half sounds tempting, but what is the justification for it? From the data-processing point of view, nothing was returned at this time; we do not know the count number at this time. It might be 100, it might be 100/2, it might be exp(x). I don't know; it depends on your signal. The best time would be the arrival time of the first photon, but we don't know this either.

So treating the data like the Analyst does means a shift towards zero; sounds ok to me, just a different understanding.

Anyway, I see your points, which are all justified as well. For the moment, I just don't know how to tackle them, for the reasons mentioned above. In the XSYG-file format, we decided to go for the 0.16 -> 40 solution; this is at least a clean solution regarding the data processing.

I will put it on the list; this needs to be done in a different context.

@SebastienHuot I double-checked what the Viewer and the Viewer Plus are doing.

  1. Viewer (BIN/BINX-files < v7): shows only channels, so nothing can go wrong here.
  2. Viewer Plus: starts with the end of the first channel and ends at the timestamp of the last channel; probably the most logical solution.

This means I will switch to that solution, which is also consistent with the XSYG-file format.

I just updated 'Luminescence' here on GitHub (branch: dev_0.9.x); the package released on CRAN does not contain the last two changes, sorry. I am not sure when I am going to release what will become version 0.9.4. A release usually takes half a day due to all the issues I have to consider along the way. Maybe at the end of August.

It appears that the new commit passed all CI tests, which means my changes have no obvious unwanted side effects. If you like, you can install the development version with the changes included via

devtools::install_github("R-Lum/Luminescence@dev_0.9.x")

I will keep this ticket open for a while, just to be sure.

Thanks for the discussion and your support!

Slight update on the x formula.

From LOW, HIGH and NPOINTS, the formula is (using the seq() nomenclature):

from: (HIGH - LOW)/NPOINTS + LOW
to: HIGH
by: (HIGH - LOW)/NPOINTS

which gives 0.16 -> 40.

@SebastienHuot Is it an update because something is wrong in the function? I double-checked, but it does what it should do: LOW = 0, HIGH = 40, NPOINTS = 250 results in

0.16  0.32  0.48  0.64 ... 40

The length of this object is 250. Did I overlook something?

A discrepancy appears if LOW > 0, which is typical of the Lexsyg's TL curves, for example:

LOW = 25
HIGH = 199
NPOINTS = 174

as inscribed in the BINX file. For the Lexsyg data, I agree it makes more sense to grab the x values directly from the XSYG file, as they are recorded along with the PMT data.

As far as I know, the Risø always has LOW = 0.

Ah, because you mean it should now start at 25, not at 26, right?

Does your R terminal show something like:

[src_create_RLumDataCurve_matrix()] BIN/BINX-file non-conform. TL curve may be wrong!

If so, I know what your problem is, but this is related to FI not respecting the BIN/BINX-file format.

[src_create_RLumDataCurve_matrix()] BIN/BINX-file non-conform. TL curve may be wrong!
For TL, from the Lexsyg: yes, all the time!

with

LOW = 25
HIGH = 199
NPOINTS = 174

from: (HIGH - LOW)/NPOINTS + LOW = (199 - 25)/174 + 25 = 26
to: HIGH = 199
by: (HIGH - LOW)/NPOINTS = (199 - 25)/174 = 1

So the answer is 26 °C, the end of the first data channel.
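
A quick check in base R (the same construction as above; illustrative only):

LOW <- 25; HIGH <- 199; NPOINTS <- 174
step <- (HIGH - LOW) / NPOINTS         # 1
x <- seq(LOW + step, HIGH, by = step)
head(x)                                # 26 27 28 29 30 31
length(x)                              # 174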

Ah ok, now it's clear. The reason behind it: if the BINX file claims to be of version >= 4 and the curve is a TL curve, the format definition requires that the byte positions TOLON, TOLOFF and TOLDELAY are > 0. If this is not the case, the function still tries to calculate x values according to the BINX-file format manual, but, as the message states, the curve might be wrong.

Can you send me one BINX file with this problem via email? Then I can have a look, just in case. But if it is as I assume, there is perhaps nothing I can change, because if I did, I would break TL curves imported from Risø BINX files which follow the format definition.

No, the bytes are used by the reader to implement various preheat possibilities, including the preheat plateau (features introduced with version 4). Anyway, thanks for your files; I will see what I can do. Maybe I have overlooked something.

I forgot to add: a TL curve has three parts:

  1. the first ramp, where we need the byte information from LOW, AN_TEMP and TOLDELAY
  2. the plateau: simply TOLON
  3. the end 'ramp', where we need AN_TEMP, HIGH and TOLOFF

Ok, I will modify my question: did you test the TL curves with the updated version of the package? Because, in your example, it should start at 26 if LOW is 25. Probably I should have mentioned that I did not use your equation. The implementation is the following (it's C++):

#include <Rcpp.h>
using namespace Rcpp;

NumericVector seq(int from, int to, double length_out) {

  // allocate the output vector with length_out elements
  NumericVector sequence(static_cast<int>(length_out));

  // channel width
  double by = (to - from) / length_out;

  // the first value is the end of the first channel
  sequence[0] = from + by;

  // each following value is one channel width further
  for (int i = 1; i < length_out; ++i) {
    sequence[i] = sequence[i - 1] + by;
  }
  return sequence;
}
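
For reference, the same construction in base R (a sketch; the helper name seq_channels is mine, not from the package):

# n values, starting one channel width after 'from' and ending exactly at 'to'
seq_channels <- function(from, to, n) {
  by <- (to - from) / n
  seq(from + by, by = by, length.out = n)
}

seq_channels(0, 40, 250)    # 0.16 0.32 ... 40.00
seq_channels(25, 199, 174)  # 26 27 ... 199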

Regarding the plateau plots: you are completely right; however, I follow the implementation by Risø to keep the data visualisation similar to their own software. Besides, you can always create your own plots the way you want.

@SebastienHuot Do you think I can close this issue and make it part of a new submission?

Ok, big thanks @SebastienHuot for supporting us here. I'll close this issue now, but if needed we can reopen it later.