SofieVG/FlowSOM

Is the integer cast in abs() really intended?

LTLA opened this issue · 3 comments

LTLA commented

For various reasons, I am writing a C++ implementation of som.c, and I noticed

FlowSOM/src/som.c

Lines 133 to 134 in 6f79b8b

tmp = data[i + j*n] - codes[cd + j*ncodes];
change += abs(tmp);

tmp is double-precision but abs() takes integer arguments. This call will truncate tmp to integer before actually computing the absolute difference. Nothing will be added to change if all tmp lie in (-1, 1). As it stands now, the iterations will terminate if all tmp have absolute values less than 1. Is this intentional? Did you mean to use fabs() instead?

Regardless, I don't understand the motivation behind the stop condition of change < 1. A threshold of 1 seems pretty arbitrary when the scale of the input data could be anything. Why not just terminate when the change in the code coordinates drops below a certain level (e.g., defined as some small fraction of the standard deviation across codes)?

SamGG commented

Interesting point.
I don't think there is such a stop condition in the implementation of the kohonen package.
https://github.com/rwehrens/kohonen/blob/28eb3599bdc423e94a4a1e09310ffd9de667ead8/kohonen/src/supersom.cpp#L93-L173

Dear @LTLA,

This integer cast was indeed not intended and has already been changed to fabs() on my local installation. Apparently that update hasn't made it to this github yet, but I will update soon.

The idea behind the change parameter is indeed to check for convergence, but I agree that the value 1 can be kind of arbitrary, especially when the package is getting wider use. I think we choose this value back then because it made sense for the datasets we were working with at the time. I'll consider updating this in the future, either at least make it a parameter, or indeed, as you are suggesting, computing it based on the data range itself.

Kind regards,
Sofie

LTLA commented

Okay, good to see that it's fixed.