Elbow plot variance calculation

Question

Jeff87075 opened this issue 3 years ago · 3 comments

Hi, I'd like to ask how the following formula for calculating the variance (used in the elbow plot) is derived.

  c_wss <- 0
  for(j in seq_along(clustering)){
    if(sum(clustering == j) > 1){
      c_wss <- c_wss + (nrow(data[clustering == j, , drop = FALSE]) - 1) *
        sum(apply(data[clustering == j, , drop = FALSE], 2, stats::var))
    }
  }

I understand that the sum() part calculates the within-cluster sum of squares, but why is it multiplied by nrow() - 1, which I assume is the degrees of freedom? Thanks a lot!

Answer 1 · 2021-10-18T15:01:50.000Z

Mm, I'm trying to remember. I would assume the main idea here was to take a weighted version (so larger clusters contribute more). I'm just not sure where the minus 1 comes from, or whether this weighting by the number of data points is necessary in the first place...
There may well be a mistake in this code, because it does not actually work that well, and when using FlowSOM we typically handpick the number of metaclusters rather than relying on this automated approach.
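
For what it's worth, the minus 1 looks consistent with how stats::var is defined: it divides the sum of squared deviations from the mean by n - 1, so multiplying a column's variance by nrow() - 1 gives back that column's sum of squared deviations. Summed over columns and clusters, that would be the within-cluster sum of squares. Below is a minimal sketch to check this numerically. The data and clustering values are made up for illustration, and looping over unique(clustering) is a simplification of the loop in the question, not the package's code.

```r
# Made-up data: 6 points, 2 markers, split into 2 clusters (illustrative only)
data <- matrix(c(1, 2, 3, 10, 12, 14,
                 5, 7, 9,  1,  1,  4),
               ncol = 2)
clustering <- c(1, 1, 1, 2, 2, 2)

# Version 1: (n - 1) * sum of per-column variances, as in the question,
# but iterating over the cluster labels only (an assumption of this sketch)
wss_var <- 0
for (j in unique(clustering)) {
  cluster_data <- data[clustering == j, , drop = FALSE]
  if (nrow(cluster_data) > 1) {
    wss_var <- wss_var + (nrow(cluster_data) - 1) *
      sum(apply(cluster_data, 2, stats::var))
  }
}

# Version 2: squared deviations from the cluster means, summed directly
wss_direct <- 0
for (j in unique(clustering)) {
  cluster_data <- data[clustering == j, , drop = FALSE]
  centered <- sweep(cluster_data, 2, colMeans(cluster_data))
  wss_direct <- wss_direct + sum(centered^2)
}

# TRUE if both versions agree up to floating-point tolerance
isTRUE(all.equal(wss_var, wss_direct))
```

If the two values agree, the factor is not an extra weighting on top of the sum of squares. It just undoes the division that stats::var performs.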

Answer 2 · 2021-10-20T14:17:13.000Z

Ah I see, an automated approach certainly has its limitations. On the topic of the SOM algorithm: since the FlowSOM package has its own code for performing the SOM, can I also ask what the major differences are between the SOM in FlowSOM and the SOM implemented in the kohonen package?

Answer 3 · 2021-10-20T14:50:12.000Z

The FlowSOM package builds on the kohonen package as it was at the time, so in essence the algorithm is the same. However, the code has been simplified: some properties we did not expect to use (e.g. hexagonal or toroidal topologies) were removed, and some additional options were added (for example, we explored some different distance measures, although we keep using Euclidean distance most of the time).
