Estimation of Gamma in K-Prototypes

Question

crixus5678 opened this issue 2 years ago · 1 comment

For the estimation of gamma in k-prototypes, the current implementation appears to compute gamma as 0.5 * (standard deviation of all numeric data pooled together).

However, the paper [Huang 1997] says that gamma is guided by the "average standard deviation of numeric attributes". If that is the case, shouldn't we compute the standard deviation of each numeric attribute separately and then take the mean of those values?

Answer 1 · 2022-09-06T05:51:55.000Z

For reference, from the paper:

Generally speaking, γ_l is related to σ_l , the average standard deviation of numeric attributes in cluster l. In practice, σ_l can be used as a guidance to determine γ_l . However, since σ_l is unknown before clustering, the overall average standard deviation σ of numeric attributes can be used for all σ_l.

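So yes, it appears you are correct: the paper refers to the average of the per-attribute standard deviations, not a single standard deviation over all numeric values.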
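
To make the difference concrete, here is a minimal NumPy sketch comparing the two estimates. It is an illustration only, not the library's actual code: the function names `gamma_pooled` and `gamma_mean_of_stds` and the toy data are made up for this example, the 0.5 factor is taken from the description in the question, and the population standard deviation (NumPy's default, `ddof=0`) is an assumption.

```python
import numpy as np


def gamma_pooled(x_num):
    """0.5 * std over all numeric values pooled together.

    This mirrors the behaviour described in the question (hypothetical helper).
    """
    return 0.5 * np.asarray(x_num, dtype=float).std()


def gamma_mean_of_stds(x_num):
    """0.5 * mean of the per-attribute (per-column) standard deviations.

    This follows the "average standard deviation of numeric attributes"
    reading of the paper (hypothetical helper).
    """
    per_attribute_std = np.asarray(x_num, dtype=float).std(axis=0)
    return 0.5 * per_attribute_std.mean()


# Toy data: two numeric attributes on very different scales.
x_num = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
])

print("pooled:      ", gamma_pooled(x_num))
print("mean of stds:", gamma_mean_of_stds(x_num))
```

When the attributes have different means or scales, the pooled standard deviation also picks up the spread between columns, so the two estimates can differ a lot. With a single numeric attribute they coincide.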