ml-lib/CodeLib

[Feature]: Clustering: Optimal k

bdiptesh opened this issue · 0 comments

Is your feature request related to a problem? Please describe.

A clustering module that clusters any given data (categorical, continuous, or ordinal) and returns the optimal clustering solution.

Describe the solution you'd like

Compute the optimal number of clusters (k) using the gap statistic.
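
For reference, a minimal sketch of the gap computation, assuming the standard definition: the gap for each k is the mean log within-cluster dispersion over B reference datasets minus the observed log dispersion, with the simulation error scaled by sqrt(1 + 1/B). The function name `gap_from_dispersions` and its arguments are hypothetical, and the dispersions are assumed to be computed elsewhere by whichever clustering algorithm the module ends up using.

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def gap_from_dispersions(
    log_wk: Sequence[float],
    ref_log_wk: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: gap values and standard errors per k.

    log_wk[i]      log within-cluster dispersion of the observed data, k = i + 1
    ref_log_wk[i]  the same quantity for each of B reference datasets, k = i + 1
    """
    if len(log_wk) != len(ref_log_wk):
        raise ValueError("log_wk and ref_log_wk must cover the same values of k")

    gaps: List[float] = []
    sks: List[float] = []
    for observed, refs in zip(log_wk, ref_log_wk):
        b = len(refs)
        # Gap(k) = mean of reference log-dispersions - observed log-dispersion
        gaps.append(fmean(refs) - observed)
        # s_k = sd_k * sqrt(1 + 1/B), with sd_k the population standard deviation
        sks.append(pstdev(refs) * math.sqrt(1.0 + 1.0 / b))
    return gaps, sks
```

How the reference datasets are drawn for categorical or ordinal columns is left open here and would need to be decided as part of this feature.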

Methods:

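  1. First SE: the smallest k whose gap is within one standard error of the gap at k + 1
  2. Maximum Gap: the k with the largest gap value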
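
The two methods could be sketched as follows, given per-k gap values and standard errors. This is only an illustration: the function name `select_optimal_k` and the method strings "first_se" and "max_gap" are hypothetical, and falling back to the largest k when the First SE rule never fires is an assumption, not a decided behaviour.

```python
from typing import Sequence


def select_optimal_k(
    gaps: Sequence[float],
    sks: Sequence[float],
    method: str = "first_se",
) -> int:
    """Hypothetical helper: pick k from gap values (index i corresponds to k = i + 1)."""
    if not gaps or len(gaps) != len(sks):
        raise ValueError("gaps and sks must be non-empty and of equal length")

    if method == "max_gap":
        # k with the largest gap; ties resolve to the smallest such k
        return max(range(len(gaps)), key=gaps.__getitem__) + 1

    if method == "first_se":
        # smallest k such that Gap(k) >= Gap(k + 1) - s_{k + 1}
        for i in range(len(gaps) - 1):
            if gaps[i] >= gaps[i + 1] - sks[i + 1]:
                return i + 1
        # assumption: if the rule never fires, fall back to the largest k tried
        return len(gaps)

    raise ValueError(f"unknown method: {method!r}")
```

With gaps of [0.1, 0.5, 0.55, 0.9] and standard errors of [0.05, 0.05, 0.1, 0.05], First SE picks k = 2 while Maximum Gap picks k = 4, so the two methods can disagree on the same data.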

Expected input(s)

df: pandas.DataFrame
x_var: List[str]
max_cluster: int
method: str

Expected output(s)

opt_k: int

Additional context

No response

Acceptance criteria

Integration tests:

  • Categorical variables only
  • Continuous variables only
  • Ordinal variables only
  • Combination of categorical/ordinal/continuous

Version

v0.4.0