/ProcessingMethods

Common clustering methods based on sklearn

Processing Methods Assemble

A summary of the most frequently used data processing methods (non-DNN methods). Most of them relate to unsupervised learning and representation learning.

clustering methods

in sklearn.cluster

Clustering results of the three methods in the table below, applied to high-dimensional data generated with datasets.make_blobs.

| Method Name | Hyper-Para | Description |
| --- | --- | --- |
| K-Means | #classes | The number of clusters must be specified in advance as a hyperparameter. Mini-batch version: MiniBatchKMeans. |
| Mean-Shift | bandwidth | A smaller bandwidth yields more clusters. |
| DBSCAN | eps | Suited to non-linearly separable (arbitrarily shaped) clusters. It does not provide center coordinates for the clusters. |
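
The three methods above can be tried side by side on make_blobs data. The following is a minimal sketch, not the repository's own script: the sample count, the number of features, and the values of eps, min_samples and quantile are illustrative assumptions rather than values taken from this project.

```python
from sklearn.cluster import (
    DBSCAN,
    KMeans,
    MeanShift,
    MiniBatchKMeans,
    estimate_bandwidth,
)
from sklearn.datasets import make_blobs

# Synthetic data: 3 blobs in 5 dimensions (sizes chosen for illustration only).
X, y = make_blobs(
    n_samples=300, centers=3, n_features=5, cluster_std=0.5, random_state=0
)

# K-Means: the number of clusters is fixed in advance.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Mini-batch variant of K-Means, useful when the data does not fit in one pass.
mbk = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0).fit(X)

# Mean-Shift: the bandwidth controls how many clusters are found.
# estimate_bandwidth gives a starting point; the quantile is an assumption.
bandwidth = estimate_bandwidth(X, quantile=0.2, random_state=0)
mean_shift = MeanShift(bandwidth=bandwidth).fit(X)

# DBSCAN: density based; the label -1 marks noise points.
# eps and min_samples are assumed values and usually need tuning.
dbscan = DBSCAN(eps=1.5, min_samples=5).fit(X)

print("K-Means centers:", kmeans.cluster_centers_.shape)
print("Mean-Shift clusters:", len(mean_shift.cluster_centers_))
print("DBSCAN has cluster centers:", hasattr(dbscan, "cluster_centers_"))
```

K-Means and Mean-Shift expose cluster_centers_ after fitting, while DBSCAN only exposes labels_, which matches the note in the table that it gives no center coordinates.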

dimensionality reduction methods

in sklearn.decomposition(PCA), sklearn.discriminant_analysis(LDA), sklearn.manifold(t-SNE, LLE)

| Method Name | Hyper-Para | Description |
| --- | --- | --- |
| PCA | #components | Diagonalizes the covariance matrix: in the projected space the off-diagonal entries are 0 and the diagonal entries (the variances) are in descending order. Batch-wise version: IncrementalPCA. |
| LDA | class labels | Supervised. Maximizes the distance between classes and minimizes the distance within classes. |
| LLE | --- | --- |
| t-SNE | #components | Maps the data points to 2 or 3 dimensions for visualization while trying to preserve the neighborhood structure (pairwise similarities) of the high-dimensional space. It is not a clustering algorithm, but a means of inspecting how well a clustering works. Similarities are modeled with a Gaussian distribution in the original (n-D) space and a Student t distribution in the embedded (2-D) space. |
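
A short sketch of how PCA, its incremental variant, LDA and t-SNE can be called on the same kind of data. The data sizes, batch_size and perplexity are assumptions chosen for illustration. LLE is left out because the table gives no details for it.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, IncrementalPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE

# Synthetic data: 3 classes in 10 dimensions (illustrative sizes).
X, y = make_blobs(n_samples=300, centers=3, n_features=10, random_state=0)

# PCA: project onto the 2 directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Covariance of the projected data: off-diagonal entries are close to 0
# and the diagonal entries are in descending order.
cov = np.cov(X_pca, rowvar=False)
print(np.round(cov, 6))

# IncrementalPCA: same idea, fitted batch by batch (batch_size is assumed).
X_ipca = IncrementalPCA(n_components=2, batch_size=50).fit_transform(X)

# LDA: supervised, needs the class labels y.
# It allows at most n_classes - 1 components, so 2 for 3 classes.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# t-SNE: embedding for visualization only (perplexity is assumed).
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_pca.shape, X_ipca.shape, X_lda.shape, X_tsne.shape)
```

The t-SNE output is typically passed to a scatter plot colored by cluster label to judge a clustering visually. TSNE has no transform method for new points, so it is not a drop-in replacement for PCA in a processing pipeline.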
