Processing Methods Assemble

A summary of the most frequently used data processing methods (non-DNN methods). Most of them relate to unsupervised learning and representation learning.

clustering methods

in sklearn.cluster

Clustering results of the three methods listed below on high-dimensional blob data generated with datasets.make_blobs.

Method Name	Hyper-Para	Description
K-Means	#clusters	The number of clusters must be specified in advance as a hyperparameter. Batch-wise version: MiniBatchKMeans
Mean-Shift	bandwidth	A smaller bandwidth yields more clusters
DBSCAN	eps,	Density-based; suited to clusters with non-convex (nonlinearly separable) shapes. It does not provide center coordinates for the clusters
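
The table can be illustrated with a minimal sketch. The blob centers, sample count, and every hyperparameter value below (bandwidth, eps, min_samples, batch_size) are assumptions chosen so the example is well separated; they are not values taken from this repository.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, MiniBatchKMeans, MeanShift, DBSCAN
from sklearn.metrics import adjusted_rand_score

# Assumed toy data: 3 well-separated blobs in 5 dimensions.
centers = np.array([
    [0, 0, 0, 0, 0],
    [10, 10, 10, 10, 10],
    [-10, 10, -10, 10, -10],
], dtype=float)
X, y_true = make_blobs(n_samples=300, centers=centers,
                       cluster_std=0.5, random_state=0)

# K-Means: the number of clusters is fixed in advance.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Batch-wise variant of K-Means.
mbk = MiniBatchKMeans(n_clusters=3, batch_size=64, n_init=3,
                      random_state=0).fit(X)

# Mean-Shift: the bandwidth controls how many clusters are found.
meanshift = MeanShift(bandwidth=2.0).fit(X)
meanshift_small = MeanShift(bandwidth=0.5).fit(X)

# DBSCAN: density-based; label -1 marks noise points.
dbscan = DBSCAN(eps=2.0, min_samples=5).fit(X)
n_dbscan_clusters = len(set(dbscan.labels_) - {-1})

print("K-Means ARI:", adjusted_rand_score(y_true, kmeans.labels_))
print("MiniBatchKMeans ARI:", adjusted_rand_score(y_true, mbk.labels_))
print("Mean-Shift clusters (bandwidth=2.0):", len(meanshift.cluster_centers_))
print("Mean-Shift clusters (bandwidth=0.5):", len(meanshift_small.cluster_centers_))
print("DBSCAN clusters:", n_dbscan_clusters)
# K-Means and Mean-Shift expose cluster_centers_; DBSCAN does not.
print("DBSCAN has centers:", hasattr(dbscan, "cluster_centers_"))
```

K-Means and Mean-Shift both expose `cluster_centers_`, while DBSCAN only returns labels and core samples, which matches the remark in the table about missing center coordinates.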

dimensionality reduction methods

in sklearn.decomposition (PCA), sklearn.discriminant_analysis (LDA), sklearn.manifold (t-SNE, LLE)

Method Name	Hyper-Para	Description
PCA	#components	Finds an orthogonal basis in which the covariance matrix is diagonal: the diagonal elements (variances) are in descending order and all other positions are 0. Batch-wise version: IncrementalPCA
LDA	class labels	Supervised: uses the class labels to maximize the distance between classes and minimize the distance within classes.
LLE	---	---
t-SNE	#components	Remaps the data points to 2 or 3 dimensions for visualization while preserving the neighborhood similarities of the high-dimensional space as far as possible. It is not a clustering algorithm, but a means to inspect how well clusters separate. Similarities are modeled with a Gaussian distribution in the original space (n-D) and a Student t-distribution in the embedded space (2-D).
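
A minimal sketch of PCA, its incremental variant, LDA, and t-SNE on toy data follows. The dataset and all parameter values (n_components, batch_size, perplexity, init) are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, IncrementalPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE

# Assumed toy data: 150 points, 3 classes, 10 features.
X, y = make_blobs(n_samples=150, centers=3, n_features=10, random_state=0)

# PCA: keep the 3 directions with the largest variance.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)
# Covariance of the projected data: diagonal, with descending entries.
cov = np.cov(X_pca, rowvar=False)
print("PCA covariance diagonal:", np.diag(cov))

# Batch-wise PCA: the data is processed in chunks of batch_size rows.
ipca = IncrementalPCA(n_components=3, batch_size=50)
X_ipca = ipca.fit_transform(X)

# LDA is supervised: it needs the labels y and allows
# at most (number of classes - 1) components.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

# t-SNE: 2-D embedding intended for visualization only.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_tsne = tsne.fit_transform(X)

print("Shapes:", X_pca.shape, X_ipca.shape, X_lda.shape, X_tsne.shape)
```

Note that `TSNE` offers only `fit_transform`, not a separate `transform` for unseen points, which is one reason it is used to inspect data rather than as a preprocessing step.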
