Enhancing the Performance of the PSO Algorithm for Clustering High-Dimensional Data Using Autoencoders
Abstract
The emergence of big data has brought new challenges in processing and analyzing large, complex datasets due to their high dimensionality. Unsupervised learning techniques such as clustering have become powerful tools for identifying patterns and relationships in data without the need for labeled examples. Two popular approaches to unsupervised data clustering are K-means and Particle Swarm Optimization (PSO). Combining K-means clustering with PSO can yield better clustering results by drawing on the strengths of both algorithms. To automate the clustering, the Elbow method is used to choose the number of clusters K for K-means and PSO. Clustering high-dimensional data is challenging because of the curse of dimensionality: as the number of dimensions grows, the data become sparse and distances between points become less informative, a problem that is most severe when the number of dimensions approaches or exceeds the number of data points. A dimensionality reduction step is therefore needed to improve the performance of clustering on high-dimensional data. We used an autoencoder as the dimensionality reduction technique, combined it with K-means and PSO clustering, and compared the clustering performance on the reduced and the original data.
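The Elbow method step can be sketched as follows, assuming plain Lloyd's K-means with k-means++ seeding and Euclidean distance. The abstract does not say how the elbow is located, so this sketch uses one simple heuristic (an assumption): after rescaling both axes to [0, 1], it picks the K whose point on the SSE curve lies farthest from the straight line joining the curve's first and last points. The helper names `kmeans` and `elbow_k` and the synthetic three-blob data are illustrative only.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's K-means with k-means++ seeding; returns (centroids, labels, sse) of the best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _run in range(n_init):
        # k-means++ seeding: each new centre is drawn with probability proportional to D^2.
        C = X[[rng.integers(len(X))]].astype(float)
        while len(C) < k:
            d2 = sq_dists(X, C).min(axis=1)
            C = np.vstack([C, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _it in range(n_iter):
            labels = sq_dists(X, C).argmin(axis=1)
            # Move each centroid to the mean of its points (keep it if its cluster is empty).
            new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                              for j in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        d2 = sq_dists(X, C)
        labels = d2.argmin(axis=1)
        sse = d2[np.arange(len(X)), labels].sum()
        if best is None or sse < best[2]:
            best = (C, labels, sse)
    return best


def elbow_k(X, k_max=10, seed=0):
    """Pick K at the 'knee' of the SSE curve: the point farthest from the chord joining its ends."""
    ks = np.arange(1, k_max + 1)
    sse = np.array([kmeans(X, k, seed=seed)[2] for k in ks])
    # Rescale both axes to [0, 1] so the distance does not depend on units.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (sse - sse.min()) / (sse.max() - sse.min())
    # Perpendicular distance from each (x, y) to the line through the first and last points.
    dist = np.abs((x[-1] - x[0]) * (y[0] - y) - (x[0] - x) * (y[-1] - y[0]))
    dist /= np.hypot(x[-1] - x[0], y[-1] - y[0])
    return int(ks[dist.argmax()]), sse


# Illustrative data: three well-separated 2-D Gaussian blobs.
rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centres])
best_k, sse_curve = elbow_k(X, k_max=10)
print("Elbow K:", best_k)
```

On this synthetic data the SSE drops sharply up to K = 3 and flattens afterwards, so the heuristic picks 3; on real stock data the curve is usually less clear-cut, and the chosen K is worth checking visually.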
Objectives
- Design and develop a PSO algorithm for automatic data clustering.
- Design and develop a PSO algorithm that uses an autoencoder for data clustering.
- Compare the performance of the PSO and autoencoder-based PSO clustering algorithms using different validity indices.
- Apply these algorithms to stock market data and draw inferences.
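The first objective can be illustrated with a minimal global-best PSO clustering sketch, assuming each particle encodes K candidate centroids and its fitness is the within-cluster sum of squared errors (SSE). The swarm size, inertia weight `w` and acceleration coefficients `c1`, `c2` are commonly used defaults, not values taken from this work, and the optional `seed_centroids` argument is a hypothetical hook showing one way to hybridize K-means and PSO by starting one particle from a K-means solution.

```python
import numpy as np


def sse(X, C):
    """Within-cluster sum of squared errors: the fitness each particle tries to minimise."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def pso_cluster(X, k, n_particles=20, n_iter=100, w=0.72, c1=1.49, c2=1.49,
                seed_centroids=None, seed=0):
    """Global-best PSO in which every particle is a (k, d) array of candidate centroids."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Each particle starts from k distinct data points chosen at random.
    pos = np.stack([X[rng.choice(n, size=k, replace=False)]
                    for _ in range(n_particles)]).astype(float)
    if seed_centroids is not None:
        pos[0] = seed_centroids  # hybrid step: one particle starts from a K-means solution
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([sse(X, p) for p in pos])
    g = pbest_f.argmin()
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Velocity = inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([sse(X, p) for p in pos])
        improved = f < pbest_f
        pbest[improved] = pos[improved]
        pbest_f[improved] = f[improved]
        g = pbest_f.argmin()
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    # Final hard assignment of each point to its nearest global-best centroid.
    labels = ((X[:, None, :] - gbest[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return gbest, labels, gbest_f


rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centres])
init = np.array([[1.0, 1.0], [9.0, 1.0], [1.0, 9.0]])  # stand-in for a K-means result
C, labels, fitness = pso_cluster(X, 3, seed_centroids=init)
print("PSO SSE:", fitness)
```

Because the global best only changes when some particle finds a lower SSE, seeding one particle with a K-means solution guarantees the returned centroids are never worse than that solution; PSO can then only refine it.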
Methodology
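The autoencoder-based reduction step could look like the following minimal sketch, assuming a single bottleneck layer with a tanh activation, a linear decoder, mean-squared reconstruction error and full-batch gradient descent in plain NumPy. The architecture, code size, learning rate and epoch count are illustrative assumptions, not the configuration used in this work; deeper autoencoders are normally built with a framework such as Keras or PyTorch.

```python
import numpy as np


class Autoencoder:
    """Single-bottleneck autoencoder: z = tanh(x W1 + b1), x_hat = z W2 + b2, trained on MSE."""

    def __init__(self, n_in, n_code, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights scaled by fan-in; biases start at zero.
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_code))
        self.b1 = np.zeros(n_code)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_code), (n_code, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, X):
        """Map inputs to the low-dimensional code (the reduced features)."""
        return np.tanh(X @ self.W1 + self.b1)

    def fit(self, X, epochs=2000, lr=0.05):
        """Full-batch gradient descent on mean squared reconstruction error; returns loss history."""
        losses = []
        for _ in range(epochs):
            Z = self.encode(X)               # (n, n_code) bottleneck codes
            err = Z @ self.W2 + self.b2 - X  # reconstruction error, (n, n_in)
            losses.append(float((err ** 2).mean()))
            # Backpropagate d(MSE)/d(x_hat) through the decoder and the tanh encoder.
            g_out = 2.0 * err / err.size
            g_z = (g_out @ self.W2.T) * (1.0 - Z ** 2)  # tanh'(a) = 1 - tanh(a)^2
            grads = (X.T @ g_z, g_z.sum(axis=0), Z.T @ g_out, g_out.sum(axis=0))
            for param, grad in zip((self.W1, self.b1, self.W2, self.b2), grads):
                param -= lr * grad  # in-place update of each weight array
        return losses


rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))              # 2 hidden factors
X = np.tanh(latent @ rng.standard_normal((2, 10)))  # 10 observed features
X += 0.05 * rng.standard_normal(X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize each feature
ae = Autoencoder(n_in=10, n_code=2)
losses = ae.fit(X)
codes = ae.encode(X)  # (200, 2): input to K-means / PSO instead of X
print(f"MSE {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The encoder output `codes` is the reduced representation that K-means and PSO would cluster in place of the original features; standardizing the inputs first helps keep the tanh units out of saturation.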
Results
Dataset | K-Means PSO: DB Index | K-Means PSO: Silhouette Index | K-Means PSO with Autoencoders: DB Index | K-Means PSO with Autoencoders: Silhouette Index |
---|---|---|---|---|
High | 0.99316 | 0.044056 | 0.499879 | 0.598376 |
Low | 0.98635 | 0.079333 | 0.492837 | 0.694484 |
Close | 0.98474 | 0.046373 | 0.474543 | 0.634368 |
Open | 0.93643 | 0.056383 | 0.547732 | 0.745483 |
Volume | 0.99736 | 0.043367 | 0.498746 | 0.648464 |
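For reference, the DB (Davies-Bouldin) index averages, over clusters, the worst ratio (s_i + s_j) / d(c_i, c_j) of within-cluster scatter to centroid separation, so lower is better; the Silhouette index averages (b - a) / max(a, b) over points, ranges from -1 to 1, and higher is better. The NumPy sketch below computes both from scratch, assuming Euclidean distance; it follows the same definitions as scikit-learn's `davies_bouldin_score` and `silhouette_score`, which could be used instead. The synthetic blobs and random labels are purely illustrative.

```python
import numpy as np


def pairwise(X, Y):
    """Euclidean distance between every row of X and every row of Y."""
    return np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2))


def davies_bouldin(X, labels):
    ks = np.unique(labels)
    C = np.array([X[labels == k].mean(axis=0) for k in ks])
    # s_i: average distance of the members of cluster i to its centroid.
    s = np.array([pairwise(X[labels == k], C[i:i + 1]).mean() for i, k in enumerate(ks)])
    M = pairwise(C, C)  # centroid-to-centroid distances
    R = (s[:, None] + s[None, :]) / np.where(M > 0, M, np.inf)
    np.fill_diagonal(R, -np.inf)  # ignore i == j when taking the worst partner
    return R.max(axis=1).mean()


def silhouette(X, labels):
    """Mean silhouette width; needs at least two clusters."""
    D = pairwise(X, X)
    ks = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # convention: a point in a singleton cluster scores 0
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to the other members
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


rng = np.random.default_rng(7)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centres])
true_labels = np.repeat([0, 1, 2], 50)
random_labels = rng.integers(0, 3, size=150)
print("DB:", davies_bouldin(X, true_labels), "Silhouette:", silhouette(X, true_labels))
```

Correct labels on well-separated blobs give a DB index close to 0 and a Silhouette index close to 1, while random labels push both indices toward the poor end, which is the direction of change seen between the two method columns in the table above.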
Conclusion
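Based on the clustering-quality metrics used, the Davies-Bouldin (DB) index (lower is better) and the Silhouette index (higher is better), K-means and PSO with autoencoders outperformed K-means and PSO without autoencoders on all five attributes: the DB index fell from a range of about 0.94 to 1.00 to a range of about 0.47 to 0.55, and the Silhouette index rose from below 0.08 to between about 0.60 and 0.75.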