Abstract:
Discriminative Dimension Selection (DDS) has emerged as an important technique for extracting meaningful structure from complex, high-dimensional datasets. By identifying the most relevant features, DDS clarifies data patterns and supports both analysis and visualization. This research examines the practical application of DDS within K-means clustering, a widely used algorithm for partitioning datasets into coherent groups. Traditional K-means struggles with high-dimensional data because of the curse of dimensionality, which often leads to suboptimal clustering outcomes and reduced interpretability. To address this challenge, we propose a novel extension to K-means, termed Overlap-Resolved Clustering (ORC), which integrates DDS with overlapping clusters and dimensions. Using post-processing techniques, we retain informative features and discard redundant or irrelevant ones, thereby refining the feature set. We evaluate the method with established cluster validity indices: the Silhouette Coefficient Score (SS), the Davies-Bouldin Index (DB), and the Calinski-Harabasz Index (CH). In comparison with traditional K-means, ORC consistently achieves better clustering performance and enhanced interpretability. Our study also underscores the broader implications of DDS for the visualization and analysis of high-dimensional data in the context of K-means clustering. By highlighting the potential of DDS-driven approaches, we contribute to both the theoretical understanding and the practical utility of dimensionality reduction techniques in data science.
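As a rough illustration of the evaluation protocol named in the abstract, the sketch below computes the three validity indices (SS, DB, CH) for plain K-means, once on all dimensions and once on a reduced feature set. It is not the ORC method, whose details the abstract does not give. The synthetic data, the helper names (`kmeans`, `silhouette`, `davies_bouldin`, `calinski_harabasz`, `make_data`), and the hand-picked "informative" subset are all assumptions made for illustration only.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's K-means (not ORC); returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(X, k)  # initialise from k data points
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(x, centers[j])) for x in X]
        updated = []
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:  # converged
            break
        centers = updated
    return labels


def groups(X, labels):
    """Map each label that occurs to the list of its points."""
    out = {}
    for x, lab in zip(X, labels):
        out.setdefault(lab, []).append(x)
    return out


def silhouette(X, labels):
    """Mean silhouette coefficient; needs >= 2 clusters; higher is better."""
    g = groups(X, labels)
    total = 0.0
    for x, lab in zip(X, labels):
        own = g[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(dist(x, y) for y in own) / (len(own) - 1)
        b = min(sum(dist(x, y) for y in pts) / len(pts)
                for other, pts in g.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(X)


def davies_bouldin(X, labels):
    """Davies-Bouldin index; lower is better."""
    g = groups(X, labels)
    cents = {lab: centroid(pts) for lab, pts in g.items()}
    scatter = {lab: sum(dist(x, cents[lab]) for x in pts) / len(pts)
               for lab, pts in g.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in g if j != i) for i in g]
    return sum(worst) / len(worst)


def calinski_harabasz(X, labels):
    """Calinski-Harabasz index; higher is better."""
    g = groups(X, labels)
    n, k = len(X), len(g)
    overall = centroid(X)
    between = within = 0.0
    for pts in g.values():
        c = centroid(pts)
        between += len(pts) * dist(c, overall) ** 2
        within += sum(dist(x, c) ** 2 for x in pts)
    return (between / (k - 1)) / (within / (n - k))


def make_data(seed=0, per_cluster=30, noise_dims=8):
    """Hypothetical data: 2 informative dimensions plus uniform-noise ones."""
    rng = random.Random(seed)
    X = []
    for cx, cy in [(0, 0), (10, 0), (0, 10)]:
        for _ in range(per_cluster):
            informative = [rng.gauss(cx, 1), rng.gauss(cy, 1)]
            noise = [rng.uniform(-10, 10) for _ in range(noise_dims)]
            X.append(informative + noise)
    return X


if __name__ == "__main__":
    X_full = make_data()
    X_sub = [row[:2] for row in X_full]  # hand-picked subset, not learned by DDS
    for name, X in [("all dimensions", X_full), ("informative subset", X_sub)]:
        labels = kmeans(X, k=3)
        print(name, silhouette(X, labels),
              davies_bouldin(X, labels), calinski_harabasz(X, labels))
```

The indices point in different directions: higher SS and CH values and lower DB values indicate more compact, better separated clusters, so a comparison of two feature sets should read all three together rather than rely on one.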
Kasetsart University. Office of the University Library