SDE: A Novel Clustering Framework Based on Sparsity-Density Entropy

Document Type


Date of Original Version



Clustering high-dimensional data with variable densities poses a considerable challenge to traditional density-based clustering methods. Entropy, a numerical measure of the uncertainty of information, can be used to measure the border degree of samples in the data space and to select significant features from a feature set. We use it in a new framework based on sparsity-density entropy (SDE) to cluster data with high dimension and variable densities. First, SDE conducts high-quality sampling of multidimensional data and selects representative features using sparsity score entropy (SSE). Second, the clustering results and noise are obtained using a new variable-density clustering method called density entropy (DE). DE automatically determines the border set based on the global minimum of border degrees and then adaptively performs cluster analysis for each local cluster based on the local minimum of border degrees. The effectiveness and efficiency of the proposed SDE framework are validated on synthetic and real data sets in comparison with several clustering algorithms. The results show that the proposed SDE framework simultaneously detects noise and handles data with high dimension and varying densities.
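
As a rough illustration of how entropy can score features, the sketch below computes the Shannon entropy of each feature after equal-width binning and orders the features by that score. It is not the SSE or DE procedure from the paper, whose definitions are not given in this abstract. The function names, the binning scheme, the `bins` parameter, and the low-to-high ordering are all assumptions made for the example.

```python
import math
from collections import Counter

def shannon_entropy(values, bins=4):
    """Shannon entropy (in bits) of a numeric feature after equal-width binning.

    Hypothetical helper for illustration; the binning scheme is an assumption.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no uncertainty
    width = (hi - lo) / bins
    # Map each value to a bin index; the maximum value is clamped into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def rank_features_by_entropy(rows, bins=4):
    """Return column indices ordered from lowest to highest entropy.

    Hypothetical helper; the direction of the ordering is an arbitrary choice.
    """
    columns = list(zip(*rows))
    scores = [shannon_entropy(col, bins) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j])
```

A feature whose values fall into two equally populated bins scores 1 bit, while one spread evenly over four bins scores 2 bits, so `rank_features_by_entropy` places the former first.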

Publication Title

IEEE Transactions on Knowledge and Data Engineering