Computer Science and Statistics
Anomaly and outlier detection is a long-standing problem in machine learning. In some cases, anomaly detection is easy, such as when data are drawn from a well-characterized distribution such as the Gaussian. However, when data occupy high-dimensional spaces, anomaly detection becomes more difficult. We present CLAM (Clustered Learning of Approximate Manifolds), a manifold-mapping technique that works in any metric space. CLAM begins with a fast hierarchical clustering technique and then induces a graph from the cluster tree, based on overlapping clusters selected using several geometric and topological features. Using these graphs, we implement CHAODA (Clustered Hierarchical Anomaly and Outlier Detection Algorithms), exploring various properties of the graphs and their constituent clusters to find outliers. CHAODA employs a form of transfer learning based on a training set of datasets, and applies this knowledge to a separate test set of datasets of different cardinalities, dimensionalities, and domains. On 24 publicly available datasets, we compare CHAODA (by measure of ROC AUC) to a variety of state-of-the-art unsupervised anomaly-detection algorithms. Six of the datasets are used for training. CHAODA outperforms other approaches on 16 of the remaining 18 datasets. CLAM and CHAODA scale to large, high-dimensional "big data" anomaly-detection problems, and generalize across datasets and distance functions. Source code for CLAM and CHAODA is freely available on GitHub.
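The pipeline summarized in the abstract can be illustrated at toy scale. The sketch below is a simplified illustration, not the authors' implementation: it performs a crude divisive hierarchical clustering in a metric space, induces a graph by connecting clusters whose medoid-centered balls overlap, and scores points by inverse cluster cardinality (smaller clusters are treated as more anomalous, loosely echoing one of CHAODA's cluster-based scores). All function names and parameters are hypothetical.

```python
import math

def dist(a, b):
    # Euclidean distance here; CLAM itself works with any distance function.
    return math.dist(a, b)

def medoid(cluster):
    # The point minimizing total distance to the rest of the cluster.
    return min(cluster, key=lambda p: sum(dist(p, q) for q in cluster))

def bisect(cluster):
    # Toy divisive step: split around the two mutually distant seed points.
    a = max(cluster, key=lambda p: dist(p, cluster[0]))
    b = max(cluster, key=lambda p: dist(p, a))
    left = [p for p in cluster if dist(p, a) <= dist(p, b)]
    right = [p for p in cluster if dist(p, a) > dist(p, b)]
    return left, right

def partition(points, max_depth=4, min_size=4):
    # Repeatedly bisect large clusters, yielding the leaves of a shallow tree.
    clusters = [list(points)]
    for _ in range(max_depth):
        nxt = []
        for c in clusters:
            if len(c) > min_size:
                l, r = bisect(c)
                nxt.extend(x for x in (l, r) if x)
            else:
                nxt.append(c)
        clusters = nxt
    return clusters

def overlap_graph(clusters):
    # Connect two clusters when their medoid-centered balls overlap.
    centers = [medoid(c) for c in clusters]
    radii = [max(dist(centers[i], p) for p in c) for i, c in enumerate(clusters)]
    edges = set()
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if dist(centers[i], centers[j]) <= radii[i] + radii[j]:
                edges.add((i, j))
    return edges

def cardinality_scores(points, clusters):
    # Inverse cluster cardinality: points in small clusters score higher.
    score = {}
    for c in clusters:
        for p in c:
            score[tuple(p)] = 1.0 / len(c)
    return [score[tuple(p)] for p in points]
```

For example, on a tight grid of inliers plus one distant point, the distant point lands in a singleton cluster and receives the highest score. The real CHAODA combines several such graph- and cluster-derived scores and meta-learns which to trust.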
Publication Title
2021 IEEE International Conference on Big Data (Big Data)
Ishaq, Najib, Thomas J. Howard, and Noah M. Daniels. "Clustered Hierarchical Anomaly and Outlier Detection Algorithms." 2021 IEEE International Conference on Big Data (Big Data) (2021). doi: 10.1109/BigData52589.2021.9671566.
This is a pre-publication author manuscript of the final, published article.
This article is made available under the terms and conditions applicable