Comparison of statistical methods for multivariate outlier detection with R packages

Archimbaud A., Nordhausen K. and Ruiz-Gazen A.

Date

December 12 – 14, 2015

Time

12:00 AM

Location

University of London, UK

Abstract

Detecting multivariate outliers is relevant in many fields, such as fraud detection and manufacturing defect detection. Several unsupervised multivariate methods exist, and some are based on robust or non-robust covariance matrix estimators: the Mahalanobis distance (MD) and its robust version (RD), robust Principal Component Analysis (PCA) with its diagnostic plot, and Invariant Coordinate Selection (ICS). The objective is to compare these methods. All of them yield one or several scores for each observation, and high scores indicate potential outliers. For robust PCA and ICS, some components are selected and outliers are identified by a test procedure. This last step is not trivial: relevant cut-offs have to be determined, and the simultaneity of the tests has to be taken into account. The comparison is performed on simulated data sets drawn from mixtures of Gaussian distributions, with a small proportion of outliers and a number of observations at least five times the number of variables. The Minimum Covariance Determinant (MCD) estimator is considered. The implementation is based on functions from the R packages robustbase, rrcov, ICS and CerioliOutlierDetection.
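
To make the idea of a score and a cut-off concrete, here is a minimal sketch of the simplest method named above, the classical (non-robust) Mahalanobis distance, on a small two-variable Gaussian mixture. It is written in Python with the standard library only and is not the authors' R implementation. The function names, the simulated data, the 0.025 level and the use of a Šidák adjustment for the simultaneity of tests are all assumptions chosen for illustration. The sketch relies on the chi-square distribution with 2 degrees of freedom, whose quantile has the closed form -2 ln(1 - q).

```python
import math
import random


def mahalanobis_sq_2d(points):
    """Squared classical Mahalanobis distance of each 2-D point,
    using the sample mean and the sample covariance (divisor n - 1)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # The inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    return [
        (syy * (x - mx) ** 2
         - 2.0 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


def chi2_quantile_2df(q):
    """Quantile of the chi-square distribution with 2 degrees of freedom.
    Its CDF is 1 - exp(-x / 2), so the quantile is available in closed form."""
    return -2.0 * math.log(1.0 - q)


def flag_outliers(points, alpha=0.025, simultaneous=False):
    """Return (indices whose squared distance exceeds the cut-off, cut-off).
    With simultaneous=True the level is Sidak-adjusted for n tests
    (an illustrative choice, not necessarily the one used in the talk)."""
    n = len(points)
    level = (1.0 - alpha) ** (1.0 / n) if simultaneous else 1.0 - alpha
    cutoff = chi2_quantile_2df(level)
    d2 = mahalanobis_sq_2d(points)
    return [i for i, d in enumerate(d2) if d > cutoff], cutoff


# Hypothetical data: 95 points from a standard Gaussian plus 5 shifted points,
# so n = 100 observations for p = 2 variables.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(95)]
data += [(rng.gauss(6.0, 0.5), rng.gauss(6.0, 0.5)) for _ in range(5)]

pointwise, cut_pointwise = flag_outliers(data)
adjusted, cut_adjusted = flag_outliers(data, simultaneous=True)
print("pointwise cut-off:", round(cut_pointwise, 3), "flagged:", pointwise)
print("adjusted cut-off:", round(cut_adjusted, 3), "flagged:", adjusted)
```

Because the sample mean and covariance are computed from all observations, the outliers themselves inflate the estimates and can shrink their own distances. The robust distance replaces these estimates with robust ones such as the MCD, which this sketch does not attempt to implement.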

Details
Posted on:
December 12, 2015