Improving clustering through dimension reduction using Invariant Coordinate Selection

Alfons, A., Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. Invited

Date

September 14 – 16, 2022

Time

12:00 AM

Location

Naples, Italy.

Event

Abstract

Dimension reduction is an important pre-processing step in multivariate analysis, as it can improve the identification of outliers or clusters, among other structures. Principal Component Analysis (PCA), which can be derived from the decomposition of a single scatter matrix, is one of the best-known dimension reduction techniques, but it may not be the best choice for clustering purposes. Indeed, looking for clusters in the subspaces given by PCA can be too restrictive, as PCA is based only on the variability of the data. An alternative approach is Invariant Coordinate Selection (ICS), a multivariate method that relies on the joint diagonalization of two scatter matrices. By comparing two scatter matrices, ICS goes beyond PCA and finds more directions of interest. In addition, for combinations of affine equivariant scatter matrices, ICS returns affine invariant components, i.e. unlike PCA, it does not depend on the scale of the variables. Two challenging steps are the choice of the pair of scatter matrices and the selection of the components to retain. The authors of [1] derive theoretical results guaranteeing that, under some elliptical mixture models, the structure of the data can be highlighted on a subset of the first and/or last components. ICS is able to recover Fisher’s linear discriminant subspace even when the group memberships are unknown. In terms of end goals, ICS has been studied in the literature for outlier detection purposes [2] but has received little attention concerning clustering tasks. The aim of this work is to evaluate the performance of several well-known clustering algorithms on the initial data and on its reduced version obtained with ICS. A comparison of different scatter matrices and dimension selection methods is performed on benchmark data sets to clarify in which contexts ICS improves the identification of clusters.
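To make the joint diagonalization concrete, here is a minimal Python sketch of ICS. It rests on several assumptions that are not part of the abstract: the scatter pair is the sample covariance (COV) together with the fourth-moment scatter matrix (COV4), which is only one possible pair; the helper names `cov4` and `ics` are hypothetical; and the two-group data are simulated purely for illustration. It is not the authors' implementation.

```python
import numpy as np

def cov4(X):
    """Fourth-moment scatter matrix, scaled by 1 / (p + 2)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    # Squared Mahalanobis distances with respect to the sample covariance.
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S), Xc)
    return (Xc * d2[:, None]).T @ Xc / (n * (p + 2))

def ics(X, S1, S2):
    """Jointly diagonalize S1 and S2: B' S1 B = I and B' S2 B = diag(lam)."""
    # Whiten with the symmetric inverse square root of S1.
    w, V = np.linalg.eigh(S1)
    S1_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    M = S1_inv_sqrt @ S2 @ S1_inv_sqrt
    M = (M + M.T) / 2  # enforce exact symmetry before eigh
    lam, U = np.linalg.eigh(M)
    order = np.argsort(lam)[::-1]  # sort eigenvalues in decreasing order
    lam, U = lam[order], U[:, order]
    B = S1_inv_sqrt @ U
    Z = (X - X.mean(axis=0)) @ B  # invariant coordinates
    return lam, B, Z

# Simulated example (an assumption): two balanced Gaussian groups in
# dimension 4 whose means differ only along the first variable.
rng = np.random.default_rng(0)
n, p = 2000, 4
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[:, 0] += 8.0 * labels

S1 = np.cov(X, rowvar=False)
S2 = cov4(X)
lam, B, Z = ics(X, S1, S2)

# Correlation of each invariant coordinate with the (normally unknown) labels.
corr = np.array([np.corrcoef(Z[:, j], labels)[0, 1] for j in range(p)])
best = int(np.argmax(np.abs(corr)))
print("eigenvalues:", np.round(lam, 3))
print("component most correlated with the groups:", best)
```

The labels are used only to check the result afterwards; the decomposition itself never sees them. In this kind of setting the group structure is expected to appear on an extreme component (first or last), which is why component selection typically inspects both ends of the eigenvalue spectrum.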

Details
Posted on:
December 19, 2022