Improving clustering through dimension reduction using Invariant Coordinate Selection
Alfons, A., Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (Invited)
Date
September 14 – 16, 2022
Time
12:00 AM
Location
Naples, Italy.
Event
Abstract
Dimension reduction is an important pre-processing step in multivariate analysis, as it can improve the identification of outliers or clusters, among other structures. Principal Component Analysis (PCA), which can be derived from the decomposition of a scatter matrix, is one of the best-known dimension reduction techniques, but it may not be the best choice for clustering. Indeed, looking for clusters in the subspaces given by PCA can be too restrictive, as PCA is based only on the variability of the data. An alternative is Invariant Coordinate Selection (ICS), a multivariate method that relies on the joint diagonalization of two scatter matrices. By comparing two scatter matrices, ICS goes beyond PCA and finds more directions of interest. In addition, for combinations of affine equivariant scatter matrices, ICS returns affine invariant components, i.e. unlike PCA, it does not depend on the scale of the variables. Two challenging steps are the choice of the pair of scatter matrices and the selection of the components to retain. The authors of [1] derive theoretical results guaranteeing that, under some elliptical mixture models, the structure of the data can be highlighted on a subset of the first and/or last components. ICS is able to recover Fisher's linear discriminant subspace even when the group memberships are unknown. In terms of end goals, ICS has been studied in the literature for outlier detection [2] but has received little attention for clustering tasks. The aim is to evaluate the performance of several well-known clustering algorithms on the initial data and on its reduced version obtained with ICS. Different scatter matrices and dimension selection methods are compared on benchmark data sets to clarify in which contexts ICS improves the identification of clusters.
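To make the joint diagonalization concrete, here is a minimal NumPy sketch of ICS. It assumes the scatter pair is the sample covariance together with the fourth-moment scatter matrix (often called COV4); the abstract does not say which pairs are compared, so this choice is purely illustrative. The function names `cov4` and `ics` are hypothetical helpers written for this sketch, not the API of any package.

```python
import numpy as np


def cov4(X):
    """Fourth-moment scatter matrix (assumed second scatter for this sketch)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    # Squared Mahalanobis distance of each observation to the mean.
    d2 = np.einsum("ij,ij->i", Xc @ np.linalg.inv(S), Xc)
    # Covariance weighted by squared distances, scaled by 1 / (n (p + 2)).
    return (Xc * d2[:, None]).T @ Xc / (n * (p + 2))


def ics(X):
    """Jointly diagonalize the covariance S1 and cov4 S2.

    Returns the generalized eigenvalues in decreasing order and the
    invariant coordinates Z, whose sample covariance is the identity.
    """
    Xc = X - X.mean(axis=0)
    S1 = np.cov(Xc, rowvar=False)
    S2 = cov4(X)

    # Whiten with respect to S1 using its symmetric inverse square root.
    w, U = np.linalg.eigh(S1)
    S1_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T

    # Eigendecompose S2 in the whitened space (symmetrized for stability).
    M = S1_inv_sqrt @ S2 @ S1_inv_sqrt
    vals, E = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(vals)[::-1]
    vals, E = vals[order], E[:, order]

    # Rows of B are the ICS directions: B S1 B^T = I and B S2 B^T = diag(vals).
    B = E.T @ S1_inv_sqrt
    return vals, Xc @ B.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy two-group mixture, shifted along the first variable only.
    X = np.vstack([
        rng.normal(size=(150, 4)),
        rng.normal(size=(50, 4)) + np.array([6.0, 0.0, 0.0, 0.0]),
    ])
    vals, Z = ics(X)
    print("generalized eigenvalues:", np.round(vals, 3))
```

In this sketch, a clustering algorithm would then be run on a few first and/or last columns of `Z` rather than on `X`. Which columns to keep is the component selection step mentioned in the abstract and is not addressed here.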
Details
- Posted on:
- December 19, 2022