The use of probabilistic models to produce optimal graphical displays of high-dimensional data sets |
| |
Authors: | Henri Caussinus |
| |
Institution: | Laboratoire de Statistique et Probabilités, Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse Cedex, France |
| |
Abstract: | Several techniques for exploring an n × p data set are considered in the light of the statistical framework data = structure + noise. The first application is to Principal
Component Analysis (PCA), or more precisely generalized PCA with any metric M on the unit space ℝ^p. A natural model for supporting this analysis is the fixed-effect model, in which the expectation of each unit is assumed to
belong to some q-dimensional linear manifold defining the structure, while the variance describes the noise. The best estimation
of the structure is obtained for a proper choice of metric M and dimensionality q: guidelines are provided for both choices
in section 2. The second application is to Projection Pursuit, which aims to reveal structure in the original data by means
of suitable low-dimensional projections. We suggest the use of generalized PCA with a suitable metric M as a Projection
Pursuit technique. Depending on the kind of structure sought, two such metrics are proposed in section 3. Finally,
the analysis of n × p contingency tables is considered in section 4. Since the data are frequencies, we assume a multinomial or Poisson model for
the noise. Several models may be considered for the structural part: Correspondence Analysis can be said to rest on one
of them and spherical factor analysis on another, while Goodman's association models provide a further alternative. These
different approaches are discussed and compared from several points of view. |
| |
Keywords: | Contingency tables; Exploratory data analysis; Graphical displays; Principal Component Analysis; Projection pursuit |
This article is indexed in SpringerLink and other databases. |
|
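The generalized PCA with a metric M mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function name `generalized_pca`, the uniform weights 1/n, and the use of a Cholesky factor of M are assumptions made for this illustration, and M is assumed to be symmetric positive definite.

```python
import numpy as np

def generalized_pca(X, M, q):
    """Illustrative generalized PCA of an n x p array X with metric M (hypothetical helper).

    Assumes M is symmetric positive definite and units carry equal weights 1/n.
    Returns the q largest eigenvalues of V M (V = covariance matrix),
    the corresponding M-orthonormal principal axes, and the unit scores.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                 # centre the columns

    L = np.linalg.cholesky(M)               # M = L @ L.T
    Y = Xc @ L                              # ordinary PCA of Y is PCA of X with metric M

    S = Y.T @ Y / n                         # equals L.T @ V @ L
    eigvals, W = np.linalg.eigh(S)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:q]   # keep the q largest
    eigvals, W = eigvals[order], W[:, order]

    axes = np.linalg.solve(L.T, W)          # u = L^{-T} w, so that u.T @ M @ u = 1
    scores = Y @ W                          # same as Xc @ M @ axes
    return eigvals, axes, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    M = np.diag(1.0 / X.var(axis=0))        # inverse-variance metric (standardized PCA)
    eigvals, axes, scores = generalized_pca(X, M, q=2)
    print(scores.shape)
```

With M equal to the identity this reduces to ordinary PCA, and with the diagonal inverse-variance metric used in the usage example it corresponds to PCA on standardized variables. The metrics proposed in the paper itself are not reproduced here.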