Principal Manifolds for Data Visualisation and Dimension Reduction


(PDF files of Chapters):




Frontmatter (Preface, Contents, List of Authors)


1 Developments and Applications of Nonlinear

Principal Component Analysis - a Review

Uwe Kruger, Junping Zhang, Lei Xie

1.1 Introduction

1.2 PCA Preliminaries

1.3 Nonlinearity Test for PCA Models

1.3.1 Assumptions

1.3.2 Disjunct Regions

1.3.3 Confidence Limits for Correlation Matrix

1.3.4 Accuracy Bounds

1.3.5 Summary of the Nonlinearity Test

1.3.6 Example Studies

1.4 Nonlinear PCA Extensions

1.4.1 Principal Curves and Manifolds

1.4.2 Neural Network Approaches

1.4.3 Kernel PCA

1.5 Analysis of Existing Work

1.5.1 Computational Issues

1.5.2 Generalization of Linear PCA?

1.5.3 Roadmap for Future Developments (Basics and Beyond)

1.6 Concluding Summary



2 Nonlinear Principal Component Analysis:

Neural Network Models and Applications

Matthias Scholz, Martin Fraunholz, Joachim Selbig

2.1 Introduction

2.2 Standard Nonlinear PCA

2.3 Hierarchical Nonlinear PCA

2.3.1 The Hierarchical Error Function

2.4 Circular PCA

2.5 Inverse Model of Nonlinear PCA

2.5.1 The Inverse Network Model

2.5.2 NLPCA Models Applied to Circular Data

2.5.3 Inverse NLPCA for Missing Data

2.5.4 Missing Data Estimation

2.6 Applications

2.6.1 Application of Hierarchical NLPCA

2.6.2 Metabolite Data Analysis

2.6.3 Gene Expression Analysis

2.7 Summary



3 Learning Nonlinear Principal Manifolds

by Self-Organising Maps

Hujun Yin

3.1 Introduction

3.2 Biological Background

3.2.1 Lateral Inhibition and Hebbian Learning

3.2.2 From Von Marsburg and Willshaw's Model

to Kohonen's SOM

3.2.3 The SOM Algorithm

3.3 Theories

3.3.1 Convergence and Cost Functions

3.3.2 Topological Ordering Measures

3.4 SOMs, Multidimensional Scaling

and Principal Manifolds

3.4.1 Multidimensional Scaling

3.4.2 Principal Manifolds

3.4.3 Visualisation Induced SOM (ViSOM)

3.5 Examples

3.5.1 Data Visualisation

3.5.2 Document Organisation and Content Management



4 Elastic Maps and Nets for Approximating

Principal Manifolds and Their Application

to Microarray Data Visualization

Alexander N Gorban, Andrei Y Zinovyev

4.1 Introduction and Overview

4.1.1 Frechet Mean and Principal Objects:

K-Means, PCA, what else?

4.1.2 Principal Manifolds

4.1.3 Elastic Functional and Elastic Nets

4.2 Optimization of Elastic Nets for Data Approximation

4.2.1 Basic Optimization Algorithm

4.2.2 Missing Data Values

4.2.3 Adaptive Strategies

4.3 Elastic Maps

4.3.1 Piecewise Linear Manifolds and Data Projectors

4.3.2 Iterative Data Approximation

4.4 Principal Manifold as Elastic Membrane

4.5 Method Implementation

4.6 Examples

4.6.1 Test Examples

4.6.2 Modeling Molecular Surfaces

4.6.3 Visualization of Microarray Data

4.7 Discussion



5 Topology-Preserving Mappings for Data Visualisation

Marian Pena, Wesam Barbakh, Colin Fyfe

5.1 Introduction

5.2 Clustering Techniques

5.2.1 K-Means

5.2.2 K-Harmonic Means

5.2.3 Neural Gas

5.2.4 Weighted K-Means

5.2.5 The Inverse Weighted K-Means

5.3 Topology Preserving Mappings

5.3.1 Generative Topographic Map

5.3.2 Topographic Product of Experts (ToPoE)

5.3.3 The Harmonic Topographic Map

5.3.4 Topographic Neural Gas

5.3.5 Inverse-Weighted K-Means Topology-Preserving Map

5.4 Experiments

5.4.1 Projections in Latent Space

5.4.2 Responsibilities

5.4.3 U-matrix, Hit Histograms and Distance Matrix

5.4.4 The Quality of The Map

5.5 Conclusions



6 The Iterative Extraction Approach to Clustering

Boris Mirkin

6.1 Introduction

6.2 Clustering Entity-to-feature Data

6.2.1 Principal Component Analysis

6.2.2 Additive Clustering Model and ITEX

6.2.3 Overlapping and Fuzzy Clustering Case

6.2.4 K-Means and iK-Means Clustering

6.3 ITEX Structuring and Clustering for Similarity Data

6.3.1 Similarity Clustering: a Review

6.3.2 The Additive Structuring Model and ITEX

6.3.3 Additive Clustering Model

6.3.4 Approximate Partitioning

6.3.5 One Cluster Clustering

6.3.6 Some Applications



7 Representing Complex Data Using Localized Principal

Components with Application to Astronomical Data

Jochen Einbeck, Ludger Evers, Coryn Bailer-Jones

7.1 Introduction

7.2 Localized Principal Component Analysis

7.2.1 Cluster-wise PCA

7.2.2 Principal Curves

7.2.3 Further Approaches

7.3 Combining Principal Curves and Regression

7.3.1 Principal Component Regression and its Shortcomings

7.3.2 The Generalization to Principal Curves

7.3.3 Using Directions Other than the Local Principal Components

7.3.4 A Simple Example

7.4 Application to the Gaia Survey Mission

7.4.1 The Astrophysical Data

7.4.2 Principal Manifold Based Approach

7.5 Conclusion



8 Auto-Associative Models, Nonlinear Principal Component

Analysis, Manifolds and Projection Pursuit

Stephane Girard, Serge Iovleff

8.1 Introduction

8.2 Auto-Associative Models

8.2.1 Approximation by Manifolds

8.2.2 A Projection Pursuit Algorithm

8.2.3 Theoretical Results

8.3 Examples

8.3.1 Linear Auto-Associative Models and PCA

8.3.2 Additive Auto-Associative Models and Neural Networks

8.4 Implementation Aspects

8.4.1 Estimation of the Regression Functions

8.4.2 Computation of Principal Directions

8.5 Illustration on Real and Simulated Data



9 Beyond The Concept of Manifolds: Principal Trees,

Metro Maps, and Elastic Cubic Complexes

Alexander N Gorban, Neil R Sumner, Andrei Y Zinovyev

9.1 Introduction and Overview

9.1.1 Elastic Principal Graphs

9.2 Optimization of Elastic Graphs

for Data Approximation

9.2.1 Elastic Functional Optimization

9.2.2 Optimal Application of Graph Grammars

9.2.3 Factorization and Transformation of Factors

9.3 Principal Trees (Branching Principal Curves)

9.3.1 Simple Graph Grammar (Add a Node, Bisect an Edge)

9.3.2 Visualization of Data Using Metro Map Two-Dimensional Tree Layout

9.3.3 Example of Principal Cubic Complex: Product of Principal Trees

9.4 Analysis of the Universal 7-Cluster Structure

of Bacterial Genomes

9.4.1 Brief Introduction

9.4.2 Visualization of the 7-Cluster Structure

9.5 Visualization of Microarray Data

9.5.1 Dataset Used

9.5.2 Principal Tree of Human Tissues

9.6 Discussion



10 Diffusion Maps - a Probabilistic Interpretation

for Spectral Embedding and Clustering Algorithms

Boaz Nadler, Stephane Lafon, Ronald Coifman, Ioannis G Kevrekidis

10.1 Introduction

10.2 Diffusion Distances and Diffusion Maps

10.2.1 Asymptotics of the Diffusion Map

10.3 Spectral Embedding of Low Dimensional Manifolds

10.4 Spectral Clustering of a Mixture of Gaussians

10.5 Summary and Discussion



11 On Bounds for Diffusion, Discrepancy

and Fill Distance Metrics

Steven B Damelin

11.1 Introduction

11.2 Energy, Discrepancy, Distance

and Integration on Measurable Sets in Euclidean Space

11.3 Set Learning via Normalized Laplacian

Dimension Reduction and Diffusion Distance

11.4 Main Result: Bounds for Discrepancy,

Diffusion and Fill Distance Metrics



12 Geometric Optimization Methods for the Analysis

of Gene Expression Data

Michel Journee, Andrew E Teschendorff, Pierre-Antoine Absil,

Simon Tavare, Rodolphe Sepulchre

12.1 Introduction

12.2 ICA as a Geometric Optimization Problem

12.3 Contrast Functions

12.3.1 Mutual Information [8, 10]

12.3.2 F-Correlation [14]

12.3.3 Non-Gaussianity [17]

12.3.4 Joint Diagonalization of Cumulant Matrices [19]

12.4 Matrix Manifolds for ICA

12.5 Optimization Algorithms

12.5.1 Line-Search Algorithms

12.5.2 FastICA

12.5.3 Jacobi Rotations

12.6 Analysis of Gene Expression Data by ICA

12.6.1 Some Issues About the Application of ICA

12.6.2 Evaluation of the Biological Relevance

of the Expression Modes

12.6.3 Results Obtained on the Breast Cancer

Microarray Data Set

12.7 Conclusion



13 Dimensionality Reduction and Microarray Data

David A Elizondo, Benjamin N Passow, Ralph Birkenhead,

Andreas Huemer

13.1 Introduction

13.2 Background

13.2.1 Microarray Data

13.2.2 Methods for Dimension Reduction

13.2.3 Linear Separability

13.3 Comparison Procedure

13.3.1 Data Sets

13.3.2 Dimensionality Reduction

13.3.3 Perceptron Models

13.4 Results

13.5 Conclusions



14 PCA and K-Means Decipher Genome

Alexander N Gorban, Andrei Y Zinovyev

14.1 Introduction

14.2 Required Materials

14.3 Genomic Sequence

14.3.1 Background

14.3.2 Sequences for the Analysis

14.4 Converting Text to a Numerical Table

14.5 Data Visualization

14.5.1 Visualization

14.5.2 Understanding Plots

14.6 Clustering and Visualizing Results

14.7 Task List and Further Information

14.8 Conclusion