Principal Manifolds for Visualization and Data Analysis

Principal Manifolds for Data Visualisation and Dimension Reduction

 

By chapter (PDF files of chapters):

 

Contents

 

Front Matter (Preface, Contents, List of Authors)

 

1 Developments and Applications of Nonlinear Principal Component Analysis – a Review

Uwe Kruger, Junping Zhang, Lei Xie

1.1 Introduction

1.2 PCA Preliminaries

1.3 Nonlinearity Test for PCA Models

1.3.1 Assumptions                                          

1.3.2 Disjunct Regions                                      

1.3.3 Confidence Limits for Correlation Matrix                 

1.3.4 Accuracy Bounds                                     

1.3.5 Summary of the Nonlinearity Test                       

1.3.6 Example Studies                                      

1.4 Nonlinear PCA Extensions                                   

1.4.1 Principal Curves and Manifolds                         

1.4.2 Neural Network Approaches                            

1.4.3 Kernel PCA                                          

1.5 Analysis of Existing Work                                    

1.5.1 Computational Issues                                  

1.5.2 Generalization of Linear PCA?                          

1.5.3 Roadmap for Future Developments (Basics and Beyond)

1.6 Concluding Summary                                        

References                                                      

 

2 Nonlinear Principal Component Analysis: Neural Network Models and Applications

Matthias Scholz, Martin Fraunholz, Joachim Selbig                   

2.1 Introduction                                                

2.2 Standard Nonlinear PCA                                     

2.3 Hierarchical Nonlinear PCA                                  

2.3.1 The Hierarchical Error Function                        

2.4 Circular PCA                                               

2.5 Inverse Model of Nonlinear PCA                              

2.5.1 The Inverse Network Model                            

2.5.2 NLPCA Models Applied to Circular Data                

2.5.3 Inverse NLPCA for Missing Data                        

2.5.4 Missing Data Estimation                               

2.6 Applications                                                

2.6.1 Application of Hierarchical NLPCA                     

2.6.2 Metabolite Data Analysis                              

2.6.3 Gene Expression Analysis                              

2.7 Summary                                                  

References                                                      

 

3 Learning Nonlinear Principal Manifolds by Self-Organising Maps

Hujun Yin                                                      

3.1 Introduction                                                

3.2 Biological Background                                        

3.2.1 Lateral Inhibition and Hebbian Learning                 

3.2.2 From von der Malsburg and Willshaw’s Model to Kohonen’s SOM

3.2.3 The SOM Algorithm                                  

3.3 Theories                                                    

3.3.1 Convergence and Cost Functions                        

3.3.2 Topological Ordering Measures                          

3.4 SOMs, Multidimensional Scaling and Principal Manifolds

3.4.1 Multidimensional Scaling                               

3.4.2 Principal Manifolds                                    

3.4.3 Visualisation Induced SOM (ViSOM)                    

3.5 Examples                                                   

3.5.1 Data Visualisation                                    

3.5.2 Document Organisation and Content Management         

References                                                      

 

4 Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization

Alexander N Gorban, Andrei Y Zinovyev                          

4.1 Introduction and Overview                                   

4.1.1 Fréchet Mean and Principal Objects: K-Means, PCA, what else?

4.1.2 Principal Manifolds                                    

4.1.3 Elastic Functional and Elastic Nets                      

4.2 Optimization of Elastic Nets for Data Approximation            

4.2.1 Basic Optimization Algorithm                          

4.2.2 Missing Data Values                                   

4.2.3 Adaptive Strategies                                     

4.3 Elastic Maps                                                

4.3.1 Piecewise Linear Manifolds and Data Projectors           

4.3.2 Iterative Data Approximation                          

4.4 Principal Manifold as Elastic Membrane                        

4.5 Method Implementation                                      

4.6 Examples                                                  

4.6.1 Test Examples                                        

4.6.2 Modeling Molecular Surfaces                            

4.6.3 Visualization of Microarray Data                        

4.7 Discussion                                                  

References                                                      

 

5 Topology-Preserving Mappings for Data Visualisation

Marian Peña, Wesam Barbakh, Colin Fyfe

5.1 Introduction                                                

5.2 Clustering Techniques                                         

5.2.1 K-Means                                             

5.2.2 K-Harmonic Means                                    

5.2.3 Neural Gas                                           

5.2.4 Weighted K-Means                                     

5.2.5 The Inverse Weighted K-Means                         

5.3 Topology Preserving Mappings                                

5.3.1 Generative Topographic Map                           

5.3.2 Topographic Product of Experts (ToPoE)

5.3.3 The Harmonic Topographic Map

5.3.4 Topographic Neural Gas

5.3.5 Inverse-Weighted K-Means Topology-Preserving Map     

5.4 Experiments                                                

5.4.1 Projections in Latent Space                            

5.4.2 Responsibilities                                       

5.4.3 U-matrix, Hit Histograms and Distance Matrix           

5.4.4 The Quality of the Map

5.5 Conclusions                                                 

References                                                      

 

6 The Iterative Extraction Approach to Clustering

Boris Mirkin                                                     

6.1 Introduction                                                

6.2 Clustering Entity-to-feature Data                             

6.2.1 Principal Component Analysis                          

6.2.2 Additive Clustering Model and ITEX                    

6.2.3 Overlapping and Fuzzy Clustering Case                  

6.2.4 K-Means and iK-Means Clustering                      

6.3 ITEX Structuring and Clustering for Similarity Data            

6.3.1 Similarity Clustering: a Review                         

6.3.2 The Additive Structuring Model and ITEX               

6.3.3 Additive Clustering Model                              

6.3.4 Approximate Partitioning                              

6.3.5 One Cluster Clustering                                 

6.3.6 Some Applications                                    

References                                                      

 

7 Representing Complex Data Using Localized Principal Components with Application to Astronomical Data

Jochen Einbeck, Ludger Evers, Coryn Bailer-Jones                   

7.1 Introduction                                                

7.2 Localized Principal Component Analysis                       

7.2.1 Cluster-wise PCA                                     

7.2.2 Principal Curves                                      

7.2.3 Further Approaches                                   

7.3 Combining Principal Curves and Regression                    

7.3.1 Principal Component Regression and its Shortcomings     

7.3.2 The Generalization to Principal Curves                  

7.3.3 Using Directions Other than the Local Principal Components    

7.3.4 A Simple Example                                    

7.4 Application to the Gaia Survey Mission                        

7.4.1 The Astrophysical Data                                

7.4.2 Principal Manifold Based Approach                     

7.5 Conclusion                                                 

References                                                      

 

8 Auto-Associative Models, Nonlinear Principal Component Analysis, Manifolds and Projection Pursuit

Stéphane Girard, Serge Iovleff

8.1 Introduction                                                

8.2 Auto-Associative Models                                      

8.2.1 Approximation by Manifolds                            

8.2.2 A Projection Pursuit Algorithm                        

8.2.3 Theoretical Results                                    

8.3 Examples                                                   

8.3.1 Linear Auto-Associative Models and PCA                

8.3.2 Additive Auto-Associative Models and Neural Networks    

8.4 Implementation Aspects                                      

8.4.1 Estimation of the Regression Functions                   

8.4.2 Computation of Principal Directions                     

8.5 Illustration on Real and Simulated Data                       

References                                                      

 

9 Beyond the Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes

Alexander N Gorban, Neil R Sumner, Andrei Y Zinovyev           

9.1 Introduction and Overview                                    

9.1.1 Elastic Principal Graphs                               

9.2 Optimization of Elastic Graphs for Data Approximation

9.2.1 Elastic Functional Optimization                         

9.2.2 Optimal Application of Graph Grammars                

9.2.3 Factorization and Transformation of Factors              

9.3 Principal Trees (Branching Principal Curves)                   

9.3.1 Simple Graph Grammar (“Add a Node”, “Bisect an Edge”)

9.3.2 Visualization of Data Using “Metro Map” Two-Dimensional Tree Layout              

9.3.3 Example of Principal Cubic Complex: Product of Principal Trees                             

9.4 Analysis of the Universal 7-Cluster Structure of Bacterial Genomes

9.4.1 Brief Introduction                                     

9.4.2 Visualization of the 7-Cluster Structure                  

9.5 Visualization of Microarray Data                              

9.5.1 Dataset Used                                          

9.5.2 Principal Tree of Human Tissues                        

9.6 Discussion                                                  

References                                                      

 

10 Diffusion Maps – a Probabilistic Interpretation for Spectral Embedding and Clustering Algorithms

Boaz Nadler, Stephane Lafon, Ronald Coifman, Ioannis G Kevrekidis  

10.1 Introduction                                                

10.2 Diffusion Distances and Diffusion Maps                        

10.2.1 Asymptotics of the Diffusion Map                       

10.3 Spectral Embedding of Low Dimensional Manifolds              

10.4 Spectral Clustering of a Mixture of Gaussians                   

10.5 Summary and Discussion                                     

References                                                      

 

11 On Bounds for Diffusion, Discrepancy and Fill Distance Metrics

Steven B Damelin                                               

11.1 Introduction                                                 

11.2 Energy, Discrepancy, Distance and Integration on Measurable Sets in Euclidean Space

11.3 Set Learning via Normalized Laplacian Dimension Reduction and Diffusion Distance

11.4 Main Result: Bounds for Discrepancy, Diffusion and Fill Distance Metrics

References                                                      

 

12 Geometric Optimization Methods for the Analysis of Gene Expression Data

Michel Journée, Andrew E Teschendorff, Pierre-Antoine Absil, Simon Tavaré, Rodolphe Sepulchre

12.1 Introduction                                                 

12.2 ICA as a Geometric Optimization Problem                     

12.3 Contrast Functions                                          

12.3.1 Mutual Information [8, 10]                             

12.3.2 F-Correlation [14]                                      

12.3.3 Non-Gaussianity [17]                                  

12.3.4 Joint Diagonalization of Cumulant Matrices [19]           

12.4 Matrix Manifolds for ICA                                    

12.5 Optimization Algorithms                                      

12.5.1 Line-Search Algorithms                                

12.5.2 FastICA                                             

12.5.3 Jacobi Rotations                                      

12.6 Analysis of Gene Expression Data by ICA                       

12.6.1 Some Issues About the Application of ICA               

12.6.2 Evaluation of the Biological Relevance of the Expression Modes

12.6.3 Results Obtained on the Breast Cancer Microarray Data Set

12.7 Conclusion                                                 

References                                                      

 

13 Dimensionality Reduction and Microarray Data

David A Elizondo, Benjamin N Passow, Ralph Birkenhead, Andreas Huemer

13.1 Introduction                                                

13.2 Background                                                

13.2.1 Microarray Data                                      

13.2.2 Methods for Dimension Reduction                       

13.2.3 Linear Separability                                    

13.3 Comparison Procedure                                       

13.3.1 Data Sets                                            

13.3.2 Dimensionality Reduction                              

13.3.3 Perceptron Models                                    

13.4 Results                                                    

13.5 Conclusions                                                 

References                                                      

 

14 PCA and K-Means Decipher Genome

Alexander N Gorban, Andrei Y Zinovyev                           

14.1 Introduction                                                

14.2 Required Materials                                          

14.3 Genomic Sequence                                           

14.3.1 Background                                           

14.3.2 Sequences for the Analysis                             

14.4 Converting Text to a Numerical Table                         

14.5 Data Visualization                                          

14.5.1 Visualization                                          

14.5.2 Understanding Plots                                   

14.6 Clustering and Visualizing Results                             

14.7 Task List and Further Information                            

14.8 Conclusion                                                  

References 

 

Index