Kernel Regularization and Dimension Reduction
High-dimensional data, where each object (e.g., a patient or a
protein) is associated with a large feature vector, are becoming
increasingly available in many fields of science and technology.
It is often possible to use expert knowledge or other sources of
information to obtain dissimilarity measures for pairs of objects,
which serve as pseudo-distances between the objects. When
dissimilarity information is available, there are two types of
problems of interest. The first is to estimate a full
configuration of positions for all objects in a low-dimensional
space while respecting all the available information. This is usually done for the
purposes of visualizing the data and/or conducting further
analysis, such as clustering or classification. Multidimensional
scaling (MDS), which is still an active research area, has been
traditionally used to achieve this goal. In the second type of
problems, the high-dimensional data points are assumed to lie on a low-dimensional manifold, and the goal is to unfold the manifold in order to recover the underlying intrinsic low-dimensional structure. Various methods with interesting
applications for this type of problem have been developed only recently.
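As a point of reference for the first type of problem, classical MDS recovers coordinates from a dissimilarity matrix by double-centering the squared dissimilarities and taking the leading eigenvectors of the resulting Gram matrix. The sketch below is a minimal illustration of that classical approach only; it does not reproduce the Kernel Regularization method discussed in this abstract, and all names in it are illustrative.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: embed n objects in `dim` dimensions from an
    n x n symmetric dissimilarity matrix D (a minimal sketch)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dim]   # keep the top `dim` eigenpairs
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * scale          # n x dim coordinate matrix

# Usage: exact distances among four points on a line are recovered
# (up to translation and sign) by a 1-D embedding.
X = np.array([[0.0], [1.0], [2.0], [4.0]])
D = np.abs(X - X.T)                           # pairwise dissimilarities
Y = classical_mds(D, dim=1)
D_hat = np.abs(Y - Y.T)                       # distances of the embedding
```

With noisy, incomplete, or merely pseudo-metric dissimilarities this eigendecomposition route degrades, which is the regime the regularization framework above is designed to handle.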
We provide a novel, unified framework, called Kernel Regularization, that optimally solves both types of problems. Advanced optimization techniques are used to obtain the global solutions accurately and efficiently. The proposed method naturally accommodates dissimilarity information that may be crude, noisy, incomplete, inconsistent, or weighted. We illustrate various favorable operating characteristics and properties of the method using both simulated and real data sets, including face images, brain fluid diffusion patterns, and protein sequences.
This is joint work with Grace Wahba, Yi Lin, and Sunduz Keles
from the Department of Statistics and Stephen J. Wright from the Department of Computer Sciences. A central publication is available at http://www.pnas.org/cgi/content/short/102/35/12332