Kernel methods for computational biology and chemistry

Jean-Philippe Vert, Ecole des Mines de Paris

"Statistical Mathematics and Application" courses, CIRM, Luminy, November 13-17, 2006.

Slides (287 slides, 3.7 MB)

Outline

Many problems in computational biology and chemistry can be formalized as classical statistical problems, e.g., pattern recognition, regression or dimension reduction, with the caveat that the data are often not vectors. Indeed, objects such as gene sequences, small molecules, protein 3D structures or phylogenetic trees, to name just a few, have particular structures that carry information relevant to the statistical problem but can hardly be encoded as finite-dimensional vectors.

Kernel methods are a class of algorithms well suited to such problems. They extend the applicability of many statistical methods initially designed for vectors to virtually any type of data, without the need for explicit vectorization of the data. The price to pay for this extension to non-vectors is the need to define a positive definite kernel between the objects, which is formally equivalent to an implicit vectorization of the data. The "art" of kernel design for various objects has witnessed important advances in recent years, resulting in many state-of-the-art algorithms in computational biology and chemistry, as well as in many other fields.
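As a rough illustration of what "defining a positive definite kernel between objects" can look like, here is a minimal sketch of a k-mer counting (spectrum-type) kernel on sequences. It is an illustrative choice and not necessarily the construction presented in the course; the function names (`kmer_counts`, `spectrum_kernel`, `gram_matrix`) and the toy sequences are assumptions made for this example. The kernel value is the inner product of the k-mer count vectors, but those vectors are never built explicitly: only the k-mers actually present in each sequence are visited.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x, y, k=2):
    """Inner product of the k-mer count vectors of x and y.

    Only k-mers that occur in x are visited, so the full
    (alphabet_size ** k)-dimensional vector is never materialized.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(count * cy[kmer] for kmer, count in cx.items())

def gram_matrix(seqs, k=2):
    """Matrix of pairwise kernel values, the only input a kernel method needs."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]

# Hypothetical toy sequences, for illustration only.
seqs = ["ACGT", "ACGA", "TTTT"]
K = gram_matrix(seqs, k=2)
```

Because each entry of `K` is an inner product of count vectors, the matrix is symmetric and positive semidefinite by construction, which is the property kernel algorithms such as support vector machines rely on.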

The goal of this course is to present the mathematical foundations of kernel methods, as well as the main approaches that have emerged so far in kernel design. The relevance of these methods will be illustrated by several examples in computational biology and chemistry.

Schedule

Kernels and RKHS

Kernels
Mercer kernels
Reproducing kernel Hilbert spaces
Smoothness functional

Kernel methods

The kernel trick
The representer theorem
Pattern recognition with large margin in RKHS
Support vector machines

Kernel examples

Mercer kernels and RKHS
RKHS and Green functions
Fourier analysis and semigroup kernels

Kernels for biological sequences

Motivations
Feature space approach
Using generative models
Deriving kernels from a similarity measure
Application: remote homology detection

Kernels on graphs

Motivation
Construction by regularization
The diffusion kernel
Harmonic analysis on graphs
Application: microarray classification


Last modified: Wed Dec 20 17:48:33 CET 2006