Supervised classification for structured data: applications in bio- and chemoinformatics

Jean-Philippe Vert, Mines ParisTech

The analysis of patterns summer school, Pula Science Park, Cagliari, Italy, Sep. 27 - Oct. 3, 2009.

Outline

Many problems in computational biology and chemistry can be formalized as classical statistical problems, e.g., pattern recognition, regression or dimension reduction, on complex or structured data such as graphs or high-dimensional vectors. In this lecture I discuss several recent approaches to such problems, focusing on the difficulty of explicitly constructing expressive descriptors, the kernel trick for working implicitly with many features, and the estimation of sparse classifiers. These strategies are illustrated on applications such as the classification of small molecules for virtual screening of drugs, and the classification of gene expression data and comparative genomic hybridization (CGH) profiles for cancer diagnosis and prognosis.

Schedule

Introduction to pattern recognition
Explicit computation of features: the case of graph features
Using kernels
    Introduction to kernels
    Graph kernels
    Kernels for gene expression using gene networks
Using sparsity-inducing shrinkage estimators
    Feature selection for all subgraph indexation
    Classification of array CGH with piecewise linear models
    Structured gene selection for microarray classification

Slides
