Supervised classification for structured data: applications in bio- and chemoinformatics

Jean-Philippe Vert, Mines ParisTech

The analysis of patterns summer school, Pula Science Park, Cagliari, Italy, Sep. 27 - Oct. 3, 2009.

Outline

Many problems in computational biology and chemistry can be formalized as classical statistical problems, e.g., pattern recognition, regression or dimension reduction, on complex or structured data such as graphs or high-dimensional vectors. In this lecture I discuss several recent approaches to such problems: the explicit construction of expressive descriptors and its difficulties, the kernel trick for working implicitly with many features, and the estimation of sparse classifiers. These strategies are illustrated on applications such as the classification of small molecules for virtual screening of drugs, and the classification of gene expression data and comparative genomic hybridization (CGH) profiles for cancer diagnosis and prognosis.
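
As a small illustration of the kernel trick mentioned above, the following sketch compares an explicit feature map with the corresponding kernel for a homogeneous polynomial kernel of degree 2. It is not taken from the lecture material; the function names (`explicit_features`, `poly2_kernel`) and the toy vectors are assumptions chosen for the example.

```python
from itertools import product


def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def explicit_features(x):
    """Explicit feature map: all d*d ordered pairwise products x_i * x_j."""
    return [xi * xj for xi, xj in product(x, repeat=2)]


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel, computed without
    ever building the d*d-dimensional feature vectors."""
    return dot(x, y) ** 2


# Toy vectors (illustrative values only).
x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]

# Inner product in the explicit 9-dimensional feature space...
explicit = dot(explicit_features(x), explicit_features(y))
# ...equals the kernel evaluated directly on the 3-dimensional inputs.
implicit = poly2_kernel(x, y)

print(explicit, implicit)
```

Both quantities agree, but the kernel needs only one inner product in the original space, whereas the explicit map grows quadratically with the input dimension. The same idea is what makes it possible to work with very large feature spaces, such as those indexed by subgraphs, without enumerating the features.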

Schedule

  1. Introduction to pattern recognition
  2. Explicit computation of features: the case of graph features
  3. Using kernels
    1. Introduction to kernels
    2. Graph kernels
    3. Kernels for gene expression using gene networks
  4. Using sparsity-inducing shrinkage estimators
    1. Feature selection for all subgraph indexation
    2. Classification of array CGH with piecewise linear models
    3. Structured gene selection for microarray classification

Slides


Vert Jean-Philippe
Last modified: Tue Jun 27 17:04:25 CEST 2006