Machine learning with kernel methods, Spring 2021

Julien Mairal and Jean-Philippe Vert

MSc Mathematics, Vision, Machine Learning (MVA) (ENS Paris Saclay)

MSc Mathematics, Machine Learning, and the Humanities (MASH) (Dauphine, PSL University)

Slides

Slides are frequently updated. Please let us know if you spot typos!

Outline

Many problems in real-world applications of machine learning can be formalized as classical statistical problems, e.g., pattern recognition, regression or dimension reduction, with the caveat that the data are often not vectors of numbers. For example, protein sequences and structures in computational biology, text and XML documents in web mining, segmented pictures in image processing, or time series in speech recognition and finance, have particular structures which contain relevant information for the statistical problem but can hardly be encoded into finite-dimensional vector representations.

Kernel methods are a class of algorithms well suited for such problems. Indeed, they extend the applicability of many statistical methods initially designed for vectors to virtually any type of data, without the need for explicit vectorization of the data. The price to pay for this extension to non-vectors is the need to define a so-called positive definite kernel function between the objects, formally equivalent to an implicit vectorization of the data. The "art" of kernel design for various objects has witnessed important advances in recent years, resulting in many state-of-the-art algorithms and successful applications in many domains.
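As a small illustration of a kernel defined directly on non-vector objects, here is a minimal sketch of a k-mer counting ("spectrum") kernel between strings. It is only an example chosen for this page, not necessarily the construction used in the slides; the function names (kmer_counts, spectrum_kernel, explicit_features, quadratic_form), the toy sequences, the alphabet "ACGT" and the choice k=2 are all assumptions made for the illustration. The sketch checks that the kernel value agrees with an inner product of explicit feature vectors, and that a quadratic form built from the Gram matrix is non-negative, as positive definiteness requires.

```python
from collections import Counter
from itertools import product


def kmer_counts(s, k=2):
    """Count every contiguous substring of length k in s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Kernel between two strings, computed without building feature vectors."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # Only k-mers present in both strings contribute to the sum.
    return sum(cs[w] * ct[w] for w in cs)


def explicit_features(s, k=2, alphabet="ACGT"):
    """The vectorization that the kernel uses implicitly (one entry per k-mer)."""
    counts = kmer_counts(s, k)
    return [counts["".join(w)] for w in product(alphabet, repeat=k)]


def quadratic_form(gram, c):
    """Compute sum_ij c_i * c_j * K(x_i, x_j), non-negative for a p.d. kernel."""
    n = len(c)
    return sum(c[i] * gram[i][j] * c[j] for i in range(n) for j in range(n))


# Hypothetical toy data: short strings over the alphabet ACGT.
sequences = ["GATTACA", "ATTAC", "CATCAT", "TTTT"]
gram = [[spectrum_kernel(s, t) for t in sequences] for s in sequences]

for row in gram:
    print(row)
print(quadratic_form(gram, [1, -1, 0.5, -2]))
```

Any algorithm that only needs the Gram matrix can then be run on the strings themselves, which is the sense in which the kernel replaces an explicit vectorization.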

The goal of this course is to present the mathematical foundations of kernel methods, as well as the main approaches that have emerged so far in kernel design. We will start with a presentation of the theory of positive definite kernels and reproducing kernel Hilbert spaces, which will allow us to introduce several kernel methods including kernel principal component analysis and support vector machines. Then we will come back to the problem of defining the kernel. We will present the main results about Mercer kernels and semigroup kernels, as well as a few examples of kernels for strings and graphs, taken from applications in computational biology, text processing and image analysis. Finally, we will touch upon topics of active research, such as large-scale kernel methods and deep kernel machines.

References

N. Aronszajn, "Theory of reproducing kernels", Transactions of the American Mathematical Society, 68:337-404, 1950.
C. Berg, J.P.R. Christensen and P. Ressel, "Harmonic analysis on semi-groups", Springer, 1994.
N. Cristianini and J. Shawe-Taylor, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
B. Schölkopf and A. Smola, "Learning with kernels", MIT Press, 2002.
B. Schölkopf, K. Tsuda and J.-P. Vert, "Kernel methods in computational biology", MIT Press, 2004.
V. Vapnik, "Statistical Learning Theory", Wiley, 1998.

Schedule and organization

The course is taught fully remotely this year.
Every week, students should watch the lectures and study the slides before the class.
The live class with the teachers runs from 3pm to 4pm on Zoom (the link will be shared with students). During the class, we will quickly go over the slides, answer questions, and discuss exercises.
After each class, you will have a few exercises to complete before the following class.

Date	Lecturer	Topic	Slides	Video	Exercises	More material
Jan 13	JM+JPV	Kernels, RKHS, examples	1-41	Lecture 1, Lecture 2	Homework 1	Uniqueness of the RKHS Aronszajn's theorem
Jan 20	JM	Smoothness functional, Kernel trick, Representer theorem	42-82	Lecture 3, Lecture 4, Lecture 5
Jan 27	JPV	Kernel ridge and logistic regression	83-114	Lecture 6	Homework 2 Draft solution
Feb 3	JM	Large-margin classifiers, SVMs	115-165	Lecture 7, Lecture 8		Bartlett et al. 2003
Feb 10	JPV	Unsupervised learning, kernel PCA, K-means, CCA	166-202	Lecture 9, Lecture 10	Homework 3
Feb 17		Break
Feb 24	JM	Green, Mercer, Herglotz, Bochner and friends	203-310	Lecture 11a Lecture 11b Lecture 11d		Lecture 11c
Mar 3	JPV	Kernels for graphs, kernels on graphs	436-549	Lecture 12a Lecture 12b	Homework 4
Mar 10	JPV	MKL, large-scale learning with kernels	551-621	Lecture 13a Lecture 13
Mar 17	JM	Deep kernel machines	630-714	Lecture 14a	Final Homework	Lecture 14b and Lecture 14c

Bonus video: kernels for biological sequences Lecture 15
Bonus video: kernels for probabilistic models Lecture 16

Evaluation

The final grade will be a weighted average of a data challenge (40%), a final homework (40%) and weekly quizzes (20%).
