Modern technologies like DNA microarrays or high-throughput sequencing are revolutionising biology and medical research. By allowing the collection of large numbers of measurements at the molecular level on living organisms, they pave the way to a quantitative and rational analysis of biological systems. Unsurprisingly, statistics and machine learning play an important role in this revolution. By processing large collections of datasets, they make it possible to extract new biological knowledge and infer predictive models.
The goal of this course is to present a few modern statistical learning techniques and to touch upon a selection of applications in computational and systems biology. In particular, we will study support vector machines (SVM) and kernels, as well as feature selection techniques including lasso regression. Applications include protein annotation, virtual screening in drug design, prognostic and predictive models for personalised medicine in oncology, and gene network inference in systems biology.
When | What | Slides |
Friday, Feb 3, 4:30pm-6:30pm | Introduction, learning in high dimension, ridge regression | 1-48 |
Friday, Feb 10, 4:30pm-7:30pm | Logistic regression, linear SVM | 49-98 |
Friday, Feb 24, 4:30pm-7:30pm | Large-margin classifiers, kernels | 99-141 |
Friday, Mar 3, 4:30pm-7:30pm | Cancelled | |
Friday, Mar 10, 4:30pm-7:30pm | Kernels | |
Friday, Mar 17, 4:30pm-7:30pm |
Please send your report and predictions to jean-philippe.vert@mines-paristech.fr before May 10, 2017. The predictions will be scored by the area under the ROC curve (AUC).
Although discussions are allowed and encouraged, each student must work on the challenge individually and submit a report and a prediction.
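Since predictions are scored by AUC, it can help to check a model on held-out data with the same metric before submitting. The following is a minimal sketch, not the official scoring script: the function name `roc_auc` and the toy labels and scores are illustrative assumptions. It uses the rank-sum (Mann-Whitney U) identity, under which the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 values (1 = positive class)
    scores: sequence of real-valued predictions (higher = more likely positive)
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, averaging the ranks within groups of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    # U statistic of the positive class, normalised by the number of
    # (positive, negative) pairs.
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy data (hypothetical, for illustration only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # prints 0.75
```

Because AUC depends only on the ranking of the scores, any monotone increasing transformation of the predictions (for example, raw SVM decision values versus calibrated probabilities) gives the same value.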