Seminar "Introduction to Support Vector Machines (SVM) and applications to bioinformatics" (June-November 2001)
Contact: Jean-Philippe Vert
This seminar is now over. Feel free to use the information collected on this page, and don't hesitate to contact me with any questions.
Tentative schedule
| When? | Where? | Who? | What? |
|---|---|---|---|
| 6/6 (wed.), 13:30 | Seminar C-529 | JP | Introduction to SVM (1): First contact |
| 6/13 (wed.), 13:30 | Seminar C-529 | JP | Introduction to SVM (2): simplest SVM (linear, separable) |
| 6/20 (wed.), 10:00 | Seminar C-529 | JP | Optimization theory (1) |
| 6/27 (wed.), 13:30 | Seminar C-529 | JP | Optimization theory (2) |
| 7/4 (wed.), 13:30 | Seminar C-529 | JP | Linear SVM (end) |
| 7/11 (wed.), 13:30 | Seminar C-529 | JP | Non-separable training set |
| 7/18 (wed.), 13:30 | Seminar C-529 | JP | Non-linear SVM with kernels |
| 7/25 (wed.), 13:30 | Seminar C-529 | Doug, Park, (Kawashima) | Gene classification from microarray expression data |
| 9/5 (wed.), 13:30 | Seminar C-529 | Saigo, Itoh | Implementation techniques |
| 9/12 (wed.), 13:30 | Seminar C-529 | Kawashima, Rikuhiro, Okuda | Tissue classification from microarray expression data |
| 9/19 (wed.), 13:30 | Seminar C-529 | Okuno | Translation initiation site recognition in DNA |
| 9/26 (wed.), 13:30 | Seminar C-529 | Hattori | Protein Fold Recognition |
| 10/2 (tue.) | Seminar C-529 | Tsuda and Arita (CBRC) | "Fisher Kernel and beyond" (Tsuda), "Small world" (Arita) |
| 10/10 (wed.), 13:30 | Seminar C-529 | Igarashi | Protein-protein interactions |
| 10/17 (wed.), 13:30 | Seminar C-529 | Yamamura | Protein secondary structure prediction |
| 10/24 (wed.), 13:30 | Seminar C-529 | Park | Protein subcellular localization prediction |
| 10/31 (wed.), 13:30 | Seminar C-529 | JP | New kernels for strings and graphs |
| 11/07 (wed.), 13:30 | Seminar C-529 | All | SVM project |
| 11/14 (wed.), 13:30 | Seminar C-529 | JP | Fisher kernel |

Tentative seminar notes
You can download the seminar notes in PS or PDF format. These notes are incomplete and subject to frequent updates (check the date on the first page!). Comments are welcome;-)
Overview
Support vector machines (SVMs) are a new generation of learning systems. They are based on strong mathematical foundations (the statistical learning theory developed by Vladimir Vapnik since the 1970s) and result in simple yet very powerful algorithms. SVMs deliver state-of-the-art performance in real-world applications such as handwritten character recognition, text categorization, image classification, and biosequence classification.
The application of SVM techniques to computational biology is recent (most papers have been published since 2000) but very promising compared with "traditional" methods. Current applications include:
- recognition of translation initiation sites in DNA;
- promoter-region based classification of genes;
- protein classification;
- protein fold recognition;
- microarray data analysis,
and more applications are expected to be developed in the near future.
The goal of this seminar is to provide an introduction to support vector machines and to review recent applications in the field of computational biology.
It will be as self-contained as possible, and no strong mathematical background is required. It is open to all students and researchers of the Kanehisa laboratory.
Organization
A gentle yet complete introduction to support vector machines will first be given by JP Vert. Participants in the seminar will then present recent papers involving SVMs from the bioinformatics literature. After that, the seminar will focus on particular applications and serve as the starting point for new research directions.
Material to learn about SVMs
- A Tutorial on Support Vector Machines for Pattern Recognition, Christopher J. C. Burges, Data Mining and Knowledge Discovery, 2 (2), 121-167, 1998
[http://joker.mil.ufl.edu/people/nechyba/lecture-notes/eel6668-f00/burges_svm_tutorial.pdf]
- A Tutorial on Support Vector Regression, Alex J. Smola and Bernhard Schoelkopf, NeuroCOLT Technical Report NC-TR-98-030
[http://www.neurocolt.com/tech_reps/1998/98030.ps.gz]
- Support Vector Learning, Bernhard Scholkopf, PhD Thesis, Published by: R. Oldenbourg Verlag, Munich, 1997
[http://svm.first.gmd.de/papers/book_ref.ps.gz]
Applications to bioinformatics
- Gene classification from microarray expression data
  - Cluster Analysis and Display of Genome-Wide Expression Patterns, Eisen MB, Spellman PT, Brown PO and Botstein D, Proc Natl Acad Sci U S A 95, 14863-8, 1998
    [http://rana.lbl.gov/papers/Eisen_PNAS_1998.pdf]
  - Systematic determination of genetic network architecture, Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM, Nat Genet 1999 Jul;22(3):281-5
    [http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v22/n3/full/ng0799_281.html&filetype=pdf]
  - Knowledge-based analysis of microarray gene expression data by using support vector machines, Michael P. S. Brown, William Noble Grundy, David Lin, Nello Cristianini, Charles Walsh Sugnet, Terrence S. Furey, Manuel Ares, Jr., David Haussler, Proc. Natl. Acad. Sci. USA, vol. 97, pages 262-267
    [http://www.pnas.org/cgi/reprint/97/1/262.pdf]
  - Support Vector Machine Classification of Microarray Gene Expression Data, Michael P. S. Brown, William Noble Grundy, David Lin, Nello Cristianini, Charles Sugnet, Manuel Ares, Jr., David Haussler
    [http://www.cse.ucsc.edu/research/compbio/genex/genex.ps]
  - Gene functional classification from heterogeneous data, Paul Pavlidis, Jason Weston, Jinsong Cai and William Noble Grundy, Proceedings of RECOMB 2001
    [http://www.cs.columbia.edu/compbio/exp-phylo/exp-phylo.pdf]
- Tissue classification from microarray expression data
  - Support vector machine classification of microarray data, S. Mukherjee, P. Tamayo, J.P. Mesirov, D. Slonim, A. Verri, and T. Poggio, Technical Report 182, AI Memo 1676, CBCL, 1999.
    [http://www.kernel-machines.org/papers/upload_19790_cancer.ps]
  - Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data, Terrence S. Furey, Nigel Duffy, Nello Cristianini, David Bednarski, Michel Schummer, and David Haussler, Bioinformatics, 2000, 16(10):906-914.
    [http://bioinformatics.oupjournals.org/cgi/reprint/16/10/906.pdf]
  - Gene Selection for Cancer Classification using Support Vector Machines, I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Machine Learning, 2001
    [http://homepages.nyu.edu/~jaw281/genesel.pdf]
- Translation initiation site recognition in DNA
  - Engineering support vector machine kernels that recognize translation initiation sites, A. Zien, G. Ratsch, S. Mika, B. Scholkopf, T. Lengauer, and K.-R. Muller, Bioinformatics, 16(9):799-807, 2000.
    [http://bioinformatics.oupjournals.org/cgi/reprint/16/9/799.pdf]
- Protein fold recognition
  - Multi-class protein fold recognition using support vector machines and neural networks, Chris Ding and Inna Dubchak, Bioinformatics, 17:349-358, 2001
    [http://www.kernel-machines.org/papers/upload_4192_bioinfo.ps]
- Protein-protein interactions
  - Predicting protein-protein interactions from primary structure, Joel R. Bock and David A. Gough, Bioinformatics 2001 17: 455-460
    [http://bioinformatics.oupjournals.org/cgi/reprint/17/5/455.pdf]
- Protein secondary structure prediction
  - A Novel Method of Protein Secondary Structure Prediction with High Segment Overlap Measure: Support Vector Machine Approach, Sujun Hua and Zhirong Sun, Journal of Molecular Biology, vol. 308 n.2, pages 397-407, April 2001.
- Protein subcellular localization prediction
  - Support vector machine approach for protein subcellular localization prediction, Sujun Hua and Zhirong Sun, Bioinformatics, vol. 17 no. 8, pages 721-728, 2001.
    [http://bioinformatics.oupjournals.org/cgi/reprint/17/8/721.pdf]
- Combining HMM and SVM: the Fisher Kernel for protein classification
  - Exploiting generative models in discriminative classifiers, T. Jaakkola and D. Haussler, Preprint, Dept. of Computer Science, Univ. of California, 1998
    [http://www.cse.ucsc.edu/research/ml/papers/Jaakola.ps]
  - A discriminative framework for detecting remote protein homologies, T. Jaakkola, M. Diekhans, and D. Haussler, Journal of Computational Biology, Vol. 7 No. 1,2 pp. 95-114, (2000)
    [http://www.cse.ucsc.edu/research/compbio/discriminative/Jaakola2-1998.ps]
  - Classifying G-Protein Coupled Receptors with Support Vector Machines, Rachel Karchin, Master's Thesis, June 2000
    [http://www.cse.ucsc.edu//research/compbio/papers/gpcr_svm.ps.gz]
- The Fisher Kernel for classification of genes
  - Promoter region-based classification of genes, Paul Pavlidis, Terrence S. Furey, Muriel Liberto, David Haussler and William Noble Grundy, Proceedings of the Pacific Symposium on Biocomputing, January 3-7, 2001. pp. 151-163.
    [http://www.cs.columbia.edu/~bgrundy/papers/prom-svm.pdf]
- New kernels for strings
  - Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings, Jean-Philippe Vert, Proceedings of the Pacific Symposium on Biocomputing, January 2-7, 2002. To appear.
    [ps.gz] [pdf] [web supplement]
Useful links
- http://www.kernel-machines.org: an entry point to the SVM universe
- an SVM applet at Lucent Technologies
- mySVM, an SVM implementation in C++
- TinySVM, another SVM implementation by Taku Kudoh, with Perl/Ruby/Python/Java modules
- Computational biology group at Columbia University
- UCSC Bioinformatics group
- Biowulf Technologies, a company specializing in biomedical applications of SVM
- A tutorial about SVM in bioinformatics to be given by Nello Cristianini at PSB 2002