
A statistical approach at NAIST

Instead of using a fixed grammar for morphological parsing, Prof. Matsumoto's laboratory at NAIST has developed a statistical approach. The parser first uses a dictionary to obtain the list of all possible candidate analyses of the input. A cost is then associated with every bigram, based on frequencies measured on a corpus. The analysis with the lowest total cost is finally taken to be the most probable one. The formalism used by the parser consists of 14 different morphological classes, divided into sub-categories (e.g. common nouns, country names, etc.). The bigram costs were first estimated by hand, but their computation from the EDR corpus was under development in July 1998.
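As a rough illustration of the lowest-cost selection described above, the following Python sketch looks up every dictionary candidate in an unsegmented string and then keeps the path whose summed bigram costs are smallest, using dynamic programming. Everything specific in it is an assumption made for illustration: the toy dictionary, the class labels (DET, ADV), the BOS/EOS boundary markers, the cost values, the fallback cost for unseen bigrams, and the function names. None of it is taken from the NAIST parser, whose actual classes and costs are not given here.

```python
from math import inf

# Hypothetical toy dictionary: surface form -> possible morphological classes.
DICTIONARY = {
    "no": ["DET"],
    "now": ["ADV"],
    "where": ["ADV"],
    "here": ["ADV"],
    "nowhere": ["ADV"],
}

# Hypothetical costs for bigrams of adjacent classes (lower = more plausible).
BIGRAM_COST = {
    ("BOS", "DET"): 1.0,
    ("BOS", "ADV"): 4.0,
    ("DET", "ADV"): 1.0,
    ("ADV", "ADV"): 3.0,
    ("ADV", "EOS"): 1.0,
    ("DET", "EOS"): 4.0,
}
UNSEEN_COST = 10.0  # assumed fallback for bigrams missing from the table


def candidates(text):
    """Return every dictionary match as (start, end, surface, cls)."""
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            surface = text[start:end]
            for cls in DICTIONARY.get(surface, []):
                found.append((start, end, surface, cls))
    return found


def best_analysis(text):
    """Return (cost, path) of the cheapest analysis, or (inf, None) if none."""
    # best[(pos, cls)] = (cost, path) of the cheapest partial analysis that
    # ends at character offset pos with a morpheme of class cls.
    best = {(0, "BOS"): (0.0, [])}
    # Candidates are processed by increasing start offset, so every partial
    # analysis ending at a given offset is final before it is extended.
    for start, end, surface, cls in sorted(candidates(text)):
        for (pos, prev_cls), (cost, path) in list(best.items()):
            if pos != start:
                continue
            new_cost = cost + BIGRAM_COST.get((prev_cls, cls), UNSEEN_COST)
            if new_cost < best.get((end, cls), (inf, None))[0]:
                best[(end, cls)] = (new_cost, path + [(surface, cls)])
    # Close each complete analysis with the end-of-sentence bigram.
    final_cost, final_path = inf, None
    for (pos, prev_cls), (cost, path) in best.items():
        if pos == len(text):
            total = cost + BIGRAM_COST.get((prev_cls, "EOS"), UNSEEN_COST)
            if total < final_cost:
                final_cost, final_path = total, path
    return final_cost, final_path


print(best_analysis("nowhere"))
```

With these made-up costs the string "nowhere" has three candidate analyses (no + where, now + here, nowhere), and the first one wins because its bigrams are the cheapest. Changing a single entry of the cost table can change the winner, which is why estimating the costs from corpus frequencies rather than by hand matters.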



Jean-Philippe Vert
Sun Dec 6 11:05:42 MET 1998