A probabilistic syntactic parser in Tokodai

The Tanaka Laboratory at the Tokyo Institute of Technology (Tokodai) has extended the GLR (Generalized LR) parsing method so that a probability is estimated for every candidate parse it produces. When several candidates are ambiguous, the one with the highest probability can be preferred, and the probability also gives an indication of how reliable the result is. In addition, analysis is faster because candidates with very low probability are eliminated before they are fully evaluated.
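The selection and pruning steps described above can be illustrated with a small Python sketch. It is not the laboratory's implementation: the relative threshold `beam`, the function names, the toy probabilities, and the use of a normalised share as a reliability score are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def prune(candidates: Dict[str, float], beam: float = 1e-3) -> Dict[str, float]:
    """Drop candidates whose probability is below `beam` times the best one.

    `beam` is a hypothetical relative threshold; the source does not say
    which pruning criterion is actually used.
    """
    if not candidates:
        return {}
    best = max(candidates.values())
    return {name: p for name, p in candidates.items() if p >= beam * best}


def pick_best(candidates: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable candidate and its share of the total mass.

    The share is used here as a simple reliability indicator (an assumption).
    """
    name = max(candidates, key=lambda k: candidates[k])
    total = sum(candidates.values())
    return name, candidates[name] / total


# Toy probabilities for three competing parses of one sentence (made up).
parses = {"parse_A": 0.02, "parse_B": 0.005, "parse_C": 0.000001}
kept = prune(parses)
best_name, reliability = pick_best(kept)
print(best_name, sorted(kept))
```

In a real parser the pruning would be applied to partial derivations during parsing, so that work on unlikely candidates stops early; here it is applied to finished scores only to keep the example short.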

This method is more precise than a Probabilistic Context-Free Grammar (PCFG), which estimates a probability for each grammar rule. To obtain a mildly context-sensitive model, probabilities are instead estimated for the actions of the LR table, which is derived from the CFG rules and used by the automaton to parse sentences. The probability of a derivation is defined as the product of the probabilities of the actions that led to it. These probabilities are estimated by counting how often each action is used when parsing a corpus of correctly parsed sentences, in this case 10,000 sentences from the ATR corpus.
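As a rough illustration of the estimation step, the following Python sketch counts LR actions over a tiny hand-made "corpus" of action sequences and scores a derivation as the product of its action probabilities (computed as a sum of logarithms). The normalisation over all actions observed in the same LR state, the (state, action) representation, and the toy data are assumptions; the source does not specify these details.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: (LR state, action label).
Action = Tuple[int, str]


def estimate_action_probs(corpus: List[List[Action]]) -> Dict[Action, float]:
    """Relative frequency of each action among the actions seen in its state."""
    counts: Counter = Counter()
    state_totals: Counter = Counter()
    for actions in corpus:
        for state, label in actions:
            counts[(state, label)] += 1
            state_totals[state] += 1
    return {(s, a): c / state_totals[s] for (s, a), c in counts.items()}


def derivation_log_prob(actions: List[Action], probs: Dict[Action, float]) -> float:
    """Log of the product of action probabilities; -inf if an action is unseen."""
    total = 0.0
    for action in actions:
        p = probs.get(action, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


# Toy corpus: each entry is the action sequence of one correctly parsed sentence.
toy_corpus = [
    [(0, "shift"), (1, "reduce A"), (2, "accept")],
    [(0, "shift"), (1, "reduce B"), (2, "accept")],
    [(0, "shift"), (1, "reduce A"), (2, "accept")],
]
action_probs = estimate_action_probs(toy_corpus)
log_p = derivation_log_prob(toy_corpus[0], action_probs)
print(math.exp(log_p))
```

Summing logarithms instead of multiplying raw probabilities avoids numerical underflow on long derivations, and a derivation containing an action never seen in the corpus receives probability zero unless some smoothing is added.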

Jean-Philippe Vert
Sun Dec 6 11:05:42 MET 1998