
A morpho-syntactic parser based on the LR algorithm at Tokodai

The research team of Professor Tanaka at the Tokyo Institute of Technology proposed a method that combines morphological and syntactic analysis in a single stage while keeping the morphological and syntactic rules distinct. Keeping them distinct is important because much research has already been done in both fields and has led to efficient models.

Classical morphological analysis uses a dictionary that specifies the morphological category (mcat) of every word or sequence of characters, together with a connectivity matrix that states whether a given sequence of mcat is allowed. Used alone, this approach usually leaves many ambiguities.
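The classical approach can be illustrated with a minimal sketch. The dictionary entries, the mcat names and the allowed pairs below are hypothetical toy data, not taken from the work described here; the connectivity matrix is stored as a set of allowed (left mcat, right mcat) pairs.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionary: surface string -> possible morphological categories (mcat).
DICTIONARY: Dict[str, List[str]] = {
    "un": ["prefix"],
    "lock": ["verb_stem", "noun_stem"],
    "able": ["adj_suffix"],
    "unlock": ["verb_stem"],
}

# Hypothetical connectivity matrix: the set of allowed (left mcat, right mcat) pairs.
CONNECT: Set[Tuple[str, str]] = {
    ("prefix", "verb_stem"),
    ("verb_stem", "adj_suffix"),
}


def segmentations(text: str) -> List[List[Tuple[str, str]]]:
    """Return every dictionary segmentation whose adjacent mcat are connectable."""
    results: List[List[Tuple[str, str]]] = []

    def extend(pos: int, path: List[Tuple[str, str]]) -> None:
        if pos == len(text):
            results.append(list(path))
            return
        for end in range(pos + 1, len(text) + 1):
            for mcat in DICTIONARY.get(text[pos:end], []):
                # Reject the candidate if the matrix forbids this mcat sequence.
                if path and (path[-1][1], mcat) not in CONNECT:
                    continue
                path.append((text[pos:end], mcat))
                extend(end, path)
                path.pop()

    extend(0, [])
    return results


analyses = segmentations("unlockable")
```

On this toy input the matrix rejects the `noun_stem` reading of "lock", but two segmentations still survive ("un" + "lock" + "able" and "unlock" + "able"), which is the kind of residual ambiguity mentioned above.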

To overcome this problem, Professor Tanaka's team combines the classical approach with syntactic LR parsing, in which an LR table derived from CFG rules drives an automaton. The CFG rules, which concern syntactic categories (cat), are extended with a set of rules that link each cat to its mcat; these links come from the dictionary, which gives both the cat and the mcat of each word. One cat is usually associated with several mcat. The grammar is augmented automatically and is then treated as a CFG over the set of mcat and cat, from which an LR table can be derived.
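The augmentation step might be sketched as follows. The rule encoding, the function name `augment_grammar` and the lexicon entries are assumptions made for illustration only; the sketch simply adds one rule cat -> mcat for every (cat, mcat) pair found in the dictionary.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side symbols)

# Hypothetical syntactic rules over cat only.
CFG_RULES: List[Rule] = [
    ("S", ("NP", "VP")),
    ("NP", ("noun",)),
    ("VP", ("verb",)),
]

# Hypothetical dictionary entries: (word, cat, mcat).
LEXICON: List[Tuple[str, str, str]] = [
    ("inu", "noun", "common_noun"),
    ("neko", "noun", "common_noun"),
    ("Tokyo", "noun", "proper_noun"),
    ("hashiru", "verb", "verb_base"),
]


def augment_grammar(cfg_rules: List[Rule],
                    lexicon: List[Tuple[str, str, str]]) -> List[Rule]:
    """Append one link rule cat -> mcat per distinct (cat, mcat) pair in the lexicon."""
    link_rules = sorted({(cat, (mcat,)) for _, cat, mcat in lexicon})
    return list(cfg_rules) + link_rules


augmented = augment_grammar(CFG_RULES, LEXICON)
```

Here the cat `noun` receives two link rules (`common_noun` and `proper_noun`), reflecting the remark that one cat is usually associated with several mcat. The resulting rule list is an ordinary CFG whose terminals are mcat, so a standard LR table generator could be run on it.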

The connectivity matrix is then used to automatically delete illegal reduction actions, yielding a modified LR table that incorporates the morphological constraints. Finally, a modified LR parsing algorithm can be applied to sentences to build a tree that captures both the morphological and the syntactic structure.
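A rough sketch of the pruning idea is given below. The text above does not spell out the exact criterion, so the sketch makes a simplifying assumption: a reduction by a link rule cat -> mcat is treated as illegal when the lookahead mcat may not follow that mcat according to the matrix. The table layout, the function name `prune_reductions` and the hand-written entries are hypothetical; this is not a table produced by a real LR generator.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]
Action = Tuple[str, object]          # ("shift", state) or ("reduce", rule)
Table = Dict[Tuple[int, str], List[Action]]


def prune_reductions(table: Table,
                     connect: Set[Tuple[str, str]],
                     mcats: Set[str],
                     end_marker: str = "$") -> Table:
    """Drop reduce actions by link rules whose mcat cannot precede the lookahead."""
    pruned: Table = {}
    for (state, lookahead), actions in table.items():
        kept: List[Action] = []
        for kind, arg in actions:
            if kind == "reduce" and lookahead != end_marker:
                _, rhs = arg
                # Assumption: only link rules (cat -> mcat) are checked here.
                is_link = len(rhs) == 1 and rhs[0] in mcats
                if is_link and (rhs[0], lookahead) not in connect:
                    continue  # illegal mcat sequence: delete the action
            kept.append((kind, arg))
        if kept:
            pruned[(state, lookahead)] = kept
    return pruned


# Hypothetical toy data.
MCATS = {"common_noun", "verb_base", "adj_suffix"}
CONNECT = {("common_noun", "verb_base")}
TABLE: Table = {
    (0, "common_noun"): [("shift", 3)],
    (3, "verb_base"): [("reduce", ("noun", ("common_noun",)))],
    (3, "adj_suffix"): [("reduce", ("noun", ("common_noun",)))],
    (4, "$"): [("reduce", ("verb", ("verb_base",)))],
}

modified = prune_reductions(TABLE, CONNECT, MCATS)
```

In this toy table the entry for state 3 with lookahead `adj_suffix` disappears, because `common_noun` followed by `adj_suffix` is not an allowed pair, while the other entries are kept unchanged.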



Jean-Philippe Vert
Sun Dec 6 11:05:42 MET 1998