
Kyoto University corpus project

A corpus project was under development at Kyoto University. Its goals were to build the corpus semi-automatically, to provide grammatically parsed sentences, and to improve the automatic parsers at the same time.

The corpus contained 20,000 sentences in July 1998. Sentences were parsed automatically, with JUMAN for the morphology and KNP for the syntax. Every sentence was then checked and, where necessary, corrected by humans, and the errors found were used to improve the parsing algorithms. The corpus grew at a rate of about 40 sentences per hour and per person.
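To give a sense of the manual effort these figures imply, the arithmetic can be sketched as follows. This is only a rough estimate: it assumes that the quoted rate of 40 sentences per hour and per person held constant over the whole corpus, which the text does not state. The function names and the `annotators` parameter are hypothetical and serve only as illustration.

```python
# Figures quoted in the text.
CORPUS_SIZE = 20_000          # sentences in the corpus in July 1998
RATE_PER_PERSON_HOUR = 40     # sentences added per hour and per person


def person_hours(sentences: int, rate: float) -> float:
    """Total checking effort in person-hours, assuming a constant rate."""
    return sentences / rate


def elapsed_hours(sentences: int, rate: float, annotators: int) -> float:
    """Working hours needed if `annotators` people (hypothetical) work in parallel."""
    return person_hours(sentences, rate) / annotators


if __name__ == "__main__":
    print(person_hours(CORPUS_SIZE, RATE_PER_PERSON_HOUR))  # 500.0
```

Under this assumption, the 20,000 sentences correspond to about 500 person-hours of checking. The number of people who actually worked on the corpus is not given, so `elapsed_hours` is useful only for exploring what-if values.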

The corpus can be downloaded over the Internet from the Kyoto University web site, but it is necessary to buy the CD-ROMs of the newspaper from which the sentences were taken.



Jean-Philippe Vert
Sun Dec 6 11:05:42 MET 1998