Hierarchical clustering of words at ATR

Using a textual database of articles from the Wall Street Journal, researchers from the Advanced Telecommunications Research Institute (ATR) near Kyoto have obtained a hierarchical classification of the 70,000 most frequently used words. The result of this classification is a binary tree whose 70,000 leaves represent the 70,000 words, and in which each node represents a class of words containing the words of its child nodes.
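To make the structure concrete, here is a minimal sketch of such a tree on a toy vocabulary. The `Node` class and the four words are hypothetical illustrations, not ATR's data or code. A leaf holds one word, and the class of an internal node is the union of its children's classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node of a binary word tree; a leaf holds exactly one word."""
    word: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def words(self) -> set:
        """Return the class of words represented by this node."""
        if self.word is not None:
            return {self.word}
        # An internal node's class is the union of its children's classes.
        return self.left.words() | self.right.words()


# Toy four-word tree (hypothetical vocabulary, not the WSJ data).
animals = Node(left=Node("cat"), right=Node("dog"))
motions = Node(left=Node("run"), right=Node("walk"))
root = Node(left=animals, right=motions)

print(sorted(animals.words()))  # the class of one internal node
print(sorted(root.words()))     # the root's class covers the whole vocabulary
```

In the real tree the root's class would contain all 70,000 words, and each leaf's class a single word.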

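The tree was built automatically, starting from 70,000 isolated leaves and iteratively merging the classes that are used in similar contexts. Furthermore, every node of the tree, and thus every concept, can be coded by a string of bits: because the tree is binary, a node is identified by the path from the root down to it, with one bit recording each left or right branching.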
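The bottom-up construction and the bit coding can be sketched as follows. This is only an illustration under stated assumptions. The text does not say which similarity criterion ATR used, so the sketch swaps in a simple one: each word is described by hypothetical context-word counts, a merged class sums its members' counts, and the pair of classes with the highest cosine similarity is merged at each step. The naive pair search is far too slow for 70,000 words. Node codes use 0 for a left branch and 1 for a right branch, with the empty string for the root.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two context-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_tree(contexts: dict):
    """Greedy bottom-up clustering (assumed criterion: cosine similarity).

    Returns a nested-tuple binary tree: leaves are words,
    internal nodes are (left, right) pairs.
    """
    # Start from isolated leaves: (subtree, summed context counts).
    clusters = [(w, Counter(c)) for w, c in contexts.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        (ti, ci), (tj, cj) = clusters[i], clusters[j]
        del clusters[j]  # delete the higher index first (j > i)
        del clusters[i]
        clusters.append(((ti, tj), ci + cj))
    return clusters[0][0]


def node_codes(tree, prefix="", out=None):
    """Map the bit code of every node to the set of words below it."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[prefix] = {tree}
    else:
        left, right = tree
        node_codes(left, prefix + "0", out)
        node_codes(right, prefix + "1", out)
        out[prefix] = out[prefix + "0"] | out[prefix + "1"]
    return out


# Hypothetical context counts for a four-word vocabulary.
contexts = {
    "cat": Counter({"the": 3, "sat": 2}),
    "dog": Counter({"the": 3, "barked": 2}),
    "run": Counter({"to": 3, "fast": 2}),
    "walk": Counter({"to": 3, "slowly": 2}),
}
tree = build_tree(contexts)
codes = node_codes(tree)
print(tree)
for code in sorted(codes, key=lambda c: (len(c), c)):
    print(repr(code), sorted(codes[code]))
```

With this coding, a prefix of a word's bit string names a coarser class containing that word, so truncating codes gives a classification at any desired granularity.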



Jean-Philippe Vert
Sun Dec 6 11:05:42 MET 1998