Concept induction via fuzzy C-means clustering in a high dimensional semantic space.
Bruza, Peter D.
Lau, Raymond Y. K.
Lexical semantic space models have recently been investigated as a means of automatically deriving the meaning (semantics) of information from natural language usage. In a semantic space, a term can be treated as a concept represented geometrically as a vector whose components correspond to terms in a vocabulary. A primary way to reason in a semantic space is to categorize the concepts in it into a number of regions (i.e., groups). This process, referred to as concept induction, can be realized by clustering objects in the space. The resulting groups can potentially form a basis for knowledge discovery and ontology construction. Conventional clustering algorithms, e.g., the K-Means method, normally produce crisp clusters, i.e., each object is assigned to exactly one cluster. This does not always reflect reality. For example, the word “Reagan” may belong both to a cluster about the administration of the US government and to another about the Iran-Contra scandal. Fuzzy clustering therefore applies a membership function, which determines the degree to which an object belongs to each cluster. This chapter introduces a cognitively motivated semantic space model, the Hyperspace Analogue to Language (HAL), and shows how a fuzzy C-Means clustering algorithm can be applied to concept categorization in the high dimensional semantic space. The experimental results indicate that applying fuzzy C-Means clustering over the HAL semantic space is a promising way to construct semantically related groups of terms.
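The pipeline described above (term vectors in a HAL space, then soft cluster memberships) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the chapter's implementation: the window size, the linear weighting `window - distance + 1`, building each term vector by concatenating its row (preceding context) and column (following context) and normalizing it to unit length, the fuzzifier `m = 2`, and the function names `hal_vectors` and `fuzzy_c_means` are all illustrative choices. The clustering step is the standard fuzzy C-means update loop.

```python
import numpy as np

def hal_vectors(tokens, window=5):
    """Hypothetical HAL-style construction: weighted co-occurrence in a sliding window."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    co = np.zeros((n, n))
    for pos, word in enumerate(tokens):
        # Words closer to `word` receive larger weights (assumed linear decay).
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            prior = tokens[pos - dist]
            co[index[word], index[prior]] += window - dist + 1
    # Row = words preceding the term, column = words following it (assumed combination).
    vecs = np.hstack([co, co.T])
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return vocab, vecs / norms

def fuzzy_c_means(x, c, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Standard fuzzy C-means: returns cluster centers and a (c, n) membership matrix."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, x.shape[0]))
    u /= u.sum(axis=0, keepdims=True)  # memberships of each object sum to 1
    centers = None
    for _ in range(max_iter):
        um = u ** m
        # Centers are membership-weighted means of the objects.
        centers = (um @ x) / um.sum(axis=1, keepdims=True)
        # Euclidean distance from every center to every object, shape (c, n).
        dist = np.linalg.norm(x[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # u_ji = 1 / sum_k (d_ji / d_ki)^(2 / (m - 1))
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    return centers, u

if __name__ == "__main__":
    # Toy corpus for illustration only.
    tokens = "reagan met the cabinet reagan denied the arms deal".split()
    vocab, vecs = hal_vectors(tokens, window=3)
    centers, u = fuzzy_c_means(vecs, c=2)
    for word, memberships in zip(vocab, u.T):
        print(word, np.round(memberships, 2))
```

Unlike K-Means, each column of `u` is a distribution over clusters, so a term such as "reagan" can carry substantial membership in more than one group rather than being forced into a single crisp cluster.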