Document re-ranking by generality in bio-medical information retrieval.
Document ranking is well known to be a crucial process in information retrieval (IR). It presents retrieved documents in order of their estimated degree of relevance to the query. Traditional document ranking methods are based on different measures of similarity between documents and the query. Owing to the information explosion and the popularity of WWW information retrieval, the increased variety of information and users makes it insufficient to consider similarity alone in the ranking process. In some cases, users need to retrieve documents that describe a certain topic generally or broadly. This is particularly the case in some specific domains such as bio-medical IR. To satisfy the stringent requirement of generality-based retrieval, we propose a novel approach that re-ranks the retrieved documents by considering their generality as a complement to similarity. Document generality is quantified by analyzing the semantic cohesion of the text. The retrieved documents are then re-ranked by a combined score of their similarity to the query and the closeness of each document's generality to the query's. Results show encouraging performance on a large-scale bio-medical text corpus, OHSUMED, which is a subset of the MEDLINE collection containing 348,566 medical journal references and 101 test queries.
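The re-ranking step described above can be illustrated with a minimal sketch. The abstract does not state how the two scores are combined, so everything here is an assumption made for illustration only: scores are taken to lie in [0, 1], generality closeness is taken as one minus the absolute difference between document and query generality, and the combination is a linear interpolation with a hypothetical weight `alpha`. The function name `rerank_by_generality` is likewise hypothetical, and the generality values are supplied as inputs rather than computed from semantic cohesion.

```python
from typing import List, Tuple


def rerank_by_generality(
    docs: List[Tuple[str, float, float]],
    query_generality: float,
    alpha: float = 0.7,
) -> List[Tuple[str, float]]:
    """Re-rank documents by a combined similarity/generality score.

    docs: (doc_id, similarity, generality) triples, values assumed in [0, 1].
    query_generality: generality of the query, assumed in [0, 1].
    alpha: hypothetical weight on similarity; (1 - alpha) goes to closeness.
    """
    scored = []
    for doc_id, similarity, generality in docs:
        # Assumed closeness measure: 1 when generalities match, 0 when
        # they are as far apart as possible.
        closeness = 1.0 - abs(generality - query_generality)
        # Assumed combination: linear interpolation of the two scores.
        combined = alpha * similarity + (1.0 - alpha) * closeness
        scored.append((doc_id, combined))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    retrieved = [
        ("doc_a", 0.9, 0.1),  # very similar, but much more specific than the query
        ("doc_b", 0.8, 0.9),  # slightly less similar, generality matches the query
    ]
    for doc_id, score in rerank_by_generality(retrieved, 0.9, alpha=0.5):
        print(doc_id, round(score, 3))
```

With `alpha = 1.0` the sketch reduces to ranking by similarity alone, which makes it easy to compare against a similarity-only baseline; lower values give generality closeness more influence on the final order.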