Relevance Ranking in Text Databases

Relevance Ranking

The most powerful weapon in the PsycCrawler arsenal is relevance ranking. Simply
put, relevance ranking arranges a set of retrieved records so that those most likely to
be relevant to the request are shown first. That is, after PsycCrawler retrieves all
documents that satisfy the search query, it uses relevance ranking to arrange them
based on a measurement of similarity between the query and the content of each
record. PsycCrawler performs a content analysis of records in the database by using
a combination of the following indicators:
• Breadth of Match. The more distinct query terms that appear in a document, the
higher the weight of relevance.
• Inverse Document Frequency. Rare terms (within the entire database) receive a
higher weight of relevance.
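• Frequency. The more times a query term occurs in a document, the higher the
weight of relevance.
• Density. Term occurrences are judged relative to document length, so that
retrieved documents of different lengths can be compared fairly.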
Consideration of these combined criteria produces intelligent on-the-fly evaluation
of a record's likelihood of satisfying the intent behind the query.
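
As a rough illustration of how the four indicators above might be combined into a single score, here is a minimal Python sketch. It is not PsycCrawler's actual formula, which the text does not give. The function names (tokenize, rank), the logarithmic damping, and the choice to multiply the factors together are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def rank(query, documents):
    """Return (score, doc_index) pairs for matching documents, best first.

    Hypothetical scoring that combines the four indicators:
    breadth of match, inverse document frequency, frequency, and density.
    """
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    query_terms = set(tokenize(query))

    # Document frequency: number of documents containing each query term.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}

    results = []
    for idx, doc in enumerate(docs):
        counts = Counter(doc)
        matched = [t for t in query_terms if counts[t] > 0]
        if not matched:
            continue  # the document is not retrieved at all

        # Breadth of match: share of distinct query terms present.
        breadth = len(matched) / len(query_terms)

        weighted = 0.0
        for t in matched:
            idf = math.log(1 + n_docs / df[t])  # rare terms weigh more
            tf = 1 + math.log(counts[t])        # repeated terms weigh more
            weighted += idf * tf

        # Density: query-term occurrences relative to document length.
        density = sum(counts[t] for t in matched) / len(doc)

        # Multiplying the factors is an assumption of this sketch.
        score = breadth * weighted * (1 + density)
        results.append((score, idx))

    # Highest score first; ties broken by original document order.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return results


if __name__ == "__main__":
    sample = [
        "memory and attention in children",
        "attention attention deficits in adults",
        "sleep patterns in adults",
    ]
    for score, idx in rank("attention memory", sample):
        print(f"{score:.3f}  {sample[idx]}")
```

In this sketch a document that matches both query terms outranks one that repeats a single term, and a document that matches no term is not retrieved at all. A real system would tune how the factors are weighted against each other.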
This makes it possible to find more relevant information with less effort. Regardless
of how many records the search query retrieves, only relatively few of them need to
be reviewed, because moving down the ranking means moving toward less relevant
records. With relevance ranking, less time is spent reviewing search results before
deciding whether they are satisfactory.
Relevance ranking also reduces the burden of composing complex logical queries,
which are otherwise used to cut the amount of retrieved data down to manageable
proportions. The number of records retrieved no longer matters, as long as the best
information floats to the top.
