Topic Sensitive Page Rank A Context Sensitive Ranking Algorithm for Web Search

The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative \importance” of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared. By us- ing linear combinations of these (precomputed) biased PageRank vectors to generate context-speci c importance scores for pages at query time, we show that we can gener- ate more accurate rankings than with a single, generic PageRank vector. We describe techniques for efficiently implementing a large scale search system based on the topic- sensitive PageRank scheme. Various link-based ranking strategies have been developed recently for improving Web-search query results. The HITS algorithm proposed by Kleinberg relies on query-time processing to deduce the hubs and authorities that exist in a subgraph of the Web consisting of both the results to a query and the local neighborhood of these results. Bharat and Henzinger [4] augment the HITS algorithm with content analysis to improve precision for the task of re- trieving documents related to a query topic (as opposed to retrieving documents that exactly satisfy the user’s information need).make use of HITS for automatically compiling resource lists for general topics. The PageRank algorithm, introduced by , precomputes a rank vector that provides a-priori \importance” estimates for all of the pages on the Web. This vector is com- puted once, oine, and is independent of the search query. At query time, these importance scores are used in conjunction with query-speci c IR scores to rank the query results [7]. PageRank has a clear eciency advantage over the HITS algorithm, as the query-time cost of incorporating the precomputed PageRank importance score for a page is low. Furthermore, as PageRank is generated using the entire Web graph, rather than a small subset, it is less sus- ceptible to localized link spam. Figure 1 illustrates a system utilizing the standard PageRank

Free download research paper