Topic Sensitive PageRank

In the original PageRank algorithm for improving the rank- ing of search-query results, a single PageRank vector is com- puted, using the link structure of the Web, to capture the relative \importance” of Web pages, independent of any par- ticular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased us- ing a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. By using these (precomputed) biased PageRank vectors to generate query-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. For ordi- nary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive Page- Rank scores using the topic of the context in which the query appeared. algorithm proposed in relies on query-time processing to deduce the hubs and authorities that exist in a subgraph of the Web consisting of both the results to a query and the local neighborhood of these results. [4] augments the HITS algorithm with content analysis to improve precision for the task of retrieving documents related to a query topic (as op- posed to retrieving documents that exactly satisfy the user’s information need). makes use of HITS for automatically compiling resource lists for general topics. The PageRank algorithm discussed inprecomputes a rank vector that provides a-priori \importance” estimates for all of the pages on the Web. This vector is computed once, oine, and is independent of the search query. At query time, these importance scores are used in conjunc- tion with query-specific IR scores to rank the query results. PageRank has a clear eciency advantage over the HITS algorithm, as the query-time cost of incorporating the pre- computed PageRank importance score for a page is low. Fur- thermore, as PageRank is generated using the entire Web graph, rather than a small subset, it is less susceptible to localized link spam. In this paper, we propose an approach that (as with HITS) allows the query to in uence the link-based score, yet (as with PageRank) requires minimal query-time processing. In our model, we compute oine a set of PageRank vectors, each biased with a di erent topic, to create for each page a set of importance scores with respect to particular top- ics. The idea of biasing the PageRank computation was suggested in for the purpose of personalization, but was never fully explored. This biasing process involves introduc- ing artificial links into the Web graph during the oine rank computation, and is described further in Section 2. By making PageRank topic-sensitive, we avoid the prob- lem of heavily linked pages getting highly ranked for queries for which they have no particular authority . Pages con- sidered important in some subject domains may not be con- sidered important in others, regardless of what keywords may appear either in the page or in anchor text referring to the page . An approach termed Hilltop, with moti- vations similar to ours, is suggested in that is designed to improve results for popular queries. Hilltop generates a query-specific authority score by detecting and indexing pages that appear to be good experts for certain keywords, based on their outlinks. However, query terms for which experts were not found will not be handled by the Hilltop algorithm.

Free download research paper