Distributed Page Ranking in Structured P2P Networks
This paper discusses the techniques of performing distributed page ranking on top of structured peer-to-peer networks. Distributed page ranking are needed because the size of the web grows at a remarkable speed and centralized page ranking is not scalable. Open System PageRank is presented in this paper based on the traditional PageRank used by Google. We then propose some distributed page ranking algorithms, partially prove their convergence, and discuss some interesting properties of them. Indirect transmission is introduced in this paper to reduce communication overhead between page rankers and to achieve scalable communication. The relationship between convergence time and bandwidth consumed is also discussed. Finally, we verify some of the discussions by experiments based on real datasets.
Link structure based page ranking for determining the “importance” of web pages has become an important technique in search engines. In particular, the HITS algorithm maintains a hub and authority score for each page, in which the authority and hub scores are computed by the linkage relationship of pages in the hyperlinked environment. The PageRank  algorithm used by Google  determines “scores” of web pages by compute the eigenvector of a matrix iteratively. As size of the web grows, it becomes harder and harder for existing search engines to cover the entire web. We need distributed search engines which are scalable with respect to the number of pages and the number of users. In a distributed search engine, page ranking is not only needed as in its centralized counterpart for improving query results, but should be performed distributedly for scalability and availability. A straightforward way to achieve distributed page ranking is simply scaling HITS or PageRank algorithms to distributed environment. But it is not a trivial thing to do that. Both HITS and PageRank are iterative algorithms. As each iteration step needs computation results of previous step, synchronize operation is needed. However, it is hard to achieve synchronous communication in wide spread distributed environment. In addition, page partitioning and communication overhead must be considered carefully while performing distributed page ranking. Structured peer-to-peer overlay networks have recently gained popularity as a platform for the construction of self-organized, resilient, large-scale distributed systems [6, 13, 14, 15]. In this paper, we try to perform effective page ranking on top of structured peer-to-peer networks. We first propose some distributed page ranking algorithms based on google’s PageRank and present some interesting properties and results about them. As communication overhead is more important than CPU and memory usage in distributed page ranking, we then discuss strategies of page partitioning and ideas about alleviating communication overhead. By doing this, our paper makes the following contributions: • We provide two distributed page ranking algorithms, partially prove their convergence, and verify their features by using a real dataset. • We identify major issues and problems related to distributed page ranking on top of structured P2P networks. • Indirect transmission is introduced in this paper to reduce communication overhead between page rankers and to achieve scalable communication. The rest of the paper is as follows: After briefly reviewing the PageRank algorithm in section 2, a modification on PageRank for open systems is proposed in section 3. Issues in distributed page ranking are discussed one by one in section 4. Section 5 uses a real dataset to validate some of our discussions
Free download research paper