Masters Thesis

Exploring PageRank Algorithms: Power Iteration & Monte Carlo Methods

A search engine is intended to crawl the world wide web and retrieve a list of sites that match the user's search terms. Listing the search results in a proper order is crucial but it is far from being a trivial task. Prior to the rise of Google, search engines were notorious for presenting irrelevant information, and in turn, were not very optimal tools. It wasn't until around 1998 that Google co-founders Lawrence Page and Sergey Brin invented the PageRank algorithm for estimating the importance of a webpage which ultimately revolutionized internet searches. The PageRank algorithm is used to estimate the importance of a webpage based on the interconnection of the web [28]. Google's superiority came in the form of arranging the more important webpages to appear early in the search results. This way, it significantly reduced the amount of time a user had to sift through search results to find the sought-after information. Although PageRank is not the only algorithm for organizing search results today, it was the first algorithm used by Google. Today there are multiple algorithms working together to prioritize search results. The concept behind PageRank is similar to that of a democracy: each webpage can vote for the importance of other webpages by providing a link to that webpage. However, not every vote is worth the same! Because a webpage with more incoming links is more important than a webpage with less incoming links, the pages with a link from a page of high importance also holds importance. Since PageRank is a variation of an eigenvector problem, we begin with a review of key concepts from linear algebra in Chapter 1. Using this linear algebra theory, we show that the appropriate model will yield the existence and uniqueness of the ranking of webpages. In Chapter 2 we describe the data structures necessary for implementing the algorithm on a computer. In Chapter 3 we describe a model that can be used to synthesize graphs which closely resemble the properties of the world wide web. In Chapter 4, we explore the power iteration, a deterministic numerical algorithm to solve such an eigenvector problem. In Chapter 5, we explore and compare Monte Carlo methods used to solve the eigenvector problem from a probabilistic approach.

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.