PageRank is a method for assigning a numerical value to every Web page indexed by a search engine. The PageRank of a page is defined recursively and depends on the number and PageRanks of all pages that link to it. A page that is linked to by many pages with high rank receives a high rank itself.

The formula uses a model of a random surfer who gets bored after several clicks and switches to a random page. The PageRank value of a page reflects the frequency of hits on that page by a random surfer.

It can be understood as a Markov process in which the states are pages and the transitions are the links between pages, with all links out of a page being equally probable. If a page has no links to other pages, it becomes a sink and would make the whole scheme unusable, because sink pages would trap the random surfer forever. However, the solution is quite simple. If the random surfer arrives at a sink page, it picks another URL at random and continues surfing. To be fair to pages that are not sinks, these random transitions are added to all nodes in the Web, with a residual probability that is usually q=0.15.
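The Markov model described above can be sketched as a small transition matrix. This is only an illustration under stated assumptions: the function name transition_matrix, the four-page link list, and the representation of the Web as a list of outgoing-link lists (with no duplicate links) are all hypothetical and not part of the original description.

```python
def transition_matrix(links, q=0.15):
    """Build a row-stochastic matrix for the random surfer.

    links[i] is the list of pages that page i links to (assumed to
    contain no duplicates). Entry [i][j] of the result is the
    probability of moving from page i to page j in one step.
    """
    n = len(links)
    matrix = []
    for targets in links:
        if not targets:
            # Sink page: the surfer always jumps to a page chosen at random.
            row = [1.0 / n] * n
        else:
            # With probability q the surfer jumps to a random page ...
            row = [q / n] * n
            # ... otherwise it follows one of the outgoing links,
            # each with equal probability.
            for j in targets:
                row[j] += (1 - q) / len(targets)
        matrix.append(row)
    return matrix


# Hypothetical four-page Web; page 3 has no outgoing links (a sink).
links = [[1, 2], [2], [0, 3], []]
P = transition_matrix(links)
for row in P:
    print([round(p, 4) for p in row])
```

Every row sums to 1, so the surfer always ends up somewhere, and the sink row is uniform, so the surfer can never be trapped.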

So, the equation is as follows:

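Pagerank(i) = (q/N) + (1-q) Sum(j={pages that point to i}; Pagerank(j)/Outlinks(j))

Here N is the total number of pages and Outlinks(j) is the number of links leaving page j. Dividing by Outlinks(j) reflects the fact that every link out of page j is equally probable.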
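One way to evaluate this equation is to apply it repeatedly, starting from equal ranks, until the values stop changing. The sketch below assumes that each link from page j passes on an equal share of Pagerank(j), and that the rank held by sink pages is spread evenly over all pages, as the random-surfer description suggests. The function name pagerank, the tolerance, and the three-page example are hypothetical choices made for illustration.

```python
def pagerank(links, q=0.15, tol=1e-10, max_iter=200):
    """Iterate the PageRank equation until the ranks stop changing.

    links[j] is the list of pages that page j links to
    (assumed to contain no duplicates).
    """
    n = len(links)
    rank = [1.0 / n] * n  # start with equal rank everywhere
    for _ in range(max_iter):
        # Rank sitting on sink pages is redistributed uniformly.
        sink_mass = sum(rank[j] for j in range(n) if not links[j])
        new_rank = [q / n + (1 - q) * sink_mass / n] * n
        # Each page j passes an equal share of its rank along every link.
        for j, targets in enumerate(links):
            if targets:
                share = (1 - q) * rank[j] / len(targets)
                for i in targets:
                    new_rank[i] += share
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical three-page Web: pages 0 and 1 link to page 2,
# and page 2 links back to page 0.
links = [[2], [2], [0]]
ranks = pagerank(links)
print([round(r, 4) for r in ranks])
```

In this example page 2 is linked to by both other pages and ends up with the highest rank, while page 1, which nobody links to, keeps only the q/N share.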

It is worth noting, and this is much of what makes PageRank so appealing in terms of elegance, that the PageRank values are the entries of the dominant eigenvector of the modified adjacency matrix. The values are fast to calculate (a modest number of iterations usually suffices) and in practice they give good results. The main disadvantage is that PageRank favors older pages, because a new page, even a very good one, will not have many links. That is why it should be combined with textual analysis or other ranking methods.
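The eigenvector view can be checked on a small example: once the rank vector has converged, multiplying it by the modified matrix once more leaves it unchanged, which is what being an eigenvector with eigenvalue 1 means. The matrix below is a hand-built, hypothetical three-page Web with q = 0.15, and the helper names left_multiply and dominant_left_eigenvector are illustrative only.

```python
def left_multiply(vector, matrix):
    """Return the row vector `vector` multiplied by `matrix`."""
    n = len(vector)
    return [sum(vector[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def dominant_left_eigenvector(matrix, tol=1e-12, max_iter=1000):
    """Power iteration: multiply repeatedly until the vector settles."""
    n = len(matrix)
    vector = [1.0 / n] * n
    iteration = 0
    for iteration in range(1, max_iter + 1):
        following = left_multiply(vector, matrix)
        change = sum(abs(a - b) for a, b in zip(following, vector))
        vector = following
        if change < tol:
            break
    return vector, iteration


# Hypothetical three-page Web with q = 0.15: pages 0 and 1 link to
# page 2, and page 2 links to page 0. Row i holds the probabilities
# of moving from page i to each page.
q, n = 0.15, 3
P = [
    [q / n, q / n, q / n + (1 - q)],
    [q / n, q / n, q / n + (1 - q)],
    [q / n + (1 - q), q / n, q / n],
]
ranks, iterations = dominant_left_eigenvector(P)

# If ranks is an eigenvector with eigenvalue 1, one more
# multiplication should change nothing beyond rounding error.
residual = max(abs(a - b) for a, b in zip(left_multiply(ranks, P), ranks))
print([round(r, 4) for r in ranks], iterations, residual)
```

The number of iterations depends on the chosen tolerance and on the link structure, so the count printed here should not be read as typical for a real Web graph.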

The PageRank system is used by Google and was developed by Google's founders Larry Page and Sergey Brin while at Stanford University in 1998.

The name PageRank is a trademark of Google.