社交网络分析与挖掘 第六课:网页排序
Posted kisinfinite
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了社交网络分析与挖掘 第六课:网页排序相关的知识,希望对你有一定的参考价值。
The History of PageRank
- PageRank is a link analysis algorithm which assigns a numerical weighting to each Web page,with the purpose of "measuring"relative importance.
- Based on the hyperlinks map
- An excellent way to prioritize the results of web keyword searches
- PageRank was proposed by Larry Page(hencethe name Page-Rank)and Sergey Brin in 1998.
- Shortly after,Page and Brin founded Google.
PageRank:the intuitive idea
- Although page E hasmore"point to"links than page C, page C is more important than page E.
- One reason is that page C is linked by page B whose is the most important page in this network.
A Simple Version of PageRank
[ R(u)=c sum_{v in B_{u}} frac{R(v)}{N_{v}} ]
- u:a web page
- B(u):the set of u‘s backlinks(point-to-links)
- N(v):the number of forward links of page v
- c:the normalization factor to make
[ |mathrm{R}|_{mathrm{L} 1}=left|mathrm{R}_{1}+ldots+mathrm{R}_{mathrm{n}} ight| ]
An example of Simplified PageRank
A Problem with Simplified PageRank
A loop:
During each iteration,the loop accumulates rank but never distributes rank to other pages!
Fix the problem:what PageRank does
- Randomly walking and also randomly jumping
- the“random surfer”simply keeps clicking successive links at random,but periodically “gets bored”and jumps to a random page based on the distribution
- the“random surfer”simply keeps clicking successive links at random,but periodically “gets bored”and jumps to a random page based on the distribution
The actual PageRank
[ R^{prime}(u)=c_{1} sum_{v in B_{u}} frac{R^{prime}(v)}{N_{v}}+mathrm{c}_{2} E(u) ]
- E(u):a distribution of ranks of web pages that "users"jump to when they"gets bored"after successive links at random.
An example of PageRank
Convergence Experiments
- PR(322 Million Links):52 iterations
- PR(161Million Links):45 iterations
- Scaling factor is roughly linear in logn
Conclusion
- PageRank rank web pages based on their locations in the web graph structure
- PageRank uses information which is external to the web pages-backlinks
- Backlinks from important pages are more significant than backlinks from common pages
- The structure of the web graph is very useful for information retrieval tasks.
以上内容出自小象学院
以上是关于社交网络分析与挖掘 第六课:网页排序的主要内容,如果未能解决你的问题,请参考以下文章