Social Network Analysis and Mining, Lesson 6: Web Page Ranking

Posted by kisinfinite

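Preface: This article was compiled by the editors of 小常识网 (cha138.com). It introduces material related to Social Network Analysis and Mining, Lesson 6: Web Page Ranking, and we hope it is of some reference value to you.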

The History of PageRank

  • PageRank is a link analysis algorithm that assigns a numerical weight to each web page, with the purpose of "measuring" its relative importance.
  • It is based on the hyperlink map of the Web.
  • It is an excellent way to prioritize the results of web keyword searches.
  • PageRank was proposed by Larry Page (hence the name Page-Rank) and Sergey Brin in 1998.
  • Shortly after, Page and Brin founded Google.

PageRank: the intuitive idea

[Figure: an example web link graph]

  • Although page E has more "point-to" links (backlinks) than page C, page C is more important than page E.
  • One reason is that page C is linked to by page B, which is the most important page in this network.

A Simple Version of PageRank

\[ R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} \]

  • u: a web page
  • B_u: the set of u's backlinks (the pages that point to u)
  • N_v: the number of forward links of page v
  • c: the normalization factor, chosen so that the total rank
    \[ \|R\|_{L_1} = |R_1 + \ldots + R_n| \]
    stays constant
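The update above can be sketched as a short Python loop. This is a minimal illustration, not code from the course: it assumes the graph is a dict mapping each page to the list of pages it links to (every page must appear as a key), starts from uniform ranks, runs a fixed number of iterations, and realizes c by dividing by the current total so the ranks sum to 1. The function name `simplified_pagerank` and the example graph are hypothetical.

```python
def simplified_pagerank(links, iterations=50):
    """Simplified PageRank: repeatedly apply R(u) = c * sum_{v in B_u} R(v) / N_v.

    links maps each page v to the list of pages it links to, so N_v = len(links[v]).
    Every page must appear as a key (assumed, hypothetical input format).
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}  # uniform start, ranks sum to 1
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for v, targets in links.items():
            for u in targets:
                # v is a backlink of u, so u receives an equal share R(v) / N_v
                new[u] += rank[v] / len(targets)
        # c = 1 / total rescales the ranks so they sum to 1 again
        total = sum(new.values())
        rank = {p: r / total for p, r in new.items()}
    return rank


# Hypothetical graph: A -> B, A -> C, B -> C, C -> A
ranks = simplified_pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({p: round(r, 3) for p, r in ranks.items()})  # {'A': 0.4, 'B': 0.2, 'C': 0.4}
```

Page C ends up tied with A and ahead of B because it collects all of B's rank plus half of A's, which mirrors the intuition that who links to you matters, not just how many.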

An example of Simplified PageRank

[Figure: a worked example of simplified PageRank]

A Problem with Simplified PageRank

A loop:
[Figure: a loop in the link graph]
During each iteration, the loop accumulates rank but never distributes any of it to other pages, so it acts as a "rank sink"!
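The rank-sink effect is easy to reproduce by applying the simplified update a few times by hand. The four-page graph here is hypothetical (A → B → C, while C and D link only to each other), and `simplified_step` is an illustrative helper performing one iteration of the simplified formula with c chosen so the ranks sum to 1:

```python
def simplified_step(links, rank):
    """One iteration of R(u) = c * sum_{v in B_u} R(v) / N_v (hypothetical helper)."""
    new = {p: 0.0 for p in links}
    for v, targets in links.items():
        for u in targets:
            new[u] += rank[v] / len(targets)  # v passes an equal share to each target
    total = sum(new.values())
    return {p: r / total for p, r in new.items()}  # normalization factor c


# Hypothetical graph: A -> B -> C, and C <-> D form a loop with no way out.
links = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"]}
rank = {p: 0.25 for p in links}
for i in range(1, 4):
    rank = simplified_step(links, rank)
    print(i, rank)
# 1 {'A': 0.0, 'B': 0.25, 'C': 0.5, 'D': 0.25}
# 2 {'A': 0.0, 'B': 0.0, 'C': 0.5, 'D': 0.5}
# 3 {'A': 0.0, 'B': 0.0, 'C': 0.5, 'D': 0.5}
```

After two iterations pages A and B have lost all their rank, and every unit of rank is trapped in the C–D loop.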

Fixing the problem: what PageRank does

  • Random walking plus random jumping
    • The "random surfer" simply keeps clicking successive links at random, but periodically "gets bored" and jumps to a random page chosen according to a given distribution.
      [Figure: the random surfer model]
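The random surfer can also be simulated directly: the fraction of steps spent on each page approximates its rank. This Monte Carlo sketch is an illustration under assumptions not taken from the slides: the jump distribution is uniform over all pages, the jump probability is a hypothetical `jump_prob = 0.15`, and a surfer on a page with no out-links always jumps.

```python
import random
from collections import Counter


def random_surfer(links, steps=100_000, jump_prob=0.15, seed=0):
    """Estimate page ranks by simulating a random surfer (hypothetical parameters)."""
    rng = random.Random(seed)  # fixed seed makes a run reproducible
    pages = list(links)
    page = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1
        targets = links[page]
        if not targets or rng.random() < jump_prob:
            page = rng.choice(pages)  # "gets bored": jump to a page drawn from a uniform E
        else:
            page = rng.choice(targets)  # otherwise follow a random out-link
    return {p: visits[p] / steps for p in pages}


print(random_surfer({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
```

Because the walk is random, estimates from short runs are noisy; more steps trade running time for accuracy.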

The actual PageRank

\[ R'(u) = c_1 \sum_{v \in B_u} \frac{R'(v)}{N_v} + c_2 E(u) \]

  • E(u): a probability distribution over web pages, describing where "users" jump to when they "get bored" of following successive links at random.
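A power-iteration sketch of this formula follows. The specific values are assumptions, not part of the slides: E is uniform unless one is passed in, c_1 = 0.85 (a commonly used damping value) with c_2 = 1 − c_1, and the ranks are renormalized after each step so they sum to 1. The function `pagerank` is a hypothetical name and also reports how many iterations it took for the L1 change between successive rank vectors to fall below `tol`.

```python
def pagerank(links, c1=0.85, E=None, tol=1e-10, max_iter=1000):
    """Iterate R'(u) = c1 * sum_{v in B_u} R'(v) / N_v + c2 * E(u) until it settles.

    Assumed conventions: links maps each page to its forward links, c2 = 1 - c1,
    and E defaults to the uniform distribution. Returns (ranks, iterations).
    """
    pages = list(links)
    if E is None:
        E = {p: 1.0 / len(pages) for p in pages}  # assumed uniform jump distribution
    c2 = 1.0 - c1
    rank = dict(E)
    for it in range(1, max_iter + 1):
        new = {p: c2 * E[p] for p in pages}  # random-jump term c2 * E(u)
        for v, targets in links.items():
            for u in targets:
                new[u] += c1 * rank[v] / len(targets)  # link-following term
        total = sum(new.values())
        new = {p: r / total for p, r in new.items()}  # keep the ranks summing to 1
        delta = sum(abs(new[p] - rank[p]) for p in pages)  # L1 change this step
        rank = new
        if delta < tol:
            return rank, it
    return rank, max_iter


# The same hypothetical loop graph: A -> B -> C, C <-> D
ranks, iters = pagerank({"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"]})
print(ranks, iters)
```

Unlike the simplified version, page A no longer drops to zero: with no backlinks it still keeps c_2 · E(A) = 0.15 × 0.25 = 0.0375, because random jumps keep feeding rank back into every page and the loop can no longer absorb all of it.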

An example of PageRank

[Figure: a worked example of PageRank]

Convergence Experiments

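  • PR (322 million links): 52 iterations
  • PR (161 million links): 45 iterations
  • The scaling factor is roughly linear in log n: doubling the number of links (161 million to 322 million) added only 7 iterations (45 to 52).
    [Figure: convergence of the PageRank computation]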

Conclusion

  • PageRank ranks web pages based on their positions in the web graph structure.
  • PageRank uses information that is external to the web pages themselves: their backlinks.
  • Backlinks from important pages count for more than backlinks from ordinary pages.
  • The structure of the web graph is very useful for information retrieval tasks.

The content above comes from 小象学院.
