第五周.01.PinSage论文带读

Posted 2021-08-01 oldmao_2001

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了第五周.01.PinSage论文带读相关的知识，希望对你有一定的参考价值。

文章目录

论文带读
其他

本文内容整理自深度之眼《GNN核心能力培养计划》
公式输入请参考：在线Latex公式

论文带读

Graph Convolutional Neural Networks for Web-Scale Recommender Systems
PinSage模型，主要用于大型推荐系统中。是目前已经落地的模型，非常重要。
一些基础概念要了解：
召回
A/B test
harder-and-harder training examples

摘要套路分析

第一句话先讲现在基于DNN的RS啥情况，然后However转折，提出现在面临的问题是什么。
Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains a challenge.

然后总分法讲本文的做法。
第一句（Here we describe）是本文总体工作（提出什么算法、模型等）。
第二句（We develop…）对工作进行简单展开描述。
第三句（Compared to）对比前人工作有什么创新。
Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a dataefficient Graph Convolutional Network (GCN) algorithm PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model.

然后是实验结果证明本文的算法或模型如何优秀。
列举了一些具体的数据，用了啥实验方法。最后一句（this is the largest application）总结收尾。
We deploy PinSage at Pinterest and train it on 7.5 billion examples on a graph with 3 billion nodes representing pins and boards, and 18 billion edges. According to offline metrics, user studies and A/B tests, PinSage generates higher-quality recommendations than comparable deep learning and graph-based alternatives. To our knowledge, this is the largest application of deep graph embeddings to date and paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures.

3.1 Problem Setup

前面的介绍和相关工作略，直接看第三节METHOD。
这个小节先对任务进行了说明，就是要生成pin的embedding，然后这个结果可以用在推荐系统里面的召回阶段。
Our task is to generate high-quality embeddings or representations of pins that can be used for recommendation (e.g., via nearestneighbor lookup for related pin recommendation, or for use in a downstream re-ranking system).
模型用的是二分图bipartite graph
在这里插入图片描述
原文中把两个集合定义为I和C
I看做是商品（item/pin）
C看做是用户的收藏
给出符合化的描述后，给出目标：结合节点本身的属性（初始化的embedding）结合图的结构信息来生成高质量的embedding。
Our goal is to leverage both these input attributes as well as the structure of the bipartite graph to generate high-quality embeddings.

3.2 Model Architecture

在这里插入图片描述
第一行是通过将邻居节点原始embedding $z_v$ 丢进一个FC的NN（ $Qh_v+q$ ），然后过非线性ReLU，然后在走aggregate函数 $\\gamma$ 得到邻居节点的embedding $n_u$ ，这里貌似有个typo： $h_v$ 应该是 $z_v$
第二行是将邻居节点的embedding $n_u$ 和当前节点的embedding $z_u$ 进行拼接，然后在走一次卷积操作
第三行是Layer 的归一化操作，解决Internal Convariate Shift (ICS)。

注意：第一行获取的是图的局部信息（也有一些获取全局信息的trick，类似DL中的残差操作？加连接所有节点的虚拟节点）；
第二行是结合节点本身信息。
这里由于这篇文章涉及的数据集非常庞大，因此做卷积计算量也很大，需要对邻居节点进行过滤（是不是听上去有点像负采样，但是负采样是随机的），过滤的原则就是Importance-based neighborhoods. 就是只取所有邻居中前T个重要节点作为邻居采样结果。这个trick的好处有2个：
1、减少计算量，加载图更加快
2、考虑重要性，就是和GAT思想一样的，但是这里和GAT计算方式不一样，不是所有节点来一遍attention那套，而是随机游走，然后看邻居节点 $L_1$ 访问次数，根据这个次数来决定邻居重要性，比GAT也是要快。

3.3 Model Training

这里提到的Algorithm 2: minibatch实际上就是在GraphSAGE中提到过的基于batch的训练是一样样的。这个也是相对于GCN改进的一个点，GCN是要输入整张图，无法用在节点非常多的数据集，内存会爆，而基于batch的训练可以用在大图（上亿个点）上。
在这里插入图片描述
先确定 $M$ ，这个就是一个batch要训练的节点，然后找出 $M$ 节点的邻居节点，然后经过K层的图卷积（1-14），然后最后进入FC的NN（15-17）
损失函数：用的是Margin based的损失函数，这个不展开了，不明白就百度，就是要使得正样本距离越近越好，负样本距离越远越好，然后用margin来调整二者的差距。
关于代码操作：

运算单元	操作
GPU	论文的两个算法
CPU	extracting features, re-indexing, and negative sampling

原文还用到了一些并行化的trick。
关于负采样的trick，和普通负采样不太一样的是用借鉴了PageRank算法，采样硬负样本。大体流程是用Personalized PageRank算法计算出当前节点与其余所有节点的分数，Personalized体现在每个节点与其他节点的score是针对当前节点而言的，不是原始PageRank那种全局分数。然后再从分数中的2000-5000排名中随机采样500个负样本，关于这个排名顺序：1-1999很可能包含正样本，5001之后虽然是负样本但是太simple。
在这里插入图片描述

4.1 Experimental Setup

这里提到模型使用了多模态学习，由于数据本身是包含图片和文本（描述）信息的，原文用VGG抽图片特征，得到4096维特征，然后用Word2Vec抽文本特征，得到256维特征，然后实验时候单独用图片、单独用文本、融合二者、单独用基于随机游走图模型来进行了对比。

其他

还有一些论文待讲解：
Wide & Deep Learning for Recommender Systems
Deep Interest Evolution Network for Click-Through Rate Prediction

以上是关于第五周.01.PinSage论文带读的主要内容，如果未能解决你的问题，请参考以下文章