How to count the number of common bidirectional connections in a graph

Posted: 2010-11-27 07:18:04

Question:

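I am trying to write a query that counts the number of bidirectional (strong) connections between users, where each user is a node in a graph.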

To test the query, I created the following sample, stored in the table monthly_connections_test:

calling_party, called_party, link_strength


z1  z2  1,0000000
z1  z3  1,0000000
z3  z1  1,0000000
z1  z4  1,0000000
z1  z5  1,0000000
z5  z1  1,0000000
z2  z4  1,0000000
z2  z5  1,0000000
z5  z2  1,0000000
z2  z7  1,0000000
z7  z2  1,0000000
z4  z7  1,0000000
z7  z4  1,0000000
z2  z1  1,0000000

For the strong connections between z1 and z2, the following query returns 2 instead of 1:

SELECT  user1, user2, 0 AS calling_calling, 0 AS calling_called, 0 AS called_calling, 0 AS called_called, COUNT(*) AS both_directions
 FROM   (SELECT monthly_connections_test.calling_party AS user1, monthly_connections_test_1.calling_party AS user2
FROM         monthly_connections_test INNER JOIN
                      monthly_connections_test AS monthly_connections_test_1 ON 
                      monthly_connections_test.called_party = monthly_connections_test_1.called_party AND 
                      monthly_connections_test.calling_party < monthly_connections_test_1.calling_party) t1 
                      INNER JOIN monthly_connections_test AS monthly_connections_test_2 ON
                      t1.user2 = monthly_connections_test_2.called_party
                      AND t1.user2 < monthly_connections_test_2.calling_party
GROUP BY t1.user1, t1.user2

The sample results are as follows:

z1  z2  0   0   0   0   2
z2  z3  0   0   0   0   3
z2  z4  0   0   0   0   1
z1  z5  0   0   0   0   3
z2  z5  0   0   0   0   3
z3  z5  0   0   0   0   2
z1  z7  0   0   0   0   4
z2  z7  0   0   0   0   1
z5  z7  0   0   0   0   1

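Does anyone know how to modify the query so that it returns the number of common neighbours that are connected in both directions (in this example the correct value for z1 and z2 is 1, because z5 is connected to both z1 and z2 in both directions)?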

The problem, I guess, is in this part:

INNER JOIN monthly_connections_test AS monthly_connections_test_2 ON
                      t1.user2 = monthly_connections_test_2.called_party
                      AND t1.user2 < monthly_connections_test_2.calling_party

The correct result should be as follows:

z1  z2  0   0   0   0   1
z2  z3  0   0   0   0   1
z2  z4  0   0   0   0   1
z1  z5  0   0   0   0   1
z2  z5  0   0   0   0   1
z3  z5  0   0   0   0   1
z1  z7  0   0   0   0   1
z2  z7  0   0   0   0   0
z5  z7  0   0   0   0   1
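For reference, the expected counts can be reproduced directly from the definition with a small brute-force check. This is only a sketch in Python, not the query being asked for: the function name `common_bidirectional_counts` is made up for illustration, and the sketch assumes a "strong" connection means that both (a, b) and (b, a) appear as rows. Pairs with no common strong neighbour are simply left out, so z2/z7 (listed with 0 above) does not appear in the output.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows (calling_party, called_party) from monthly_connections_test.
EDGES = [
    ("z1", "z2"), ("z1", "z3"), ("z3", "z1"), ("z1", "z4"), ("z1", "z5"),
    ("z5", "z1"), ("z2", "z4"), ("z2", "z5"), ("z5", "z2"), ("z2", "z7"),
    ("z7", "z2"), ("z4", "z7"), ("z7", "z4"), ("z2", "z1"),
]


def common_bidirectional_counts(edges):
    """Map each pair (user1 < user2) to its number of common strong neighbours."""
    edge_set = set(edges)
    # strong[a] holds every b such that both a->b and b->a exist.
    strong = defaultdict(set)
    for a, b in edge_set:
        if (b, a) in edge_set:
            strong[a].add(b)
    counts = {}
    for user1, user2 in combinations(sorted(strong), 2):
        shared = strong[user1] & strong[user2]
        if shared:  # pairs without a shared strong neighbour are omitted
            counts[(user1, user2)] = len(shared)
    return counts


counts = common_bidirectional_counts(EDGES)
for pair in sorted(counts):
    print(pair, counts[pair])
```

Comparing every pair of users like this is quadratic in the number of users, so it is only useful for checking small samples, not for the full table.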

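The join condition has to be formulated so that each connection is counted only once (connections that have already been included must be excluded at this point), but I have not found a solution yet.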

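P.S. Since the original table contains 24M records, the query has to be written so that it returns results in an acceptable time. My first attempt, a query with multiple nested SELECTs, took far too long to execute.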

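Comments:

If you posted an example of correct input/output pairs, it would be easier for me and other users to understand what needs to be done. Right now I understand that you want to count bidirectional links, but that query is very simple, much simpler than what you wrote, so now I think I have misunderstood.

Answer 1:

First write a table-valued function: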

create function getBiConnectedNeighbours
(
@P_PARTY nvarchar(50)
)
returns table
as
return
(
   select called_party as neighbour
     from monthly_connections_test a
     where calling_party = @P_PARTY
       and exists (select 1 from monthly_connections_test b
                      where a.called_party = b.calling_party and
                            b.called_party = a.calling_party) -- this subquery is to get bidirectionals only

)

Then use the function as follows:

select count(1) 
from getBiConnectedNeighbours('z1') a inner join
     getBiConnectedNeighbours('z2') b on a.neighbour = b.neighbour
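The function above handles one pair of users per call. To get the count for every pair in one pass, the same idea can be written as a single set-based query: first keep only the bidirectional edges, then self-join them on the shared neighbour. The sketch below runs it against the sample data in an in-memory SQLite database from Python, purely so it can be tried without a SQL Server instance. The CTE name `strong` and the helper `count_all_pairs` are made up for illustration, `link_strength` is stored as plain 1.0, and the query assumes each (calling_party, called_party) row occurs at most once. It has not been benchmarked against a 24M-row table.

```python
import sqlite3

# Sample rows (calling_party, called_party) from monthly_connections_test.
ROWS = [
    ("z1", "z2"), ("z1", "z3"), ("z3", "z1"), ("z1", "z4"), ("z1", "z5"),
    ("z5", "z1"), ("z2", "z4"), ("z2", "z5"), ("z5", "z2"), ("z2", "z7"),
    ("z7", "z2"), ("z4", "z7"), ("z7", "z4"), ("z2", "z1"),
]

ALL_PAIRS_SQL = """
WITH strong AS (
    -- one row per direction of every bidirectional edge
    SELECT a.calling_party AS usr, a.called_party AS neighbour
    FROM monthly_connections_test AS a
    JOIN monthly_connections_test AS b
      ON b.calling_party = a.called_party
     AND b.called_party = a.calling_party
)
SELECT x.usr AS user1, y.usr AS user2, COUNT(*) AS both_directions
FROM strong AS x
JOIN strong AS y
  ON y.neighbour = x.neighbour   -- same strong neighbour
 AND x.usr < y.usr               -- report each pair once
GROUP BY x.usr, y.usr
ORDER BY x.usr, y.usr
"""


def count_all_pairs(rows):
    """Load rows into an in-memory table and count common strong neighbours."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE monthly_connections_test ("
        "calling_party TEXT, called_party TEXT, link_strength REAL)"
    )
    conn.executemany(
        "INSERT INTO monthly_connections_test VALUES (?, ?, 1.0)", rows
    )
    result = conn.execute(ALL_PAIRS_SQL).fetchall()
    conn.close()
    return result


for row in count_all_pairs(ROWS):
    print(row)
```

Because this is an inner join, pairs without any common strong neighbour (such as z2 and z7 in the sample) produce no row rather than a row with 0.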

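Comments:

This works on SQL Server. I don't know which database you are using, but adjust the syntax accordingly if needed.

Answer 2:

After trying several solutions, the following query returned the correct results for the example above: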

    SELECT  t1.user1, t1.user2, 0 AS calling_calling, 0 AS calling_called, 0 AS called_calling, 0 AS called_called, COUNT(*) AS both_directions
 FROM   (SELECT monthly_connections_test.calling_party AS user1, monthly_connections_test_1.calling_party AS user2, monthly_connections_test.called_party AS calledUser
FROM         monthly_connections_test INNER JOIN
                      monthly_connections_test AS monthly_connections_test_1 ON 
                      monthly_connections_test.called_party = monthly_connections_test_1.called_party AND 
                      monthly_connections_test.calling_party < monthly_connections_test_1.calling_party) t1 
                      INNER JOIN monthly_connections_test AS monthly_connections_test_2 ON
                      monthly_connections_test_2.called_party = t1.user1
                      AND monthly_connections_test_2.calling_party = t1.calledUser
GROUP BY t1.user1, t1.user2
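One thing worth checking about the query above (grouped by t1.user1, t1.user2 so that the aggregate is valid): it verifies that calledUser calls user1 back, but not that calledUser also calls user2 back. On the sample data this makes no difference, but a case where the shared neighbour only calls user1 back would still be counted. The sketch below tries both on an in-memory SQLite database from Python. The constant-zero columns are dropped for brevity, the table aliases are shortened, and the `ASYMMETRIC` rows and the extra join in `STRICT` are assumptions of this sketch, not part of the original answer.

```python
import sqlite3

# Sample rows (calling_party, called_party) from monthly_connections_test.
SAMPLE = [
    ("z1", "z2"), ("z1", "z3"), ("z3", "z1"), ("z1", "z4"), ("z1", "z5"),
    ("z5", "z1"), ("z2", "z4"), ("z2", "z5"), ("z5", "z2"), ("z2", "z7"),
    ("z7", "z2"), ("z4", "z7"), ("z7", "z4"), ("z2", "z1"),
]
# Hypothetical case: c calls a back, but never calls b back.
ASYMMETRIC = [("a", "c"), ("c", "a"), ("b", "c")]

# Join requiring calledUser -> user1 (present in the answer's query).
REVERSE_TO_USER1 = """
INNER JOIN monthly_connections_test AS m2
   ON m2.called_party = t1.user1
  AND m2.calling_party = t1.calledUser
"""
# Extra join requiring calledUser -> user2 (an addition of this sketch).
REVERSE_TO_USER2 = """
INNER JOIN monthly_connections_test AS m3
   ON m3.called_party = t1.user2
  AND m3.calling_party = t1.calledUser
"""


def build_query(extra_join=""):
    """Return the answer's query, optionally with one more join appended."""
    return f"""
    SELECT t1.user1, t1.user2, COUNT(*) AS both_directions
    FROM (SELECT m.calling_party AS user1, m1.calling_party AS user2,
                 m.called_party AS calledUser
          FROM monthly_connections_test AS m
          INNER JOIN monthly_connections_test AS m1
             ON m.called_party = m1.called_party
            AND m.calling_party < m1.calling_party) AS t1
    {REVERSE_TO_USER1}
    {extra_join}
    GROUP BY t1.user1, t1.user2
    ORDER BY t1.user1, t1.user2
    """


ANSWER_2 = build_query()
STRICT = build_query(REVERSE_TO_USER2)


def run(sql, rows):
    """Execute sql against a fresh in-memory copy of the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE monthly_connections_test ("
        "calling_party TEXT, called_party TEXT, link_strength REAL)"
    )
    conn.executemany(
        "INSERT INTO monthly_connections_test VALUES (?, ?, 1.0)", rows
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(run(ANSWER_2, SAMPLE) == run(STRICT, SAMPLE))  # same rows on the sample
print(run(ANSWER_2, ASYMMETRIC))  # the a/b pair is still counted
print(run(STRICT, ASYMMETRIC))    # no rows once both reverse edges are required
```

Whether the stricter variant is needed depends on whether such one-sided cases occur in the real data. The extra join also adds another pass over the table, which matters at 24M rows.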
