Help optimizing query (shows strength of two-way relationships between contacts)
Posted: 2010-06-30 05:20:11

Question: I have a contact_relationship table that stores the reported strength of the relationship between one contact and another at a given point in time.
mysql> desc contact_relationship;
+------------------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-----------+------+-----+-------------------+-----------------------------+
| relationship_id | int(11) | YES | | NULL | |
| contact_id | int(11) | YES | MUL | NULL | |
| other_contact_id | int(11) | YES | | NULL | |
| strength | int(11) | YES | | NULL | |
| recorded | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+-----------+------+-----+-------------------+-----------------------------+
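Now I want to get a list of the two-way relationships between contacts. A two-way relationship means there are two rows: one in which contact a specifies the strength of its relationship with contact b, and another in which contact b specifies the strength of its relationship with contact a. The strength of the two-way relationship is the smaller of those two strength values.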
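To pin down that definition before looking at any SQL, here is a minimal pure-Python sketch. It assumes the latest reported strength for each directed pair has already been selected and stored in a dict keyed by `(contact_id, other_contact_id)`; the function name `two_way_strengths` and the sample data are hypothetical. Like the query in the question, it skips self-pairs and keeps each pair only once, with the smaller id first.

```python
# Hypothetical sample data: (contact_id, other_contact_id) -> latest reported strength.
reported = {
    (1, 2): 5,
    (2, 1): 3,  # contact 2 rates contact 1 lower than 1 rates 2
    (1, 3): 4,  # contact 3 never reported a strength for contact 1
    (4, 4): 9,  # self-relationship, ignored
}


def two_way_strengths(reported):
    """Return {(a, b): strength} for each pair with a < b where both directions exist.

    The two-way strength is the smaller of the two reported values.
    """
    result = {}
    for (a, b), s_ab in reported.items():
        if a >= b:
            continue  # skip self-pairs and keep each unordered pair once
        s_ba = reported.get((b, a))
        if s_ba is not None:  # only pairs reported in both directions count
            result[(a, b)] = min(s_ab, s_ba)
    return result


print(two_way_strengths(reported))  # {(1, 2): 3}
```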
Here is the query I came up with, but it is slow:
select
    mrcr1.contact_id,
    mrcr1.other_contact_id,
    case when (mrcr1.strength < mrcr2.strength) then
        mrcr1.strength
    else
        mrcr2.strength
    end strength
from (
    select
        cr1.*
    from (
        select
            contact_id,
            other_contact_id,
            max(recorded) as max_recorded
        from
            contact_relationship
        group by
            contact_id,
            other_contact_id
    ) as cr2
    inner join contact_relationship cr1 on
        cr1.contact_id = cr2.contact_id
        and cr1.other_contact_id = cr2.other_contact_id
        and cr1.recorded = cr2.max_recorded
) as mrcr1,
(
    select
        cr3.*
    from (
        select
            contact_id,
            other_contact_id,
            max(recorded) as max_recorded
        from
            contact_relationship
        group by
            contact_id,
            other_contact_id
    ) as cr4
    inner join contact_relationship cr3 on
        cr3.contact_id = cr4.contact_id
        and cr3.other_contact_id = cr4.other_contact_id
        and cr3.recorded = cr4.max_recorded
) as mrcr2
where
    mrcr1.contact_id = mrcr2.other_contact_id
    and mrcr1.other_contact_id = mrcr2.contact_id
    and mrcr1.contact_id != mrcr1.other_contact_id
    and mrcr2.contact_id != mrcr2.other_contact_id
    and mrcr1.contact_id <= mrcr1.other_contact_id;
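Does anyone have any suggestions for speeding this up?
Note that because a user may specify the strength of his relationship with a particular user more than once, you must take only the most recent record for each pair of contacts.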
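The "most recent record per pair" step is the classic greatest-n-per-group pattern that the query above uses twice: aggregate `max(recorded)` per pair, then join back to the table to fetch the full row. The sketch below runs that step on its own, assuming SQLite (through Python's `sqlite3` module) as a stand-in for MySQL and using made-up rows; timestamps are stored as ISO-formatted text so that text comparison matches chronological order.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact_relationship ("
    " relationship_id INTEGER, contact_id INTEGER, other_contact_id INTEGER,"
    " strength INTEGER, recorded TEXT)"
)
conn.executemany(
    "INSERT INTO contact_relationship VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 2, 2, "2010-06-01 10:00:00"),
        (2, 1, 2, 5, "2010-06-20 10:00:00"),  # newer report for (1, 2) supersedes the first
        (3, 2, 1, 3, "2010-06-05 10:00:00"),
    ],
)

# Same pattern as the question: find max(recorded) per directed pair,
# then join back to fetch the strength stored in that row.
LATEST_SQL = """
    SELECT cr1.contact_id, cr1.other_contact_id, cr1.strength
    FROM (
        SELECT contact_id, other_contact_id, MAX(recorded) AS max_recorded
        FROM contact_relationship
        GROUP BY contact_id, other_contact_id
    ) AS cr2
    INNER JOIN contact_relationship cr1
        ON  cr1.contact_id = cr2.contact_id
        AND cr1.other_contact_id = cr2.other_contact_id
        AND cr1.recorded = cr2.max_recorded
    ORDER BY cr1.contact_id, cr1.other_contact_id
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)  # [(1, 2, 5), (2, 1, 3)]
```

One caveat of this pattern: if two reports for the same pair share the same `recorded` value, the join returns both of them.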
Update: here is the output of EXPLAIN for the query...
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 36029 | Using where |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 36029 | Using where; Using join buffer |
| 4 | DERIVED | <derived5> | ALL | NULL | NULL | NULL | NULL | 36021 | |
| 4 | DERIVED | cr3 | ref | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10 | cr4.contact_id,cr4.other_contact_id | 1 | Using where |
| 5 | DERIVED | contact_relationship | index | NULL | contact_relationship_index_3 | 14 | NULL | 37973 | Using index |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 36021 | |
| 2 | DERIVED | cr1 | ref | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10 | cr2.contact_id,cr2.other_contact_id | 1 | Using where |
| 3 | DERIVED | contact_relationship | index | NULL | contact_relationship_index_3 | 14 | NULL | 37973 | Using index |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
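Reading the plan: the two derived tables `<derived2>` and `<derived4>` (the materialized `mrcr1` and `mrcr2`) are each read with type `ALL` and no possible keys, and the second is joined through the join buffer, whereas the inner lookups on `cr1`/`cr3` use `contact_relationship_index_2` (type `ref`) and are cheap. Multiplying the `rows` estimates of the two derived-table lines gives a rough figure for the row combinations the optimizer expects to examine in the final join; this is an estimate read off the plan, not a measured count:

```latex
\text{rows}(\langle\text{derived2}\rangle) \times \text{rows}(\langle\text{derived4}\rangle)
  = 36029 \times 36029 = 1{,}298{,}088{,}841 \approx 1.3 \times 10^{9}
```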
Comments:
Can you post the execution plan?
Can you post which DBMS you are using?
Answer 1: You are wasting a lot of time selecting the most recent records. Two options:
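1- Change the way you store the data: keep one table that holds only the most recent records, and another that works more like a history.
2- If your DBMS allows it, use an analytic query to select the most recent records. Something like: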
select distinct contact_id, other_contact_id,
    first_value(strength) over (partition by contact_id, other_contact_id
                                order by recorded desc) as strength
from contact_relationship
Once you have the right rows, I think your query will be a lot faster.
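A sketch of option 2, assuming SQLite 3.25 or later (which supports window functions) in place of the asker's DBMS; MySQL only gained window functions in version 8.0, so this route was not open to the asker in 2010. It swaps the answer's `first_value` for `ROW_NUMBER()`, which keeps exactly one row per directed pair, and then applies the question's self-join to those rows. The table contents are made up.

```python
import sqlite3

# Assumptions: SQLite >= 3.25 (window functions) stands in for MySQL 8.0+;
# the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact_relationship ("
    " relationship_id INTEGER, contact_id INTEGER, other_contact_id INTEGER,"
    " strength INTEGER, recorded TEXT)"
)
conn.executemany(
    "INSERT INTO contact_relationship VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 2, 2, "2010-06-01 10:00:00"),
        (2, 1, 2, 5, "2010-06-20 10:00:00"),  # latest report for (1, 2)
        (3, 2, 1, 3, "2010-06-05 10:00:00"),  # only report for (2, 1)
        (4, 1, 3, 4, "2010-06-07 10:00:00"),  # (3, 1) never reported
    ],
)

MUTUAL_SQL = """
    WITH ranked AS (
        SELECT contact_id, other_contact_id, strength,
               ROW_NUMBER() OVER (
                   PARTITION BY contact_id, other_contact_id
                   ORDER BY recorded DESC
               ) AS rn
        FROM contact_relationship
    ),
    latest AS (
        -- exactly one row per directed pair: the most recently recorded one
        SELECT contact_id, other_contact_id, strength FROM ranked WHERE rn = 1
    )
    SELECT a.contact_id, a.other_contact_id,
           MIN(a.strength, b.strength) AS strength
    FROM latest a
    JOIN latest b
      ON  a.contact_id = b.other_contact_id
      AND a.other_contact_id = b.contact_id
    WHERE a.contact_id < a.other_contact_id
    ORDER BY a.contact_id, a.other_contact_id
"""
mutual = conn.execute(MUTUAL_SQL).fetchall()
print(mutual)  # [(1, 2, 3)]
```

`MIN(a.strength, b.strength)` is SQLite's multi-argument scalar `min()`; the MySQL counterpart is `LEAST()`. Filtering on `rn = 1` avoids the duplicate rows that `first_value` alone produces (one per source row), which is why the answer's query needs `distinct`.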
Comments:
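You got me thinking that I could use temporary tables to pull out all of the most recent relationship rows (see below). Thanks!
Answer 2: Scorpi0's answer got me thinking that maybe I could use temporary tables...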
create temporary table mrcr1 (
    contact_id int,
    other_contact_id int,
    strength int,
    index mrcr1_index_1 (
        contact_id,
        other_contact_id
    )
) replace as
select
    cr1.contact_id,
    cr1.other_contact_id,
    cr1.strength
from (
    select
        contact_id,
        other_contact_id,
        max(recorded) as max_recorded
    from
        contact_relationship
    group by
        contact_id, other_contact_id
) as cr2
inner join
    contact_relationship cr1 on
        cr1.contact_id = cr2.contact_id
        and cr1.other_contact_id = cr2.other_contact_id
        and cr1.recorded = cr2.max_recorded;
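I had to do this twice (the second time into a temporary table named mrcr2), because MySQL has a restriction that you cannot refer to the same temporary table more than once, under different aliases, in a single query.
With my two temporary tables created, my query then becomes: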
select
    mrcr1.contact_id,
    mrcr1.other_contact_id,
    case when (mrcr1.strength < mrcr2.strength) then
        mrcr1.strength
    else
        mrcr2.strength
    end strength
from
    mrcr1,
    mrcr2
where
    mrcr1.contact_id = mrcr2.other_contact_id
    and mrcr1.other_contact_id = mrcr2.contact_id
    and mrcr1.contact_id != mrcr1.other_contact_id
    and mrcr2.contact_id != mrcr2.other_contact_id
    and mrcr1.contact_id <= mrcr1.other_contact_id;
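A runnable sketch of the temporary-table approach, assuming SQLite in place of MySQL and made-up rows. SQLite does allow a query to refer to the same temporary table twice, so a single temp table (given the hypothetical name `latest`) is materialized, indexed, and self-joined; in MySQL the two copies `mrcr1` and `mrcr2` are needed because of the temporary-table restriction described in the answer.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact_relationship ("
    " relationship_id INTEGER, contact_id INTEGER, other_contact_id INTEGER,"
    " strength INTEGER, recorded TEXT)"
)
conn.executemany(
    "INSERT INTO contact_relationship VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 2, 2, "2010-06-01 10:00:00"),
        (2, 1, 2, 5, "2010-06-20 10:00:00"),
        (3, 2, 1, 3, "2010-06-05 10:00:00"),
        (4, 1, 3, 4, "2010-06-07 10:00:00"),
    ],
)

# Step 1: materialize the latest row per directed pair once, then index it
# so the self-join below can look pairs up instead of scanning.
conn.executescript("""
    CREATE TEMPORARY TABLE latest AS
    SELECT cr1.contact_id, cr1.other_contact_id, cr1.strength
    FROM (
        SELECT contact_id, other_contact_id, MAX(recorded) AS max_recorded
        FROM contact_relationship
        GROUP BY contact_id, other_contact_id
    ) AS cr2
    INNER JOIN contact_relationship cr1
        ON  cr1.contact_id = cr2.contact_id
        AND cr1.other_contact_id = cr2.other_contact_id
        AND cr1.recorded = cr2.max_recorded;
    CREATE INDEX latest_pair ON latest (contact_id, other_contact_id);
""")

# Step 2: the question's final query, run against the small indexed table.
result = conn.execute("""
    SELECT a.contact_id, a.other_contact_id,
           CASE WHEN a.strength < b.strength THEN a.strength ELSE b.strength END AS strength
    FROM latest a
    JOIN latest b
      ON  a.contact_id = b.other_contact_id
      AND a.other_contact_id = b.contact_id
    WHERE a.contact_id < a.other_contact_id
""").fetchall()
print(result)  # [(1, 2, 3)]
```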
Comments:
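Unfortunately, I don't have permission to create temporary tables in production :(
Temporary tables are not a good idea. Argue with your superiors that you need to change your structure: one table for production and current queries, and one log table into which a new row is inserted whenever the production table is updated.
Can you explain why temporary tables are a bad idea?
It is hard to log what happens inside a temporary table, so in a procedure, if something goes wrong (and it will), it is hard to find out what happened when the tables are always created and then dropped. Temporary tables can serve as a stopgap, but not as the intended solution.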
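Option 1 from the first answer, combined with the log-table suggestion in the comments, might look like the following sketch. It assumes SQLite and invents the table names `contact_relationship_current` and `contact_relationship_log` and the helper `record`; in MySQL the same idea would typically use `INSERT ... ON DUPLICATE KEY UPDATE` and `CREATE TRIGGER ... FOR EACH ROW`. Because the primary key guarantees one row per directed pair, the two-way query no longer needs any "latest row" subqueries.

```python
import sqlite3

# Hypothetical schema: one current row per directed pair (enforced by the
# primary key) plus a log table filled by triggers on every insert/update.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact_relationship_current (
        contact_id       INTEGER NOT NULL,
        other_contact_id INTEGER NOT NULL,
        strength         INTEGER NOT NULL,
        recorded         TEXT    NOT NULL,
        PRIMARY KEY (contact_id, other_contact_id)
    );
    CREATE TABLE contact_relationship_log (
        contact_id       INTEGER NOT NULL,
        other_contact_id INTEGER NOT NULL,
        strength         INTEGER NOT NULL,
        recorded         TEXT    NOT NULL
    );
    CREATE TRIGGER log_insert AFTER INSERT ON contact_relationship_current
    BEGIN
        INSERT INTO contact_relationship_log
        VALUES (NEW.contact_id, NEW.other_contact_id, NEW.strength, NEW.recorded);
    END;
    CREATE TRIGGER log_update AFTER UPDATE ON contact_relationship_current
    BEGIN
        INSERT INTO contact_relationship_log
        VALUES (NEW.contact_id, NEW.other_contact_id, NEW.strength, NEW.recorded);
    END;
""")


def record(conn, contact_id, other_contact_id, strength, recorded):
    """Hypothetical helper: update the pair's current row, or insert it if new."""
    cur = conn.execute(
        "UPDATE contact_relationship_current SET strength = ?, recorded = ? "
        "WHERE contact_id = ? AND other_contact_id = ?",
        (strength, recorded, contact_id, other_contact_id),
    )
    if cur.rowcount == 0:  # no current row for this pair yet
        conn.execute(
            "INSERT INTO contact_relationship_current VALUES (?, ?, ?, ?)",
            (contact_id, other_contact_id, strength, recorded),
        )


record(conn, 1, 2, 2, "2010-06-01 10:00:00")
record(conn, 1, 2, 5, "2010-06-20 10:00:00")  # replaces the current (1, 2) row
record(conn, 2, 1, 3, "2010-06-05 10:00:00")

# With one row per pair, the two-way query is a plain self-join on the key.
mutual = conn.execute("""
    SELECT a.contact_id, a.other_contact_id, MIN(a.strength, b.strength)
    FROM contact_relationship_current a
    JOIN contact_relationship_current b
      ON  a.contact_id = b.other_contact_id
      AND a.other_contact_id = b.contact_id
    WHERE a.contact_id < a.other_contact_id
""").fetchall()
log_rows = conn.execute("SELECT COUNT(*) FROM contact_relationship_log").fetchone()[0]
print(mutual)    # [(1, 2, 3)]
print(log_rows)  # 3
```

The history is preserved in the log table (three reports here), while queries for current strengths only ever touch the small keyed table.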