MySQL query optimization: multiple queries or one large query

【Title】: MYSQL query optimization, multiple queries or one large query 【Posted】: 2012-09-17 00:25:38 【Question】:

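I have a query that contains a couple of subqueries (inner selects), and I'm trying to work out which is better for performance: one larger query or several smaller ones. I'm finding the difference hard to measure because the timings on my server keep changing.

I use the query below to return 10 results at a time for display on my website, using pagination (offset and limit).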

SELECT adverts.*, breed.breed, breed.type, sellers.profile_name, sellers.logo, users.user_level , 
round( sqrt( ( ( (adverts.latitude - '51.558430') * (adverts.latitude - '51.558430') ) * 69.1 * 69.1 ) + ( (adverts.longitude - '-0.0069345') * (adverts.longitude - '-0.0069345') * 53 * 53 ) ), 1 ) as distance, 
( SELECT advert_images.image_name FROM advert_images WHERE advert_images.advert_id = adverts.advert_id AND advert_images.main = 1 LIMIT 1) as imagename, 
( SELECT count(advert_images.advert_id) from advert_images WHERE advert_images.advert_id = adverts.advert_id ) AS num_photos 
FROM adverts 
LEFT JOIN breed ON adverts.breed_id = breed.breed_id 
LEFT JOIN sellers ON (adverts.user_id = sellers.user_id) 
LEFT JOIN users ON (adverts.user_id = users.user_id) 
WHERE (adverts.status = 1) AND (adverts.approved = 1) 
AND (adverts.latitude BETWEEN 51.2692837281 AND 51.8475762719) AND (adverts.longitude BETWEEN -0.472015213613 AND 0.458146213613) 
having (distance <= '20') 
ORDER BY distance ASC 
LIMIT 0,10

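Would it be better to remove the two inner selects below from the main query and instead call them from my PHP loop, 10 times each, once for every record in the loop?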

( SELECT advert_images.image_name FROM advert_images WHERE advert_images.advert_id = adverts.advert_id AND advert_images.main = 1 LIMIT 1) as imagename, 
( SELECT count(advert_images.advert_id) from advert_images WHERE advert_images.advert_id = adverts.advert_id ) AS num_photos 

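【Comments on the question】:

【Answer 1】:

Avoiding subqueries

As I understand it, your inner selects serve two purposes: finding the name of any associated image, and counting the number of associated images. You might be able to achieve both with a left join instead of the inner selects: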

SELECT …,
      advert_images.image_name AS imagename,
      COUNT(advert_images.advert_id) AS num_photos,
      …
FROM …
     LEFT JOIN advert_images ON advert_images.advert_id = adverts.advert_id
…
GROUP BY adverts.advert_id
…
LIMIT 0,10

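I haven't tried this, but perhaps the MySQL engine is smart enough to execute that part of the query only for the rows you actually return.

Note that there is no guarantee at all as to which image name this query returns for a given set of images. If you want reproducible results, you should use an aggregate function there, e.g. MIN(advert_images.image_name) to select the lexicographically first image. With ONLY_FULL_GROUP_BY enabled (the default in MySQL 5.7.5 and later), the server rejects the non-aggregated image_name column outright, so the aggregate becomes mandatory.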
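The left join with a deterministic aggregate can be tried out in isolation. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with a tiny in-memory schema so that it runs anywhere, and the helper names (`adverts_with_photos`, `demo_connection`) and the sample rows are assumptions, not part of the question. The SQL itself is plain enough that it should carry over to MySQL unchanged.

```python
import sqlite3


def adverts_with_photos(conn):
    """Return one row per advert: (advert_id, imagename, num_photos)."""
    sql = """
        SELECT a.advert_id,
               MIN(i.image_name)  AS imagename,   -- lexicographically first name
               COUNT(i.advert_id) AS num_photos   -- 0 when no image row matches
        FROM adverts AS a
        LEFT JOIN advert_images AS i ON i.advert_id = a.advert_id
        GROUP BY a.advert_id
        ORDER BY a.advert_id
    """
    return conn.execute(sql).fetchall()


def demo_connection():
    """Build a throwaway in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE adverts (advert_id INTEGER PRIMARY KEY);
        CREATE TABLE advert_images (advert_id INTEGER, image_name TEXT, main INTEGER);
        INSERT INTO adverts VALUES (1), (2), (3);
        INSERT INTO advert_images VALUES
            (1, 'b.jpg', 0), (1, 'a.jpg', 1), (3, 'z.jpg', 1);
    """)
    return conn


rows = adverts_with_photos(demo_connection())
for row in rows:
    print(row)
```

COUNT(i.advert_id) counts only non-NULL values, which is why an advert without images reports zero photos rather than one.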

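Separate select, but without a loop

If the above does not work, i.e. the query still examines the advert_images table for all rows when computing the result, then you may be better off running a second query. You can still avoid the for loop, though, and fetch all of these rows in a single query: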

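-- advert_id is selected as well so that each row can be matched back to its advert
SELECT advert_images.advert_id,
       advert_images.image_name AS imagename,
       COUNT(advert_images.advert_id) AS num_photos
FROM advert_images
WHERE advert_images.advert_id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
GROUP BY advert_images.advert_id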

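The ten parameters in this query correspond to the ten rows your current query produces. Note that adverts without any associated photos will not appear in this result at all, so make sure your code defaults num_photos to zero and imagename to NULL.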
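On the application side, the batched lookup and the defaults might look roughly like this. It is a hedged sketch in Python with sqlite3 rather than the asker's PHP and MySQL; `fetch_photo_info` and the sample data are hypothetical, and the query also selects advert_id so that each row can be matched to its advert. With a MySQL driver the placeholder style may differ (often `%s` instead of `?`).

```python
import sqlite3


def fetch_photo_info(conn, advert_ids):
    """Return {advert_id: (imagename, num_photos)} for every requested id."""
    # Start from the defaults so adverts without any photos are still present.
    info = {advert_id: (None, 0) for advert_id in advert_ids}
    if not advert_ids:
        return info  # an empty IN () list would be a syntax error
    placeholders = ", ".join("?" for _ in advert_ids)
    sql = f"""
        SELECT advert_id, MIN(image_name) AS imagename, COUNT(*) AS num_photos
        FROM advert_images
        WHERE advert_id IN ({placeholders})
        GROUP BY advert_id
    """
    for advert_id, imagename, num_photos in conn.execute(sql, list(advert_ids)):
        info[advert_id] = (imagename, num_photos)
    return info


# Hypothetical sample data: advert 2 has no photos at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE advert_images (advert_id INTEGER, image_name TEXT);
    INSERT INTO advert_images VALUES (1, 'b.jpg'), (1, 'a.jpg'), (3, 'z.jpg');
""")
info = fetch_photo_info(conn, [1, 2, 3])
print(info)
```

Only the placeholder list is interpolated into the SQL string; the ids themselves are still passed as bound parameters.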

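Temporary table

Another way to achieve what you are trying to do is to use an explicit temporary in-memory table: first select the results you are interested in, then retrieve all the associated information.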

CREATE TEMPORARY TABLE tmp
SELECT adverts.advert_id, round(…) as distance
FROM adverts
WHERE (adverts.status = 1) AND (adverts.approved = 1)
  AND (adverts.latitude BETWEEN 51.2692837281 AND 51.8475762719)
  AND (adverts.longitude BETWEEN -0.472015213613 AND 0.458146213613)
HAVING (distance <= 20)
ORDER BY distance ASC
LIMIT 0,10;

SELECT tmp.distance, adverts.*, …
       advert_images.image_name AS imagename,
       COUNT(advert_images.advert_id) AS num_photos,
       …
FROM tmp
     INNER JOIN adverts ON tmp.advert_id = adverts.advert_id
     LEFT JOIN breed ON adverts.breed_id = breed.breed_id
     LEFT JOIN sellers ON adverts.user_id = sellers.user_id
     LEFT JOIN users ON adverts.user_id = users.user_id
     LEFT JOIN advert_images ON advert_images.advert_id = adverts.advert_id
GROUP BY adverts.advert_id
ORDER BY tmp.distance ASC;

DROP TABLE tmp;

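This ensures that all the other tables are queried only for the results you are currently processing. After all, there is nothing magical about the advert_images table, except that you may need more than one row from it.

Subquery as join factor

Building on the approach from the previous section, you can even avoid managing the temporary table yourself and use a subquery instead: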

SELECT sub.distance, adverts.*, …
       advert_images.image_name AS imagename,
       COUNT(advert_images.advert_id) AS num_photos,
       …
FROM ( SELECT adverts.advert_id, round(…) as distance
        FROM adverts
        WHERE (adverts.status = 1) AND (adverts.approved = 1)
          AND (adverts.latitude BETWEEN 51.2692837281 AND 51.8475762719)
          AND (adverts.longitude BETWEEN -0.472015213613 AND 0.458146213613)
        HAVING (distance <= 20)
        ORDER BY distance ASC
        LIMIT 0,10
     ) AS sub
     INNER JOIN adverts ON sub.advert_id = adverts.advert_id
     LEFT JOIN breed ON adverts.breed_id = breed.breed_id 
     LEFT JOIN sellers ON (adverts.user_id = sellers.user_id) 
     LEFT JOIN users ON (adverts.user_id = users.user_id) 
     LEFT JOIN advert_images ON advert_images.advert_id = adverts.advert_id
GROUP BY adverts.advert_id
ORDER BY sub.distance ASC

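Again, you determine the relevant rows using only data from the adverts table, and join only the required rows from the other tables. Most likely that intermediate result will be stored internally in a temporary table, but that is up to the SQL server to decide.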
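The shape of this query (limit first inside a derived table, join afterwards) can be checked on a small scale. The following is a simplified sketch under stated assumptions, not the answer's exact query: it runs on SQLite through Python's sqlite3 module, ranks by a plain squared coordinate difference instead of the rounded mile distance (to avoid depending on a sqrt function), omits the bounding box and the 20-mile cut-off, and uses a hypothetical helper `nearest_with_photos` with made-up sample rows.

```python
import sqlite3


def nearest_with_photos(conn, lat, lon, limit=10):
    """Pick the nearest adverts first, then join image data for those only."""
    sql = """
        SELECT a.advert_id, sub.dist2,
               MIN(i.image_name)  AS imagename,
               COUNT(i.advert_id) AS num_photos
        FROM (
            -- Derived table: only the adverts table is touched here.
            SELECT advert_id,
                   (latitude - ?) * (latitude - ?) +
                   (longitude - ?) * (longitude - ?) AS dist2
            FROM adverts
            WHERE status = 1 AND approved = 1
            ORDER BY dist2 ASC
            LIMIT ?
        ) AS sub
        INNER JOIN adverts AS a ON a.advert_id = sub.advert_id
        LEFT JOIN advert_images AS i ON i.advert_id = a.advert_id
        GROUP BY a.advert_id, sub.dist2
        ORDER BY sub.dist2 ASC
    """
    return conn.execute(sql, (lat, lat, lon, lon, limit)).fetchall()


# Hypothetical sample data: advert 4 is not live, advert 3 is far away.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adverts (advert_id INTEGER PRIMARY KEY, latitude REAL,
                          longitude REAL, status INTEGER, approved INTEGER);
    CREATE TABLE advert_images (advert_id INTEGER, image_name TEXT);
    INSERT INTO adverts VALUES
        (1, 51.5, 0.0, 1, 1), (2, 51.6, 0.1, 1, 1),
        (3, 52.0, 0.5, 1, 1), (4, 51.5, 0.0, 0, 1);
    INSERT INTO advert_images VALUES (1, 'b.jpg'), (1, 'a.jpg'), (3, 'z.jpg');
""")
rows = nearest_with_photos(conn, 51.5, 0.0, limit=2)
summary = [(advert_id, name, count) for advert_id, _dist2, name, count in rows]
print(summary)
```

Because the LIMIT sits inside the derived table, the join to advert_images sees at most `limit` adverts, which is the point of the approach.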

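【Comments】:

Hi, thanks for the detailed answer. I tried the first approach you mentioned, with the GROUP BY and without the inner selects, but the query was a lot slower than the original. The temporary table does sound like a good idea, but will it work properly when the query runs about 10 times per second on the server? The site is very busy.

@user1052096, as long as the two queries of the temporary-table approach are issued close together, it shouldn't make much of a difference. Temporary tables are local to the connection, so there won't be any name clashes. The memory consumption of the tmp table should be small compared with the many columns of the second query's result, so the combined solution may well use less memory than the original query. But I just had another idea, which I'll edit into my answer shortly.

Hi MvG, thanks for the update with the subquery. It does work, but I can't tell whether it is any faster. If I run only the subquery, it takes 0.02 seconds, but if I run the subquery without selecting advert_id, it runs twice as fast, in 0.01 seconds.

【Answer 2】: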

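I think MySQL uses a filesort plus a temporary table to execute your query. That is why, on large tables, your suggestion will give better results. In general, you are better off executing several smaller queries than one large one.

【Comments】:

Hi, since I'm ordering by distance, which is a calculated field, it does use a filesort, and that slows things down when the table is large. So will the two inner selects be run for every record in the main query's table? If so, I would expect it to be faster to run the two inner selects only on the result set.

Yes, the inner selects will be executed for every row.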
