Optimize MySQL nested select with arithmetic operation

【Title】: Optimize MySQL nested select with arithmetic operation 【Posted】: 2012-12-17 18:44:36 【Question】:

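I run this SQL query against a denormalized table on MySQL 5.1. It works the way I want, but it can be quite slow. I added an index on the day column, but it still needs to be faster. Any suggestions on how to speed it up? (Maybe with a join instead?)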

SELECT DISTINCT(bucket) AS b,
       (possible_free_slots -
          (SELECT COUNT(availability)
           FROM ip_bucket_list
           WHERE bucket = b
           AND availability = 'used'
           AND tday = 'evening'
           AND day LIKE '2012-12-14%'
           AND network = '10_83_mh1_bucket')) AS free_slots
FROM ip_bucket_list
ORDER BY free_slots DESC;

The individual queries are fast:

SELECT DISTINCT(bucket) FROM ip_bucket_list;
1024 rows in set (0.05 sec)

 SELECT COUNT(availability) from ip_bucket_list WHERE bucket = 0 AND availability = 'used' AND tday = 'evening' AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket';
1 row in set (0.00 sec)

The table:

mysql> describe ip_bucket_list;
+---------------------+--------------+------+-----+-------------------+----------------+
| Field               | Type         | Null | Key | Default           | Extra          |
+---------------------+--------------+------+-----+-------------------+----------------+
| id                  | int(11)      | NO   | PRI | NULL              | auto_increment |
| ip                  | varchar(50)  | YES  |     | NULL              |                |
| bucket              | int(11)      | NO   | MUL | NULL              |                |
| availability        | varchar(20)  | YES  |     | NULL              |                |
| network             | varchar(100) | NO   | MUL | NULL              |                |
| possible_free_slots | int(11)      | NO   |     | NULL              |                |
| tday                | varchar(20)  | YES  |     | NULL              |                |
| day                 | timestamp    | NO   | MUL | CURRENT_TIMESTAMP |                |
+---------------------+--------------+------+-----+-------------------+----------------+

And the DESC output:

DESC SELECT DISTINCT(bucket) as b,(possible_free_slots - (SELECT COUNT(availability) from  ip_bucket_list WHERE bucket = b AND availability = 'used' AND tday = 'evening' AND day  LIKE '2012-12-14%' AND network = '10_83_mh1_bucket')) as free_slots FROM ip_bucket_list  ORDER BY free_slots DESC;
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
| id | select_type        | table          | type | possible_keys                           | key    | key_len | ref  | rows   | Extra                           |
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+
|  1 | PRIMARY            | ip_bucket_list | ALL  | NULL                                    | NULL   | NULL    | NULL | 328354 | Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | ip_bucket_list | ref  | bucket,network,ip_bucket_list_day_index | bucket | 4       | func |    161 | Using where                     |
+----+--------------------+----------------+------+-----------------------------------------+--------+---------+------+--------+---------------------------------+

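【Comments】:

Could you post some sample rows and the expected output? That might help. 【Answer 1】:

I would move the correlated subquery out of the SELECT clause and into the FROM clause, using a join: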

SELECT DISTINCT ipbl.bucket AS b,
       -- COALESCE keeps free_slots non-NULL for buckets with no 'used' rows
       (ipbl.possible_free_slots - COALESCE(a.avail, 0)) AS free_slots
FROM ip_bucket_list ipbl LEFT OUTER JOIN
     -- one aggregated row per bucket, computed once
     (SELECT bucket, COUNT(availability) AS avail
      FROM ip_bucket_list
      WHERE availability = 'used' AND tday = 'evening' AND
            day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
      GROUP BY bucket
     ) a
     ON ipbl.bucket = a.bucket
ORDER BY free_slots DESC;

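The version in the SELECT clause is probably being re-run for every row (even before the distinct runs). By putting it in the from clause, the counts over ip_bucket_list are computed in a single pass rather than once per row.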
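To check that the two forms agree, here is a minimal sketch that runs both against a toy table. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only illustrates the result, not MySQL's timings or query plan. The sample rows, the `o` and `a` aliases, and the `FILTER` helper are assumptions made for the example.

```python
import sqlite3

# Toy data (hypothetical): bucket 0 has two used evening slots, bucket 1 has none.
rows = [
    # (bucket, availability, network, possible_free_slots, tday, day)
    (0, "used", "10_83_mh1_bucket", 5, "evening", "2012-12-14 18:00:00"),
    (0, "used", "10_83_mh1_bucket", 5, "evening", "2012-12-14 19:00:00"),
    (0, "free", "10_83_mh1_bucket", 5, "evening", "2012-12-14 20:00:00"),
    (1, "free", "10_83_mh1_bucket", 5, "evening", "2012-12-14 18:00:00"),
    (1, "used", "10_83_mh1_bucket", 5, "morning", "2012-12-14 08:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ip_bucket_list (id INTEGER PRIMARY KEY, bucket INT, "
    "availability TEXT, network TEXT, possible_free_slots INT, tday TEXT, day TEXT)"
)
conn.executemany(
    "INSERT INTO ip_bucket_list "
    "(bucket, availability, network, possible_free_slots, tday, day) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# Shared filter from the question, reused by both queries.
FILTER = (
    "availability = 'used' AND tday = 'evening' "
    "AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'"
)

# Correlated subquery in the SELECT clause: evaluated per outer row.
correlated = f"""
SELECT DISTINCT o.bucket AS b,
       o.possible_free_slots -
       (SELECT COUNT(availability) FROM ip_bucket_list i
        WHERE i.bucket = o.bucket AND {FILTER}) AS free_slots
FROM ip_bucket_list o
ORDER BY free_slots DESC
"""

# Derived table in the FROM clause: the counts are aggregated once per bucket.
joined = f"""
SELECT DISTINCT o.bucket AS b,
       o.possible_free_slots - COALESCE(a.avail, 0) AS free_slots
FROM ip_bucket_list o
LEFT JOIN (SELECT bucket, COUNT(availability) AS avail
           FROM ip_bucket_list WHERE {FILTER} GROUP BY bucket) a
       ON o.bucket = a.bucket
ORDER BY free_slots DESC
"""

result_correlated = conn.execute(correlated).fetchall()
result_joined = conn.execute(joined).fetchall()
print(result_correlated)  # [(1, 5), (0, 3)]
print(result_joined)      # [(1, 5), (0, 3)]
```

The COALESCE matters for bucket 1: it has no matching rows in the derived table, so without it the subtraction would yield NULL instead of 5.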

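Also, if you want each bucket to appear only once, I would suggest using group by rather than distinct. It makes the purpose of the query clearer. You can eliminate the second reference to the table altogether, for example: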

-- Assumes possible_free_slots is the same for every row of a bucket.
SELECT bucket AS b,
       MAX(possible_free_slots) -
       SUM(CASE WHEN availability = 'used' AND tday = 'evening' AND
                     day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
                THEN 1 ELSE 0
           END) AS free_slots
FROM ip_bucket_list
GROUP BY bucket
ORDER BY free_slots DESC;
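The single-pass idea can be tried out on a toy table. The sketch below uses Python's built-in sqlite3 module rather than MySQL, and the sample rows are made up for illustration. It subtracts the number of matching rows (a SUM over a CASE) from the per-bucket MAX(possible_free_slots), which assumes possible_free_slots is constant within a bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ip_bucket_list (bucket INT, availability TEXT, network TEXT, "
    "possible_free_slots INT, tday TEXT, day TEXT)"
)
# Hypothetical rows: bucket 0 has two used evening slots, bucket 1 has none.
conn.executemany(
    "INSERT INTO ip_bucket_list VALUES (?, ?, ?, ?, ?, ?)",
    [
        (0, "used", "10_83_mh1_bucket", 5, "evening", "2012-12-14 18:00:00"),
        (0, "used", "10_83_mh1_bucket", 5, "evening", "2012-12-14 19:00:00"),
        (0, "free", "10_83_mh1_bucket", 5, "evening", "2012-12-14 20:00:00"),
        (1, "free", "10_83_mh1_bucket", 5, "evening", "2012-12-14 18:00:00"),
    ],
)

# One scan of the table: each row contributes 1 to the sum only if it matches the filter.
single_pass = """
SELECT bucket AS b,
       MAX(possible_free_slots) -
       SUM(CASE WHEN availability = 'used' AND tday = 'evening'
                 AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'
                THEN 1 ELSE 0 END) AS free_slots
FROM ip_bucket_list
GROUP BY bucket
ORDER BY free_slots DESC
"""

free_slots = conn.execute(single_pass).fetchall()
print(free_slots)  # [(1, 5), (0, 3)]
```

Note that the CASE has to sit inside SUM, not inside the MAX: wrapping the whole difference in MAX would subtract at most 1 per bucket instead of the count of used slots.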

To speed up your version of the query, you need an index on bucket, since it is used in the correlated subquery.
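The DESC output in the question shows the subquery already using the bucket key, so one thing that might be worth trying is a wider composite index that also covers the other filtered columns. The sketch below is an assumption, not something from the thread: the index name `idx_bucket_filter` and its column order are hypothetical, and it runs on Python's built-in sqlite3 module, whose planner differs from MySQL's. On MySQL the same CREATE INDEX statement would need to be checked with EXPLAIN against the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ip_bucket_list (id INTEGER PRIMARY KEY, bucket INT, "
    "availability TEXT, network TEXT, possible_free_slots INT, tday TEXT, day TEXT)"
)

# Hypothetical index: equality-tested columns first, the day column last.
conn.execute(
    "CREATE INDEX idx_bucket_filter ON ip_bucket_list "
    "(bucket, network, tday, availability, day)"
)

# Ask SQLite how it would run the per-bucket count from the question.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT COUNT(availability) FROM ip_bucket_list "
    "WHERE bucket = 0 AND availability = 'used' AND tday = 'evening' "
    "AND day LIKE '2012-12-14%' AND network = '10_83_mh1_bucket'"
).fetchall()

# The last column of each plan row is the human-readable detail string.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

If the printed plan mentions idx_bucket_filter, the lookup is being served from the composite index rather than a full table scan.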

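【Comments】:

Thanks for the reply. The only thing is that I need to filter my second SELECT so that it includes only results for the current bucket. I ended up simply replacing DISTINCT with a GROUP BY clause, which sped up the query. 【Answer 2】:

Try moving the subquery into the main query, like this: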

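-- COUNT(DISTINCT l.id) counts each matching row once; the self-join pairs it
-- with every row of the same bucket, so a plain COUNT would be inflated.
SELECT b.bucket AS b,
       b.possible_free_slots - COUNT(DISTINCT l.id) AS free_slots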
FROM ip_bucket_list b
LEFT JOIN ip_bucket_list l
       ON l.bucket = b.bucket
      AND l.availability = 'used'
      AND l.tday = 'evening'
      AND l.day LIKE '2012-12-14%'
      AND l.network = '10_83_mh1_bucket'
GROUP BY b.bucket, b.possible_free_slots
ORDER BY 2 DESC
