Speed up a multi-table MySQL query that uses GROUP BY and ORDER BY

Posted: 2020-02-10 02:53:27

Question:

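I'm trying to speed up a query running on MySQL 5.7. The table currently has about 9 million rows (and growing). I have a strict database structure that doesn't allow joins (future-proofing for scaling). The query currently takes 250-450 ms on my local machine. I'm not sure how much it can actually be sped up, but I have to try.

From my assessment, the subquery seems to be the slow part, followed by the GROUP BY and ORDER BY.

Here is the table structure: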

CREATE TABLE `image_has_posts` (
  `image_id` int(9) unsigned NOT NULL,
  `post_id` bigint(20) unsigned NOT NULL,
  `sequence` int(11) unsigned NOT NULL,
  PRIMARY KEY (`image_id`,`post_id`),
  KEY `index_on_timestamp` (`sequence`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `post_has_privacy` (
  `post_id` bigint(20) unsigned NOT NULL,
  `is_private` tinyint(1) unsigned NOT NULL,
  `sequence` int(11) unsigned NOT NULL,
  PRIMARY KEY (`post_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Here is some sample data for the image_has_posts table:

image_id  post_id       sequence
7264      44568969088   1033459621
8564      112520730415  1033459642
9205      44568969087   1033459645

Here is the query. The WHERE ... IN clause will contain 200-1000 IDs:

SELECT sql_no_cache 
post_id, image_id
FROM image_has_posts
WHERE image_id IN (40334, 48848, 8993, 32740, 39664, 5701, 53308, 1001, 1230, 24732, 12341, 25777, 94, 56560, 31853, 17884, 16591, 38522, 29450, 31360, 12025, 17799, 17488, 4917, 23317, 9488, 65885, 65175, 16027, 32138, 32056, 56434, 2009, 30706, 36260, 28985, 793, 17146, 22163, 25433, 65046, 56517, 3008, 34893, 34867, 69689, 31359, 19366, 3338, 29484, 49566, 70479, 6415, 19287, 812, 12677, 27023, 17275, 22072, 13930, 25153, 43221, 35516, 14346, 6514, 28791, 21220, 39667, 24438, 3486, 76252, 26890, 30905, 3579, 69546, 4674, 19159, 21693, 36639, 1687, 65678, 9250, 34009, 36514, 13014, 41571, 6454, 35857, 9804, 66333, 16500, 38051, 16759, 39944, 27128, 49703, 33347, 22720, 69, 17515, 5385, 14000, 8418, 6317, 1397, 7340, 3159, 31581, 311, 30628, 9919, 24229, 39407, 33572, 69327, 28745, 32309, 2334, 3988, 13243, 53287, 16034, 29050, 12818, 76190, 2463, 18207, 703, 15259, 31352, 8006, 14117, 44806, 22717, 24661, 15841, 6754, 32885, 9628, 7107, 17055, 49515, 66406, 902, 16878, 14841, 10932, 23521, 8825, 17754, 2118, 36839, 53234, 39556, 9363, 32840, 37663, 9482, 24228, 66371, 49603, 29614, 12066, 37190, 12389, 28775, 7200, 7187, 29489, 28055, 75902, 36348, 25535, 15696, 75929, 10172, 7186, 5218, 66118, 65368, 15337, 2601, 23819, 10870, 24120, 4165, 23426, 4109, 17581, 5102, 23911, 7263, 15024, 31357, 7193, 25665, 8691, 3791, 35934, 11423, 10726, 7595, 9381, 17267, 15641, 16628, 56870, 25617, 66436, 12370, 27873, 9514, 59397, 26807, 64879, 11301, 22795, 29274, 29007, 13281, 27985, 34630, 30844, 40127, 66254, 56886, 27887, 23623, 30851, 23897, 11672, 11501, 41153, 31660, 37927, 6710, 44566, 9817, 39110, 65886, 186, 15385, 3505, 17610, 14921, 10637, 39014, 15465, 26810, 38099, 41191, 32935, 30285, 16141, 3023, 65250, 12749, 35098, 13147, 9023, 2122, 23429, 3021, 76154, 26742, 25972, 32634, 13064, 27888, 14852, 36641, 44748, 71024, 16317, 5595, 38677, 26287, 10207, 44775, 3622, 81212, 36746, 26102, 40813, 6899, 26804, 15877, 37341, 4651, 25380, 14577, 35871, 35949, 65008, 
31365, 41919, 49225, 28698, 26809, 31247, 38699, 1396, 41829, 19649, 27187, 22574, 1364, 41778, 41566, 13701, 28834, 15020, 38823, 40614, 25784, 42763, 42157, 27781, 23250, 6605, 29485, 12680, 64775, 65739, 40799, 52682, 59714, 31386, 21267, 32744, 49656, 66499, 26811, 20988, 774, 8700, 20814, 22, 30850, 70768, 33291, 41575, 41574, 9426, 17970, 2207, 4439, 8510, 21706, 45143, 9555, 1767, 7675, 22973, 66500, 30849, 13467, 9328, 45070, 11635, 69420, 44723, 13772, 56571, 7463, 13390, 25025, 21714, 35243, 35276, 59499, 2641, 13475, 316, 2108, 56952, 19032, 26660, 8824, 6391, 76073, 11639, 42127, 2799, 19693, 5196, 69396, 15916, 23509, 39905, 15732, 33013, 66074, 64867, 25349, 2110, 27165, 7945, 28077, 24737, 325, 26806, 3734, 30551, 26286, 18329, 34149, 33497, 18464, 59133, 17617, 49488, 32079, 42818, 20172, 44550, 17286, 35515, 4859, 37661, 24157, 17225, 38128, 16375, 3593, 35868, 41307, 38511, 59500, 27361, 6971, 65555, 2754, 42787, 24049, 69397, 7642, 1232, 23418, 24551, 56319, 11033, 49089, 13267, 22694, 41972, 8186, 19066, 1617, 39920, 26417, 3227, 37793, 11637, 24835, 9620, 19956, 8885, 5658, 11817, 31351, 2355, 37612, 16894, 39570, 15946, 11480, 32961, 3837)
    AND post_id IN (
        SELECT post_id 
        FROM post_has_privacy 
        WHERE is_private = 0
    )
GROUP BY image_id 
ORDER BY sequence DESC;                     

Here is the EXPLAIN output:


{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "9480.52"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "grouping_operation": {
        "using_filesort": false,
        "nested_loop": [
          {
            "table": {
              "table_name": "image_has_posts",
              "access_type": "range",
              "possible_keys": [
                "PRIMARY",
                "index_on_timestamp"
              ],
              "key": "PRIMARY",
              "used_key_parts": [
                "image_id"
              ],
              "key_length": "4",
              "rows_examined_per_scan": 5628,
              "rows_produced_per_join": 5628,
              "filtered": "100.00",
              "cost_info": {
                "read_cost": "1601.32",
                "eval_cost": "1125.60",
                "prefix_cost": "2726.92",
                "data_read_per_join": "131K"
              },
              "used_columns": [
                "image_id",
                "post_id",
                "sequence"
              ],
              "attached_condition": "(`db`.`image_has_posts`.`image_id` in (40334,48848,8993,32740,39664,5701,53308,1001,1230,24732,12341,25777,94,56560,31853,17884,16591,38522,29450,31360,12025,17799,17488,4917,23317,9488,65885,65175,16027,32138,32056,56434,2009,30706,36260,28985,793,17146,22163,25433,65046,56517,3008,34893,34867,69689,31359,19366,3338,29484,49566,70479,6415,19287,812,12677,27023,17275,22072,13930,25153,43221,35516,14346,6514,28791,21220,39667,24438,3486,76252,26890,30905,3579,69546,4674,19159,21693,36639,1687,65678,9250,34009,36514,13014,41571,6454,35857,9804,66333,16500,38051,16759,39944,27128,49703,33347,22720,69,17515,5385,14000,8418,6317,1397,7340,3159,31581,311,30628,9919,24229,39407,33572,69327,28745,32309,2334,3988,13243,53287,16034,29050,12818,76190,2463,18207,703,15259,31352,8006,14117,44806,22717,24661,15841,6754,32885,9628,7107,17055,49515,66406,902,16878,14841,10932,23521,8825,17754,2118,36839,53234,39556,9363,32840,37663,9482,24228,66371,49603,29614,12066,37190,12389,28775,7200,7187,29489,28055,75902,36348,25535,15696,75929,10172,7186,5218,66118,65368,15337,2601,23819,10870,24120,4165,23426,4109,17581,5102,23911,7263,15024,31357,7193,25665,8691,3791,35934,11423,10726,7595,9381,17267,15641,16628,56870,25617,66436,12370,27873,9514,59397,26807,64879,11301,22795,29274,29007,13281,27985,34630,30844,40127,66254,56886,27887,23623,30851,23897,11672,11501,41153,31660,37927,6710,44566,9817,39110,65886,186,15385,3505,17610,14921,10637,39014,15465,26810,38099,41191,32935,30285,16141,3023,65250,12749,35098,13147,9023,2122,23429,3021,76154,26742,25972,32634,13064,27888,14852,36641,44748,71024,16317,5595,38677,26287,10207,44775,3622,81212,36746,26102,40813,6899,26804,15877,37341,4651,25380,14577,35871,35949,65008,31365,41919,49225,28698,26809,31247,38699,1396,41829,19649,27187,22574,1364,41778,41566,13701,28834,15020,38823,40614,25784,42763,42157,27781,23250,6605,29485,12680,64775,65739,40799,52682,59714,31386,21267,32744,49656,66499,26811,20988,774,8700,20814,22,30850,70768,33291,41575,41574,9426,17970,2207,4439,8510,21706,45143,9555,1767,7675,22973,66500,30849,13467,9328,45070,11635,69420,44723,13772,56571,7463,13390,25025,21714,35243,35276,59499,2641,13475,316,2108,56952,19032,26660,8824,6391,76073,11639,42127,2799,19693,5196,69396,15916,23509,39905,15732,33013,66074,64867,25349,2110,27165,7945,28077,24737,325,26806,3734,30551,26286,18329,34149,33497,18464,59133,17617,49488,32079,42818,20172,44550,17286,35515,4859,37661,24157,17225,38128,16375,3593,35868,41307,38511,59500,27361,6971,65555,2754,42787,24049,69397,7642,1232,23418,24551,56319,11033,49089,13267,22694,41972,8186,19066,1617,39920,26417,3227,37793,11637,24835,9620,19956,8885,5658,11817,31351,2355,37612,16894,39570,15946,11480,32961,3837))"
            }
          },
          {
            "table": {
              "table_name": "post_has_privacy",
              "access_type": "eq_ref",
              "possible_keys": [
                "PRIMARY"
              ],
              "key": "PRIMARY",
              "used_key_parts": [
                "post_id"
              ],
              "key_length": "8",
              "ref": [
                "db.image_has_posts.post_id"
              ],
              "rows_examined_per_scan": 1,
              "rows_produced_per_join": 562,
              "filtered": "10.00",
              "cost_info": {
                "read_cost": "5628.00",
                "eval_cost": "112.56",
                "prefix_cost": "9480.52",
                "data_read_per_join": "8K"
              },
              "used_columns": [
                "post_id",
                "is_private"
              ],
              "attached_condition": "(`db`.`post_has_privacy`.`is_private` = 0)"
            }
          }
        ]
      }
    }
  }
}

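Comments on the question:

Your outer GROUP BY query is invalid.

This query actually runs; which part would be invalid?

What is the purpose of that GROUP BY?

The GROUP BY is used to collapse multiple rows with the same image_id. image_has_posts is a one-to-many mapping table (one image can be assigned to multiple posts). The GROUP BY eliminates the multiple rows returned for the query. The idea is to return only one post_id, but also to return the oldest record first and discard multiple records with the same image_id.

"The results are sorted ... then grouped" -- GROUPing happens before ORDERing.

Answer 1: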

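"A strict database structure that doesn't allow joins (future-proofing for scaling)."

I disagree with that.

In many cases, IN ( SELECT ... ) performs worse than the equivalent JOIN.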
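For reference, the two forms being compared can be written side by side. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example is self-contained). The table names come from the question, the rows are made up, and the three-item IN list replaces the real list of 200-1000 IDs. It only shows that both forms return the same rows; it says nothing about how either performs on MySQL 5.7.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image_has_posts (
        image_id INTEGER NOT NULL,
        post_id  INTEGER NOT NULL,
        sequence INTEGER NOT NULL,
        PRIMARY KEY (image_id, post_id)
    );
    CREATE TABLE post_has_privacy (
        post_id    INTEGER NOT NULL PRIMARY KEY,
        is_private INTEGER NOT NULL,
        sequence   INTEGER NOT NULL
    );
    -- Made-up rows, not data from the question.
    INSERT INTO image_has_posts VALUES (1, 10, 100), (1, 11, 101), (2, 12, 102), (3, 13, 103);
    INSERT INTO post_has_privacy VALUES (10, 0, 100), (11, 1, 101), (12, 0, 102), (13, 1, 103);
""")

# Form used in the question: filter post_id through IN ( SELECT ... ).
in_subquery_sql = """
    SELECT ihp.post_id, ihp.image_id
    FROM image_has_posts AS ihp
    WHERE ihp.image_id IN (1, 2, 3)
      AND ihp.post_id IN (SELECT post_id FROM post_has_privacy WHERE is_private = 0)
    ORDER BY ihp.image_id, ihp.post_id
"""

# Equivalent JOIN form: post_id is the primary key of post_has_privacy,
# so the join cannot duplicate rows from image_has_posts.
join_sql = """
    SELECT ihp.post_id, ihp.image_id
    FROM image_has_posts AS ihp
    JOIN post_has_privacy AS php ON php.post_id = ihp.post_id
    WHERE ihp.image_id IN (1, 2, 3)
      AND php.is_private = 0
    ORDER BY ihp.image_id, ihp.post_id
"""

rows_in = conn.execute(in_subquery_sql).fetchall()
rows_join = conn.execute(join_sql).fetchall()
print(rows_in == rows_join)
```

The GROUP BY is left out here on purpose so that the comparison covers only the subquery-versus-JOIN difference.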

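I just demonstrated, in a different case, that EXISTS ( SELECT ... ) was slower than the equivalent JOIN.

"Derived tables" have their own problems.

Denormalization is one way to avoid JOINs, but it can also lead to unexpected performance problems.

Using one SELECT to fetch the ids and then a second SELECT to fetch the useful information is almost guaranteed to take longer than a single SELECT with a JOIN, up to twice as long. This is because of the overhead of each SQL statement.

I'm not saying that any of the anti-JOIN aspects of the current query are a problem. Please provide EXPLAIN FORMAT=JSON SELECT ... for the current query. From the EXPLAIN we may be able to tell how much sorting is needed and whether the subquery is as bad as I fear.

What makes this query slow: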

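GROUP BY one thing, then ORDER BY another. This forces at least one sort, possibly two.

What will the client do with more than 1000 rows of output?

What happens to the rows; does the client turn around and fetch the links to the images? Or paginate?

A one-to-many mapping should not use an extra table; it should be implemented as a pointer in one table to the other.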
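The GROUP BY / ORDER BY point above, together with the comment that the GROUP BY is invalid, can be illustrated with a rewrite that states explicitly which row survives for each image. This is a minimal sketch, again with sqlite3 standing in for MySQL and with made-up rows. Keeping the row with the highest sequence per image_id is an assumption about the intent, not something the question confirms, and the sketch also assumes sequence values are unique within an image. It does not remove the sort; it only makes the result deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image_has_posts (
        image_id INTEGER NOT NULL,
        post_id  INTEGER NOT NULL,
        sequence INTEGER NOT NULL,
        PRIMARY KEY (image_id, post_id)
    );
    CREATE TABLE post_has_privacy (
        post_id    INTEGER NOT NULL PRIMARY KEY,
        is_private INTEGER NOT NULL,
        sequence   INTEGER NOT NULL
    );
    -- Made-up rows, not data from the question.
    INSERT INTO image_has_posts VALUES
        (1, 10, 100), (1, 11, 105), (1, 14, 103), (2, 12, 102), (3, 13, 104);
    INSERT INTO post_has_privacy VALUES
        (10, 0, 100), (11, 1, 105), (14, 0, 103), (12, 0, 102), (13, 1, 104);
""")

# Step 1 (inner query): group by image_id and aggregate sequence over public posts only.
# Step 2 (outer query): join back to recover the post_id of the chosen row,
# then order by that row's sequence.
deterministic_sql = """
    SELECT ihp.post_id, ihp.image_id
    FROM image_has_posts AS ihp
    JOIN (
        SELECT i.image_id, MAX(i.sequence) AS max_seq
        FROM image_has_posts AS i
        JOIN post_has_privacy AS p ON p.post_id = i.post_id
        WHERE i.image_id IN (1, 2, 3)
          AND p.is_private = 0
        GROUP BY i.image_id
    ) AS latest
      ON latest.image_id = ihp.image_id
     AND latest.max_seq = ihp.sequence
    JOIN post_has_privacy AS php
      ON php.post_id = ihp.post_id
     AND php.is_private = 0
    ORDER BY ihp.sequence DESC
"""

rows = conn.execute(deterministic_sql).fetchall()
print(rows)
```

Every selected column is now either grouped, aggregated, or taken from a row identified by the aggregate, so the result no longer depends on which row the server happens to pick within a group. If the oldest row per image is wanted instead, MAX would become MIN.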

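More...

The EXPLAIN looks (I think) identical to that of the equivalent JOIN.

The IN lookup seems to have turned into a table scan, possibly looking at only a (large) chunk of the table.

Please try the other form of EXPLAIN: EXPLAIN FORMAT=JSON SELECT ...

Please provide the query that supplies the IDs and the query that will use the resulting IDs. Let's try to build a single query that does all 3 steps and see how fast it runs.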

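Comments on the answer:

I have updated the question with the EXPLAIN output.

I added some "More".

I've updated the question with the JSON format of EXPLAIN. The query is actually already in the description, and the IDs come from somewhere else that can't use MySQL, which is why I didn't include that code. Everything needed to work on this optimization should now be in the question. The only thing missing is the data, and there is a lot of it, so it would need to be regenerated for testing.

OK, only one sort. I think the EXPLAIN says that the IN(SELECT..) was turned into a JOIN, which is probably optimal. I would start by not having a separate table for the 1:many relationship.

@stwhite - Aha! Pinterest is "sharded", which means some of the data lives on each physical server. That makes JOIN essentially impossible, except in limited cases. I suspect your 9M rows are on a single machine. If you expect to reach 9B rows, then sharding may be necessary. As you get close to that, we can discuss how to shard and still allow some JOINing, etc.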
