Performance update in Aggregate query

Posted 2023-04-19


【Title】Performance update in Aggregate query 【Posted】2013-05-16 17:38:38 【Question】:

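I want to improve the performance of the aggregate query below.

On T_Search_Detail, which holds 30 million records, the query below takes 12 seconds to execute. Can it be written better, and are there any suggestions for improving its performance?

Explain plan: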

Execution Plan
----------------------------------------------------------
Plan hash value: 651646209
--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |     3 |    42 | 27948   (1)| 00:05:36 |
|   1 |  SORT GROUP BY                 |                 |     3 |    42 | 27948   (1)| 00:05:36 |
|   2 |   VIEW                         |                 |    56 |   784 | 27947   (1)| 00:05:36 |
|   3 |    HASH GROUP BY               |                 |    56 |  1344 | 27947   (1)| 00:05:36 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| T_SEARCH_DETAIL |   898 | 21552 | 27946   (1)| 00:05:36 |
|*  5 |      INDEX RANGE SCAN          | INDEX_CREATE_DT |  1254K|       |  3451   (1)| 00:00:42 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("TSD"."MATCH_SOURCE" IS NOT NULL AND "TSD"."MATCH_TYPE" IS NOT NULL AND
              "TSD"."MATCH_TYPE" LIKE '%Exact%')
   5 - access("TSD"."CREATE_DT">=TO_DATE(' 2012-12-11 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "TSD"."CREATE_DT"<TO_DATE(' 2013-04-23 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Table DDL:

This query uses two tables, T_Search and T_Search_detail, with match_id as the foreign key between them.

SELECT   ms,
         SUM(ct)
FROM     ( SELECT  tsd.match_source    ms,
                  tsd.match_type       mt,
                  COUNT(tsd.search_id) ct
         FROM     t_search ts,
                  t_search_detail tsd
         WHERE    tsd.match_source IS NOT NULL
         AND      tsd.match_type   IS NOT NULL
         AND      ts.match_id                = tsd.match_id
         AND      tsd.match_type          LIKE '%Exact%'
         AND
                  (
                           tsd.create_dt >= to_date('12/11/2012', 'MM/DD/YYYY')
                  AND      tsd.create_dt  < (to_date('04/22/2013', 'MM/DD/YYYY')+1)
                  )
         GROUP BY tsd.match_source,
                  tsd.match_type
         )
GROUP BY ms
ORDER BY ms DESC

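【Discussion】:

You need to provide at least the explain plan and the table definitions; otherwise nobody can help you intelligently.

Can you provide some performance metrics? Time, memory, and so on. Please also include the table sizes and information about the indexes.

@Ben I have added these, please check.

@RyanGates The query above takes 2.5 seconds to execute for 194638 records; the other details are in the execution plan.

【Answer 1】:

If it is worth accessing the '%Exact%' rows through an index, you can do so with a function-based index: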

create index ... on ... (case Coalesce(InStr(match_type,'Exact'),0) when 0 then null else 1 end)

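Oracle does not store a B-tree index entry when the whole index key is NULL, so this index contains only the rows whose match_type includes the string "Exact". You would query it with: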

where ... and
      (case Coalesce(InStr(match_type,'Exact'),0) when 0 then null else 1 end) = 1

You can also combine the "Exact" search with the date index:

create index ... on ... (case Coalesce(InStr(match_type,'Exact'),0) when 0 then null else create_dt end)

...which indexes create_dt only for rows whose match_type includes "Exact".

You would then query:

case Coalesce(InStr(match_type,'Exact'),0) when 0 then null else create_dt end >= to_date('12/11/2012', 'MM/DD/YYYY') and
case Coalesce(InStr(match_type,'Exact'),0) when 0 then null else create_dt end  < (to_date('04/22/2013', 'MM/DD/YYYY')+1)
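Putting the pieces above together, a minimal sketch of the combined index and a query that uses it might look like the following. The index name tsd_exact_create_dt_fbi is hypothetical, and the query is written with a single level of grouping for brevity. The predicate must repeat the indexed expression for the optimizer to consider the index, and rows without "Exact" evaluate to NULL and fail both comparisons, so the LIKE filter is implied.

```sql
-- Hypothetical index name; the indexed expression is the one from the answer.
CREATE INDEX tsd_exact_create_dt_fbi
    ON t_search_detail (
        CASE COALESCE(INSTR(match_type, 'Exact'), 0)
            WHEN 0 THEN NULL
            ELSE create_dt
        END
    );

-- The date range is applied to the indexed expression, not to create_dt directly.
SELECT   tsd.match_source     ms,
         COUNT(tsd.search_id) ct
FROM     t_search ts,
         t_search_detail tsd
WHERE    ts.match_id = tsd.match_id
AND      tsd.match_source IS NOT NULL
AND      CASE COALESCE(INSTR(tsd.match_type, 'Exact'), 0)
             WHEN 0 THEN NULL ELSE tsd.create_dt
         END >= TO_DATE('12/11/2012', 'MM/DD/YYYY')
AND      CASE COALESCE(INSTR(tsd.match_type, 'Exact'), 0)
             WHEN 0 THEN NULL ELSE tsd.create_dt
         END <  TO_DATE('04/22/2013', 'MM/DD/YYYY') + 1
GROUP BY tsd.match_source
ORDER BY ms DESC;
```

Whether the optimizer actually chooses the index depends on the statistics, so it is worth gathering them after creating the index and checking the explain plan again.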

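【Discussion】:

【Answer 2】:

First, you do not need two levels of aggregation for this. You can aggregate by match_source and count the matching records.

Here is a simplified version of the query, using proper join syntax: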

SELECT  tsd.match_source ms, COUNT(tsd.search_id) ct
FROM t_search ts join
     t_search_detail tsd
     on ts.match_id = tsd.match_id
WHERE tsd.match_source IS NOT NULL AND
      tsd.match_type   IS NOT NULL AND
      tsd.match_type LIKE '%Exact%' and
      tsd.create_dt >= to_date('12/11/2012', 'MM/DD/YYYY') and
      tsd.create_dt  < (to_date('04/22/2013', 'MM/DD/YYYY')+1)
GROUP BY tsd.match_source;

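Next, the table t_search does not appear to be used at all. It may be there for filtering, or it may multiply the number of rows. However, assuming every row in t_search_detail matches exactly one row in t_search, you have: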

SELECT  tsd.match_source ms, COUNT(tsd.search_id) ct
FROM t_search_detail tsd
WHERE tsd.match_source IS NOT NULL AND
      tsd.match_type   IS NOT NULL AND
      tsd.match_type LIKE '%Exact%' and
      tsd.create_dt >= to_date('12/11/2012', 'MM/DD/YYYY') and
      tsd.create_dt  < (to_date('04/22/2013', 'MM/DD/YYYY')+1)
GROUP BY tsd.match_source;
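Dropping the join is only safe if that assumption holds. One way to check it, sketched here using only the table and column names from the question, is to look for detail rows without a parent and for match_id values that occur more than once in t_search:

```sql
-- Detail rows with no matching row in t_search.
-- The original join would silently drop these (including NULL match_id).
SELECT COUNT(*) AS orphan_rows
FROM   t_search_detail tsd
WHERE  NOT EXISTS (SELECT 1
                   FROM   t_search ts
                   WHERE  ts.match_id = tsd.match_id);

-- match_id values that appear more than once in t_search.
-- The original join would multiply the detail rows for these.
SELECT   ts.match_id,
         COUNT(*) AS copies
FROM     t_search ts
GROUP BY ts.match_id
HAVING   COUNT(*) > 1;
```

If the first query returns 0 and the second returns no rows, the two versions of the query count the same rows. If match_id is declared NOT NULL with an enforced foreign key to a unique key on t_search, both checks are redundant.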

With this version, you might improve performance with an index such as t_search_detail(match_source, match_type, create_dt):

-- Name shortened to fit Oracle's 30-character identifier limit (releases before 12.2)
CREATE INDEX tsd_msource_mtype_createdt
         ON t_search_detail(match_source, match_type, create_dt);

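It looks as though this query will have to scan every record that matches the date range. Can you expand the '%Exact%' pattern on match_type into a finite list of values? If so, change that line of the where clause to: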

where . . . and match_type in (<list of exact match types>) . . .

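Then you want an index on (match_type, create_dt). However, this improves performance significantly only if most match types are not "Exact". You may simply be in a position where a large number of records has to be processed, and that can take several seconds.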
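To find out whether such a finite list exists, a one-off query over the existing data can enumerate the distinct values that match the pattern. The sketch below also shows the (match_type, create_dt) index under a hypothetical name, tsd_mtype_createdt. The column order matters: with an equality or IN condition on the leading column, each listed match_type turns into a narrow range scan on create_dt, whereas an index that leads with create_dt would have to visit every entry in the date range.

```sql
-- One-off full scan: list the distinct match types that contain 'Exact'
-- and how many rows each one covers.
SELECT   match_type,
         COUNT(*) AS row_count
FROM     t_search_detail
WHERE    match_type LIKE '%Exact%'
GROUP BY match_type
ORDER BY row_count DESC;

-- Hypothetical index name; equality column first, range column second.
CREATE INDEX tsd_mtype_createdt
    ON t_search_detail (match_type, create_dt);
```

If the first query returns only a handful of values, they can be used as the IN list. If it returns many values, or new ones keep appearing, the function-based approach may be the better fit.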

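【Discussion】:

Thanks, Gordon Linoff. Indexes already exist on these columns; any other ideas?

@Narayan . . . The answer was very specific: a single index containing the three columns, in that order. Separate indexes on each column will not bring much benefit.

Why is that? What makes the order special? Please explain.

@Narayan . . . I think the MySQL documentation has a good explanation of how indexes are used (dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html). Despite the differences, all databases use indexes in broadly similar ways.

Thanks, I upvoted your answer. But in my QA environment, with 30 million records, it takes 12-13 seconds to execute. Any ideas?

【Answer 3】:

Apart from tuning the SQL itself, monitor v$sql_workarea_active while the query is executing (and v$sql_workarea afterwards) to see whether the query is spilling to the temporary tablespace and, if so, whether it is performing one-pass or multi-pass operations.

Multi-pass operations are performance killers. You need to make sure memory is sized to avoid them, and preferably to avoid one-pass operations as well. To do that, you may have to switch the session to manual memory management, or review the overall PGA and SGA sizing through their respective advisor views.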

http://docs.oracle.com/cd/E11882_01/server.112/e16638/memory.htm#i49320
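As a rough sketch of that monitoring, the queries below read the two views named above. They assume the session has SELECT privilege on the v$ views and that :sql_id is a bind variable holding the SQL_ID of the statement being investigated. The memory values in the ALTER SESSION statements are purely illustrative and are not a recommendation.

```sql
-- While the query is running (from another session): active work areas.
SELECT   sid,
         sql_id,
         operation_type,
         ROUND(actual_mem_used / 1024 / 1024) AS mem_used_mb,
         number_passes,                          -- 0 = optimal, 1 = one-pass, >1 = multi-pass
         ROUND(tempseg_size / 1024 / 1024)    AS temp_mb,  -- NULL if nothing spilled to temp
         tablespace
FROM     v$sql_workarea_active
ORDER BY sid;

-- After the query has finished: cumulative statistics per work area.
SELECT   sql_id,
         operation_type,
         last_execution,
         optimal_executions,
         onepass_executions,
         multipasses_executions
FROM     v$sql_workarea
WHERE    sql_id = :sql_id;

-- Manual work area sizing for the current session only (illustrative sizes, in bytes).
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET sort_area_size = 268435456;
ALTER SESSION SET hash_area_size = 268435456;
```

If the first two queries show only optimal executions, the work areas already fit in memory and the time is being spent elsewhere, most likely in the table access.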

【Discussion】:
