Optimizing a Join on A.COLUMN LIKE B.COLUMN%

Posted by robinson1988

Here is a SQL statement that takes about 10 seconds to run:

SQL> select a0.id,
  2         a1.room_no,
  3         a1.user_name,
  4         a1.user_no,
  5         row_number() over(partition by a0.id order by a1.room_enter_time desc) as fn
  6    from vid_attachment a0
  7   inner join vid_room_log a1
  8      on a0.file_name like a1.room_md5 || '%'
  9   where a0.room_no is null
 10     and a1.room_md5 is not null;

no rows selected

Elapsed: 00:00:10.53

Execution Plan
----------------------------------------------------------
Plan hash value: 374412539

----------------------------------------------------------------------------------------------
| Id  | Operation           | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                |   728K|   146M|       |   116K  (1)| 00:23:16 |
|   1 |  WINDOW SORT        |                |   728K|   146M|   162M|   116K  (1)| 00:23:16 |
|   2 |   NESTED LOOPS      |                |   728K|   146M|       | 82835   (1)| 00:16:35 |
|*  3 |    TABLE ACCESS FULL| VID_ATTACHMENT |   592 | 74000 |       |   384   (1)| 00:00:05 |
|*  4 |    TABLE ACCESS FULL| VID_ROOM_LOG   |  1231 |   103K|       |   139   (0)| 00:00:02 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("A0"."ROOM_NO" IS NULL)
   4 - filter("A1"."ROOM_MD5" IS NOT NULL AND "A0"."FILE_NAME" LIKE
              "A1"."ROOM_MD5"||'%')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     305333  consistent gets
       1320  physical reads
          0  redo size
        524  bytes sent via SQL*Net to client
        405  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          0  rows processed

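The join condition between the two tables in this SQL is a0.file_name like a1.room_md5 || '%'.

Variable-length fuzzy-match join conditions built on LIKE, INSTR, or SUBSTR can only use a nested loops (NL) join, not a hash join, because a hash join needs an equality join predicate.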

In the execution plan, the filter at ID=3 leaves VID_ATTACHMENT with 30091 rows:

SQL> select count(*) from VID_ATTACHMENT where room_no is not null;

  COUNT(*)
----------
     30091

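VID_ROOM_LOG is the driven (inner) table of the NL join and is accessed with a full table scan, so it gets scanned 30091 times. That is why the SQL takes 10 seconds.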
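To make that cost concrete, here is a minimal Python sketch of what the nested loops plan amounts to. It is only an illustration under assumptions: small in-memory lists stand in for the two tables, the function name and sample values are hypothetical, and room_md5 values are assumed to contain no LIKE wildcard characters, so that `startswith` behaves like `LIKE room_md5 || '%'`. Every driving row triggers a full pass over the inner list, so the number of comparisons is the product of the two row counts.

```python
def nested_loop_prefix_join(file_names, room_md5s):
    """Emulate an NL join on file_name LIKE room_md5 || '%'.

    Returns the matching pairs and the number of comparisons performed.
    """
    matches = []
    comparisons = 0
    for file_name in file_names:        # driving table (VID_ATTACHMENT in the plan)
        for room_md5 in room_md5s:      # driven table, fully scanned for every driving row
            comparisons += 1
            if file_name.startswith(room_md5):
                matches.append((file_name, room_md5))
    return matches, comparisons


# Hypothetical sample data, not taken from the original tables.
files = ["abc123_1.mp4", "abd999_2.mp4", "zzz000_3.mp4"]
rooms = ["abc123", "abd9", "qqq777", "abc"]
pairs, work = nested_loop_prefix_join(files, rooms)
```

With 3 driving rows and 4 inner rows the loop performs 3 x 4 comparisons; the same multiplication applies to the row counts in the plan above.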

Now rewrite the SQL into an equivalent form:

SQL> select a0.id,
  2         a1.room_no,
  3         a1.user_name,
  4         a1.user_no,
  5         row_number() over(partition by a0.id order by a1.room_enter_time desc) as fn
  6    from (select a.*, b.min_len
  7            from vid_attachment a,
  8                 (select min(length(room_md5)) min_len from vid_room_log) b) a0
  9   inner join (select a.*, min(length(room_md5)) over() min_len
 10                 from vid_room_log a) a1
 11      on a0.file_name like a1.room_md5 || '%'
 12     and substr(a0.file_name, 1, a0.min_len) =
 13         substr(a1.room_md5, 1, a1.min_len)
 14   where a0.room_no is null
 15     and a1.room_md5 is not null;

no rows selected

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------
Plan hash value: 413666598

----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                |  7288 |  2142K|       |  1053   (1)| 00:00:13 |
|   1 |  WINDOW SORT              |                |  7288 |  2142K|  2344K|  1053   (1)| 00:00:13 |
|*  2 |   HASH JOIN               |                |  7288 |  2142K|       |   577   (1)| 00:00:07 |
|   3 |    NESTED LOOPS           |                |   592 | 81696 |       |   435   (1)| 00:00:06 |
|   4 |     VIEW                  |                |     1 |    13 |       |    51   (0)| 00:00:01 |
|   5 |      SORT AGGREGATE       |                |     1 |    39 |       |            |          |
|   6 |       INDEX FAST FULL SCAN| IDX_ROOMMD5    | 24623 |   937K|       |    51   (0)| 00:00:01 |
|*  7 |     TABLE ACCESS FULL     | VID_ATTACHMENT |   592 | 74000 |       |   384   (1)| 00:00:05 |
|*  8 |    VIEW                   |                | 24623 |  3919K|       |   141   (1)| 00:00:02 |
|   9 |     WINDOW BUFFER         |                | 24623 |  2067K|       |   141   (1)| 00:00:02 |
|  10 |      TABLE ACCESS FULL    | VID_ROOM_LOG   | 24623 |  2067K|       |   141   (1)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(SUBSTR("A"."FILE_NAME",1,INTERNAL_FUNCTION("B"."MIN_LEN"))=SUBSTR("A1"."ROOM_M
              D5",1,INTERNAL_FUNCTION("A1"."MIN_LEN")))
       filter("A"."FILE_NAME" LIKE "A1"."ROOM_MD5"||'%')
   7 - filter("A"."ROOM_NO" IS NULL)
   8 - filter("A1"."ROOM_MD5" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2017  consistent gets
          0  physical reads
          0  redo size
        524  bytes sent via SQL*Net to client
        405  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          0  rows processed

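On top of the original join condition a0.file_name like a1.room_md5 || '%', we add

substr(a0.file_name, 1, a0.min_len) = substr(a1.room_md5, 1, a1.min_len)

Because min_len is the length of the shortest room_md5, any file_name that starts with a room_md5 must also agree with it on the first min_len characters, so the extra predicate removes no rows. It gives the optimizer an equality condition, which lets the two tables be hash joined, and the SQL now finishes almost instantly.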
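The access-then-filter behaviour seen in the second plan (equality access on the SUBSTR key, LIKE kept as a filter) can be sketched in Python. This is an illustration under assumptions, not what Oracle does internally: in-memory lists stand in for the tables, `hash_prefix_join` and the sample values are hypothetical, and room_md5 values are assumed to contain no LIKE wildcard characters.

```python
from collections import defaultdict


def hash_prefix_join(file_names, room_md5s):
    """Join on file_name LIKE room_md5 || '%' using an equality key on a fixed-length prefix."""
    room_md5s = [r for r in room_md5s if r]   # mirrors room_md5 IS NOT NULL
    if not room_md5s:
        return []
    min_len = min(len(r) for r in room_md5s)  # mirrors min(length(room_md5))

    # Build side: bucket room_md5 values by their first min_len characters.
    buckets = defaultdict(list)
    for room_md5 in room_md5s:
        buckets[room_md5[:min_len]].append(room_md5)

    # Probe side: equality lookup on the prefix, then re-check the original LIKE.
    matches = []
    for file_name in file_names:
        for room_md5 in buckets.get(file_name[:min_len], []):
            if file_name.startswith(room_md5):
                matches.append((file_name, room_md5))
    return matches


# Hypothetical sample data, not taken from the original tables.
files = ["abc123_1.mp4", "abd999_2.mp4", "zzz000_3.mp4"]
rooms = ["abc123", "abd9", "qqq777", "abc"]
result = hash_prefix_join(files, rooms)
```

Each probe row is compared only against the rows in its own bucket instead of against the whole inner table, which is where the saving comes from. The final `startswith` check is still needed, since two values can share the first min_len characters without one being a prefix of the other.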

If the SQL is instead:

select a0.id,
       a1.room_no,
       a1.user_name,
       a1.user_no,
       row_number() over(partition by a0.id order by a1.room_enter_time desc) as fn
  from vid_attachment a0
 inner join vid_room_log a1
    on a0.file_name like  '%' || a1.room_md5 || '%'
 where a0.room_no is null
   and a1.room_md5 is not null;
   

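This case has no fix and cannot be optimized: with a leading '%', room_md5 can appear at any position inside file_name, so no fixed-position equality condition can be added.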
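A small Python check, using hypothetical sample values, illustrates why the prefix trick does not carry over: a pair can satisfy the contains-style LIKE while failing the SUBSTR equality, so adding that predicate would drop valid matches.

```python
# Hypothetical values, not taken from the original tables.
file_name = "video_abc123_1.mp4"
room_md5 = "abc123"
min_len = len(room_md5)

# Equivalent of: file_name LIKE '%' || room_md5 || '%'
contains_match = room_md5 in file_name

# Equivalent of: substr(file_name, 1, min_len) = substr(room_md5, 1, min_len)
prefix_key_equal = file_name[:min_len] == room_md5[:min_len]
```

Here `contains_match` is true but `prefix_key_equal` is false, so the equality key would wrongly exclude the row.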

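One last point: relational databases are fundamentally designed for equality joins, not fuzzy joins. Fuzzy joins should be ruled out at table design time.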

 
