What can replace the IN clause in a query to make it faster?

【Title】: What can replace IN Clause in query to make it faster? 【Posted】: 2020-07-27 09:30:17 【Question】:

I want to fetch all records from Oracle that share the same MANUF_PART and MATERIAL. I first group by the MANUF_PART and MATERIAL columns, then use COUNT to check for duplicates.

SELECT MANUF_PART , COUNT(MANUF_PART), MATERIAL, COUNT(MATERIAL) 
FROM (
    SELECT * 
    FROM IR_MPN 
    WHERE SCALE_QTY = MOQ
)
GROUP BY MANUF_PART, MATERIAL 
HAVING  COUNT(MANUF_PART) > 1 
AND COUNT(MATERIAL) > 1

Once I have that result, what I want to do is:

Select * from table where MANUF_PART in (subquery) and MATERIAL in (subquery) 

This is my query, and it works.

SELECT *
FROM IR_MPN
WHERE MANUF_PART IN
(
    SELECT MANUF_PART
    FROM
    (
        SELECT MANUF_PART, COUNT(MANUF_PART), MATERIAL, COUNT(MATERIAL)
        FROM
        (
            SELECT *
            FROM IR_MPN
            WHERE SCALE_QTY = MOQ
        )
        GROUP BY MANUF_PART, MATERIAL
        HAVING  COUNT(MANUF_PART) > 1 AND COUNT(MATERIAL) > 1
    )
)
AND MATERIAL IN
(
    SELECT MATERIAL
    FROM
    (
        SELECT MANUF_PART, COUNT(MANUF_PART), MATERIAL, COUNT(MATERIAL)
        FROM
        (
            SELECT *
            FROM IR_MPN
            WHERE SCALE_QTY = MOQ
        )
        GROUP BY MANUF_PART, MATERIAL
        HAVING  COUNT(MANUF_PART) > 1 AND COUNT(MATERIAL) > 1
    )
)
AND SCALE_QTY != '0'
ORDER BY MANUF_PART

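My problem now is that when I run this query against 60,000 records it takes about 1 minute (slow), because it uses two IN clauses. I tried EXISTS (much faster), but it did not give me the result I wanted.

How can I make it faster?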

【Comments】:

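You can add indexes on the fields used in the WHERE clause, and also check whether the subqueries need indexes.

Please show the DDL and the EXPLAIN output; otherwise we are just guessing.

【Answer 1】: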

The simplest quick rewrite is this:

SELECT * FROM IR_MPN
WHERE (MANUF_PART,MATERIAL) IN
(
    SELECT MANUF_PART, MATERIAL
    FROM IR_MPN
    WHERE SCALE_QTY = MOQ
    GROUP BY MANUF_PART, MATERIAL
    HAVING COUNT(MANUF_PART) > 1 AND
           COUNT(MATERIAL) > 1
) AND 
SCALE_QTY != '0'
ORDER BY MANUF_PART;
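
One point worth checking before swapping the two IN clauses for a single (MANUF_PART, MATERIAL) IN: the two forms are not strictly equivalent. Two independent IN clauses also accept a row whose MANUF_PART comes from one duplicated pair and whose MATERIAL comes from another, while the pair form only accepts the duplicated pairs themselves. The sketch below illustrates this in Python with the standard-library sqlite3 module rather than Oracle. The rows and the id column are made up for illustration, SCALE_QTY is assumed to be numeric, and COUNT(*) stands in for the two COUNT(...) conditions (equivalent when neither column is NULL).

```python
import sqlite3

# Hypothetical sample data: column names follow the question, values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ir_mpn ("
    " id INTEGER PRIMARY KEY, manuf_part TEXT, material TEXT,"
    " scale_qty INTEGER, moq INTEGER)"
)
conn.executemany(
    "INSERT INTO ir_mpn (manuf_part, material, scale_qty, moq) VALUES (?, ?, ?, ?)",
    [
        ("P1", "M1", 10, 10),  # id 1: duplicated pair (P1, M1)
        ("P1", "M1", 10, 10),  # id 2
        ("P2", "M2", 5, 5),    # id 3: duplicated pair (P2, M2)
        ("P2", "M2", 5, 5),    # id 4
        ("P1", "M2", 7, 3),    # id 5: cross combination, not a duplicated pair
        ("P3", "M3", 1, 1),    # id 6: unique pair
    ],
)

# Pairs that occur more than once among rows where scale_qty = moq.
DUP_PAIRS = """
    SELECT manuf_part, material
    FROM ir_mpn
    WHERE scale_qty = moq
    GROUP BY manuf_part, material
    HAVING COUNT(*) > 1
"""

# Original shape: each column is checked independently.
two_in_ids = [r[0] for r in conn.execute(f"""
    SELECT id FROM ir_mpn
    WHERE manuf_part IN (SELECT manuf_part FROM ({DUP_PAIRS}))
      AND material   IN (SELECT material   FROM ({DUP_PAIRS}))
      AND scale_qty <> 0
    ORDER BY id
""")]

# Rewritten shape: the pair is checked as a unit.
pair_in_ids = [r[0] for r in conn.execute(f"""
    SELECT id FROM ir_mpn
    WHERE (manuf_part, material) IN ({DUP_PAIRS})
      AND scale_qty <> 0
    ORDER BY id
""")]

print(two_in_ids)   # [1, 2, 3, 4, 5]
print(pair_in_ids)  # [1, 2, 3, 4]
```

On this sample the pair form drops row 5. That is usually what "same MANUF_PART and MATERIAL" means, but it is worth confirming against real data.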

But the proper way is to use analytic functions.

SELECT *
FROM
(
    SELECT IR_MPN.*,
           COUNT(MANUF_PART) OVER (PARTITION BY MANUF_PART, MATERIAL) AS COUNT_MANUF_PART,
           COUNT(MATERIAL) OVER (PARTITION BY MANUF_PART, MATERIAL) AS COUNT_MATERIAL
    FROM IR_MPN
    WHERE SCALE_QTY = MOQ
)
WHERE COUNT_MANUF_PART>1 AND
      COUNT_MATERIAL>1 AND
      SCALE_QTY != '0';
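
A small sketch of the analytic approach, using Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of Oracle, with made-up rows and a hypothetical id column. It also illustrates a difference from the IN version: because SCALE_QTY = MOQ is applied inside the inline view, rows of a duplicated pair whose SCALE_QTY differs from MOQ are not returned. If those rows are wanted, one possible variant, which is an assumption and not part of the answer, is to count conditionally inside the window instead of filtering.

```python
import sqlite3

# Hypothetical sample data: column names follow the question, values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ir_mpn ("
    " id INTEGER PRIMARY KEY, manuf_part TEXT, material TEXT,"
    " scale_qty INTEGER, moq INTEGER)"
)
conn.executemany(
    "INSERT INTO ir_mpn (manuf_part, material, scale_qty, moq) VALUES (?, ?, ?, ?)",
    [
        ("P1", "M1", 10, 10),  # id 1
        ("P1", "M1", 10, 10),  # id 2
        ("P1", "M1", 20, 10),  # id 3: same pair, but scale_qty <> moq
        ("P2", "M2", 5, 5),    # id 4: unique pair
    ],
)

# Shape used in the answer: filter first, then count per (manuf_part, material).
filtered_ids = [r[0] for r in conn.execute("""
    SELECT id FROM (
        SELECT m.*,
               COUNT(*) OVER (PARTITION BY manuf_part, material) AS cnt
        FROM ir_mpn m
        WHERE scale_qty = moq
    )
    WHERE cnt > 1 AND scale_qty <> 0
    ORDER BY id
""")]

# Assumed variant: keep every row, count only the rows where scale_qty = moq.
conditional_ids = [r[0] for r in conn.execute("""
    SELECT id FROM (
        SELECT m.*,
               COUNT(CASE WHEN scale_qty = moq THEN 1 END)
                   OVER (PARTITION BY manuf_part, material) AS cnt
        FROM ir_mpn m
    )
    WHERE cnt > 1 AND scale_qty <> 0
    ORDER BY id
""")]

print(filtered_ids)     # [1, 2]
print(conditional_ids)  # [1, 2, 3]
```

Either way the table is read once, which is the main reason the analytic form tends to beat repeated subqueries.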

【Answer 2】:

Perhaps a WITH (subquery factoring) clause?

with dup as
  (select manuf_part,
          material,
          count(*)
   from ir_mpn
   where scale_qty = moq
   group by manuf_part,
            material
   having count(*) > 1
  )
select *
from ir_mpn t join dup d on d.manuf_part = t.manuf_part
                        and d.material   = t.material
where t.scale_qty <> '0'
order by t.manuf_part;

By the way, about the condition you wrote, where t.scale_qty <> '0': it looks like a quantity. Is a quantity really a string? Shouldn't it be <> 0?
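
The WITH approach can be tried out on a toy table. The following is a minimal sketch in Python with the standard-library sqlite3 module, not Oracle, and it rests on assumptions: MOQ is a column of IR_MPN as in the question, SCALE_QTY is numeric (so the comparison is against 0 rather than '0'), and the id column and all rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample data: column names follow the question, values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ir_mpn ("
    " id INTEGER PRIMARY KEY, manuf_part TEXT, material TEXT,"
    " scale_qty INTEGER, moq INTEGER)"
)
conn.executemany(
    "INSERT INTO ir_mpn (manuf_part, material, scale_qty, moq) VALUES (?, ?, ?, ?)",
    [
        ("P1", "M1", 10, 10),  # id 1: duplicated pair (P1, M1)
        ("P1", "M1", 10, 10),  # id 2
        ("P2", "M2", 5, 5),    # id 3: unique pair
        ("P4", "M4", 0, 0),    # id 4: duplicated pair, but scale_qty = 0
        ("P4", "M4", 0, 0),    # id 5
    ],
)

# The CTE collects duplicated pairs once; the join pulls the matching rows.
result = conn.execute("""
    WITH dup AS (
        SELECT manuf_part, material, COUNT(*) AS cnt
        FROM ir_mpn
        WHERE scale_qty = moq
        GROUP BY manuf_part, material
        HAVING COUNT(*) > 1
    )
    SELECT t.id, d.cnt
    FROM ir_mpn t
    JOIN dup d ON d.manuf_part = t.manuf_part
              AND d.material   = t.material
    WHERE t.scale_qty <> 0
    ORDER BY t.id
""").fetchall()

print(result)  # [(1, 2), (2, 2)]
```

Rows 4 and 5 form a duplicated pair but are removed by the scale_qty filter, and row 3 never enters dup because its pair occurs only once.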

【Answer 3】:

"I want to fetch all records from Oracle that share the same MANUF_PART and MATERIAL."

If I understand correctly, you need an analytic function:

select m.*
from (select m.*, count(*) over (partition by manuf_part, material) as cnt
      from IR_MPN m
     ) m
where cnt >= 2
order by manuf_part, material;
