oracle中in和exist的区别

Posted 2023-03-21

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了oracle中in和exist的区别相关的知识，希望对你有一定的参考价值。

in和exist的主要区别体现在对sql执行计划的影响上。

传统上认为，如果子查询的条件更具选择性（selective），就用in；而如果父查询（外层查询）的条件更具选择性（selective），就用exist。

具体的内容可以参考以下oracle原厂的手册，oracle的原厂手册都是英文版的。

另外需要特别注意的是，in和exist的区别只在10.2.0.3及以前的版本中存在；而10.2.0.4及以后的版本中，in和exist的效果是完全一样的，手册中也删除了有关二者区别的说明。

以下是对手册的引用：

In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.

Sometimes, Oracle can rewrite a subquery when used with an IN clause to take advantage of selectivity specified in the subquery. This is most beneficial when the most selective filter appears in the subquery and there are indexes on the join columns. Conversely, using EXISTS is beneficial when the most selective filter is in the parent query. This allows the selective predicates in the parent query to be applied before filtering the rows against the EXISTS criteria.

参考技术A in 和 exists区别

in 是把外表和内表作hash join，而exists是对外表作loop，每次loop再对内表进行查询。

一直以来认为exists比in效率高的说法是不准确的。
如果查询的两个表大小相当，那么用in和exists差别不大。

如果两个表中一个较小，一个是大表，则子查询表大的用exists，子查询表小的用in：

例如：表A（小表），表B（大表）
1：
select * from A where cc in (select cc from B)
效率低，用到了A表上cc列的索引；
select * from A where exists(select cc from B where cc=A.cc)
效率高，用到了B表上cc列的索引。
相反的
2：
select * from B where cc in (select cc from A)
效率高，用到了B表上cc列的索引；
select * from B where exists(select cc from A where cc=B.cc)
效率低，用到了A表上cc列的索引。
带in的关联子查询是多余的，因为in子句和子查询中相关的操作的功能是一样的。如：
select staff_name from staff_member where staff_id in
(select staff_id from staff_func where staff_member.staff_id=staff_func.staff_id);

为非关联子查询指定exists子句是不适当的，因为这样会产生笛卡乘积。如：
select staff_name from staff_member where staff_id
exists (select staff_id from staff_func);

not in 和not exists

如果查询语句使用了not in 那么内外表都进行全表扫描，没有用到索引；
而not extsts 的子查询依然能用到表上的索引。
所以无论哪个表大，用not exists都比not in要快。

尽量不要使用not in子句。使用minus 子句都比not in 子句快，虽然使用minus子句要进行两次查询：
select staff_name from staff_member where staff_id in (select staff_id from staff_member minus select staff_id from staff_func where func_id like '81%');

in 与 "=" 的区别

select name from student where name in ('zhang','wang','li','zhao');

与

select name from student where name='zhang' or name='li' or name='wang' or name='zhao'

的结果是相同的。参考技术B 在SQL中，简单的说exist判断查询结果是否为空集，而in是两个集合之间的包含关系。比如name in ('zhang','wang','li','zhao'); 参考技术C 刚刚总结了下。不仅仅是判断是否为空集的原因：
http://hi.baidu.com/loveyurui/blog/item/07106d836e72e2c9bd3e1e58.html

oracle中in，not in和exists，not exists之间的区别

1、关于在 Oracle8i 时代中in和exists的区别

这里有条SQL语句：select * from A where id in(select id from B)
以上查询使用了in语句,in()只执行一次,它查出B表中的所有id字段并缓存起来.之后,检查A表的id是否与B表中的id相等,如果相等则将A表的记录加入结果集中,直到遍历完A表的所有记录;
它的查询过程类似于以下过程
List resultSet=[];
Array A=(select * from A);
Array B=(select id from B);
for(int i=0;i<A.length;i++)
for(int j=0;j<B.length;j++)
if(A[i].id==B[j].id)
resultSet.add(A[i]);
break;

return resultSet;
可以看出,当B表数据较大时不适合使用in(),因为它会B表数据全部遍历一次.
如:A表有10000条记录,B表有1000000条记录,那么最多有可能遍历10000*1000000次,效率很差.
再如:A表有10000条记录,B表有100条记录,那么最多有可能遍历10000*100次,遍历次数大大减少,效率大大提升.
结论1:in()适合B表比A表数据小的情况

这里还有一条SQL语句： select a.* from A a where exists(select 1 from B b where a.id=b.id)
以上查询使用了exists语句,exists()会执行A.length次,它并不缓存exists()结果集,因为exists()结果集的内容并不重要,重要的是结果集中是否有记录,如果有则返回true,没有则返回false.
它的查询过程类似于以下过程

List resultSet=[];
Array A=(select * from A)

for(int i=0;i<A.length;i++)
if(exists(A[i].id) //执行select 1 from B b where b.id=a.id是否有记录返回
resultSet.add(A[i]);

return resultSet;
结论2:exists()适合B表比A表数据大的情况
当B表比A表数据大时适合使用exists(),因为它没有那么遍历操作,只需要再执行一次查询就行.
如:A表有10000条记录,B表有1000000条记录,那么exists()会执行10000次去判断A表中的id是否与B表中的id相等.
如:A表有10000条记录,B表有100000000条记录,那么exists()还是执行10000次,因为它只执行A.length次,可见B表数据越多,越适合exists()发挥效果.
再如:A表有10000条记录,B表有100条记录,那么exists()还是执行10000次,还不如使用in()遍历10000*100次,因为in()是在内存里遍历比较,而exists()需要查询数据库,我们都知道查询数据库所消耗的性能更高,而内存比较很快.

当A表数据与B表数据一样大时,in与exists效率差不多,可任选一个使用.
In适合内外表都很大的情况，exists适合外表结果集很小的情况。

In和exists对比：
若子查询结果集比较小，优先使用in，若外层查询比子查询小，优先使
用exists。因为若用in，则Oracle会优先查询子查询，然后匹配外层查询，
若使用exists，则oracle会优先查询外层表，然后再与内层表匹配。最优化
匹配原则，拿最小记录匹配大记录

2、关于在 Oracle8i 之后时代中in和exists的区别
in 是把外表和内表作hash join，而exists是对外表作loop，每次loop再对内表进行查询。

一直以来认为exists比in效率高的说法是不准确的。
如果查询的两个表大小相当，那么用in和exists差别不大。
如果两个表中一个较小，一个是大表，则子查询表大的用exists，子查询表小的用in：

例如：表A（小表），表B（大表）
1：
select * from A where cc in (select cc from B)
效率低，用到了A表上cc列的索引；
select * from A where exists(select cc from B where cc=A.cc)
效率高，用到了B表上cc列的索引。
相反的
2：
select * from B where cc in (select cc from A)
效率高，用到了B表上cc列的索引；
select * from B where exists(select cc from A where cc=B.cc)
效率低，用到了A表上cc列的索引。

带in的关联子查询是多余的，因为in子句和子查询中相关的操作的功能是一样的。如：
select staff_name from staff_member where staff_id in
(select staff_id from staff_func where staff_member.staff_id=staff_func.staff_id);

为非关联子查询指定exists子句是不适当的，因为这样会产生笛卡乘积。如：
select staff_name from staff_member where staff_id
exists (select staff_id from staff_func);
not in 和not exists
如果查询语句使用了not in 那么内外表都进行全表扫描，没有用到索引；
而not extsts 的子查询依然能用到表上的索引。
所以无论哪个表大，用not exists都比not in要快。
尽量不要使用not in子句。使用minus 子句都比not in 子句快，虽然使用minus子句要进行两次查询：
select staff_name from staff_member where staff_id in (select staff_id from staff_member minus select staff_id from staff_func where func_id like \'81%\');
in 与 "=" 的区别
select name from student where name in (\'zhang\',\'wang\',\'li\',\'zhao\');
与
select name from student where name=\'zhang\' or name=\'li\' or name=\'wang\' or name=\'zhao\'
的结果是相同的。

3、关于在 Oracle8i 之后时代中in和exists的区别

在ORACLE 11G大行其道的今天，还有很多人受早期版本的影响，记住一些既定的规则，
1.子查询结果集小，用IN
2.外表小，子查询表大，用EXISTS
这是完全错误的观点。在8i时代，这经常是正确的，但是现在已经11G了，马上12C就要面世了。其实在ORACLE 9i CBO就已经优化了IN,EXISTS的区别，ORACLE优化器有个查询转换器，很多SQL虽然写法不同，但是ORACLE优化器会根据既定规则进行查询重写，重写为优化器觉得效率最高的SQL，所以可能SQL写法不同，但是执行计划却是完全一样的。
IN与EXISTS有一点要记住：IN一般是用于非相关子查询，而EXISTS一般用于相关子查询。当然IN也可以用于相关子查询，EXISTS也可以用于非相关子查询。但是这区别很重要，虽然优化器很强大，但是查询转换是有一定的限制的，在EXISTS性能低下，无法进行相关查询转换，比如不能UNNEST SUBQUERY,那么可能我们需要改写SQL，通常可以用IN/JOIN等改写相关子查询，或对于含有OR的SEMI JOIN改为UNION ALL/UNION的形式。
下面就用例子说明一下：

DROP TABLE a;
DROP TABLE b;
CREATE TABLE a AS SELECT * FROM hr.employees;
CREATE TABLE b AS SELECT * FROM hr.employees;
--反复插入，构造20万行+
INSERT INTO a SELECT * FROM a;
INSERT INTO b SELECT * FROM a;
COMMIT；
BEGIN
dbms_stats.gather_table_stats(ownname => USER,tabname => \'a\',estimate_percent => 100,cascade => TRUE);
dbms_stats.gather_table_stats(ownname => USER,tabname => \'b\',estimate_percent => 100,cascade => TRUE);
END;
/

1.测试IN,EXISTS在简单查询中，是等价的
SQL> set autotrace traceonly exp
SQL> SELECT * FROM a
2 WHERE EXISTS(
3 SELECT 1 FROM b WHERE a.employee_id=b.employee_id);

执行计划
----------------------------------------------------------
Plan hash value: 2317816356

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 217K| 15M| 1375 (2)| 00:00:17 |
|* 1 | HASH JOIN | | 217K| 15M| 1375 (2)| 00:00:17 |
| 2 | SORT UNIQUE | | 217K| 848K| 126 (1)| 00:00:02 |
| 3 | INDEX FAST FULL SCAN | IDX2_B | 217K| 848K| 126 (1)| 00:00:02 |
| 4 | TABLE ACCESS FULL | A | 217K| 14M| 620 (1)| 00:00:08 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("A"."EMPLOYEE_ID"="B"."EMPLOYEE_ID")

SQL> SELECT * FROM a
2 WHERE a.employee_id IN (SELECT b.employee_id FROM b);

执行计划
----------------------------------------------------------
Plan hash value: 2317816356

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 217K | 15M| 1375 (2) | 00:00:17 |
|* 1 | HASH JOIN | | 217K | 15M| 1375 (2) | 00:00:17 |
| 2 | SORT UNIQUE | | 217K | 848K| 126 (1) | 00:00:02 |
| 3 | INDEX FAST FULL SCAN | IDX2_B | 217K | 848K| 126 (1) | 00:00:02 |
| 4 | TABLE ACCESS FULL | A | 217K | 14M| 620 (1) | 00:00:08 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("A"."EMPLOYEE_ID"="B"."EMPLOYEE_ID")

可以看出，两个计划完全没有区别。类似于ORACLE查询重写为：
SELECT a.* FROM a,(SELECT DISTINCT b.employee_id FROM b) b1 WHERE a.employee_id=b1.employee_id;

结语：讨论SQL语句：select * from A where id in (1,2,3,4,5,6,7,8,9) 中在Oracle8i 版本之后的Oracle数据库中性能是一样的。参考技术A 如果in的数据量少，比exists更有效率；如果in的数量大，则exists更有效率

本回答被提问者采纳

以上是关于oracle中in和exist的区别的主要内容，如果未能解决你的问题，请参考以下文章