How to use multi-table joins in Oracle

Posted 2023-03-25

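Take two tables, emp and dept, as an example.

The goal is to join them on the deptno column so that the query returns every column of emp plus the dname column of dept.

The following statement does this: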

select a.*,b.dname from emp a,dept b where a.deptno=b.deptno;

– Left join, standard syntax: select * from a left join b on a.id=b.id
– Right join, standard syntax: select * from a right join b on a.id=b.id
– Full join, standard syntax: select * from a full join b on a.id=b.id
– Left join, Oracle syntax: select * from a,b where a.id=b.id(+)
– Right join, Oracle syntax: select * from a,b where a.id(+)=b.id
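
To make the difference between the inner join and the left join concrete, here is a minimal sketch that runs both against an in-memory SQLite database from Python. The sample rows and the empno and ename columns are hypothetical, since the original emp and dept data is not shown. Oracle's legacy (+) outer-join operator is not available in SQLite, so only the standard syntax is used.

```python
import sqlite3

# Hypothetical sample data: the original emp and dept rows are not shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER, dname TEXT);
    CREATE TABLE emp  (empno INTEGER, ename TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
    INSERT INTO emp  VALUES (1, 'SMITH', 20), (2, 'CLARK', 10), (3, 'ADAMS', NULL);
""")

# Inner join written in the comma/WHERE style: only matching rows survive.
inner_rows = conn.execute(
    "SELECT a.ename, b.dname FROM emp a, dept b "
    "WHERE a.deptno = b.deptno ORDER BY a.empno"
).fetchall()

# Left join: every emp row is kept; dname is NULL when no department matches.
left_rows = conn.execute(
    "SELECT a.ename, b.dname FROM emp a LEFT JOIN dept b "
    "ON a.deptno = b.deptno ORDER BY a.empno"
).fetchall()

print(inner_rows)  # [('SMITH', 'RESEARCH'), ('CLARK', 'ACCOUNTING')]
print(left_rows)   # same two rows, followed by ('ADAMS', None)
```

The employee without a department disappears from the inner join but is kept by the left join with a NULL dname.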

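How to use partitioned tables well in multi-table join scenarios?

The partitioning support in relational databases is a double-edged sword. Used well, it makes data retrieval or deletion more efficient; used badly, it can have the opposite effect.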

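Earlier articles on Oracle partitioned tables:

"Does truncating a partition invalidate global indexes?"

"Interval partitioning and several usage scenarios for its default tablespace"

"Importing an ordinary heap table as a partitioned table"

"Can a partitioned table with hundreds of millions of rows be renamed?"

"Can a partitioned index be created on a non-partitioned table?"

"Scope of the STORE IN parameter for interval partitioning"

An earlier article on partitioned tables in the EDB database:

"The EDB error when a partition child table cannot be dropped"

Articles on MySQL partitioned tables, quoted from the ActionSky (爱可生) open-source technology community:

"MySQL time-based partitioning: a case study"

"Notes on writing SQL against MySQL time-based partitions"

"Implementing time-based partitioning in MySQL"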

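Recently I came across another article, "Issue 43: How to use partitioned tables well in multi-table join scenarios", which uses experiments to show how to apply partitioned tables correctly when joining tables. It is worth recommending.

The purpose of a partitioned table is to reduce the amount of data examined per lookup and so improve overall performance. How can partitioned tables be used sensibly in multi-table joins to improve query performance?

People often ask questions like these: I use partitioned tables, but my queries are no faster, and are in fact slower. Why? Is partitioning itself flawed, or have I misunderstood the scenarios it suits? The typical query scenarios below address these questions.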

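Scenario 1: two tables are joined, the join key is the partition key, and there is no filter condition.

For example: select * from t1 inner join t2 using(id);

In this scenario, partitioning does not speed the query up; it only makes it slower.

Without partitioning, only two tables take part in the join. With partitioning, the join involves not just the two tables but all of their partitions, and the more partitions there are, the worse the query performs.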

A simple example: table t1 is a hash-partitioned table with 1000 partitions and 500,000 rows.

localhost:ytt>show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 
/*!50100 PARTITION BY HASH (`id`)
PARTITIONS 1000 */
1 row in set (0.00 sec)

Table t1_no_pt is a regular table: a clone of t1 with the partitioning removed, also holding 500,000 rows:

localhost:ytt>show create table t1_no_pt\G
*************************** 1. row ***************************
       Table: t1_no_pt
Create Table: CREATE TABLE `t1_no_pt` (
  `id` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 
1 row in set (0.00 sec)

Query performance of these two tables in this scenario: joining the partitioned table with the regular table takes 6.76 seconds:

localhost:ytt>select count(*) from t1_no_pt a inner join t1 b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (6.76 sec)

Joining two partitioned tables takes 4.32 seconds:

localhost:ytt>select count(*) from t1 a inner join t1 b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (4.32 sec)

Joining two regular tables takes only 0.87 seconds:

localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (0.87 sec)

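For the same query, the partitioned table performs worse in this scenario.

Scenario 2: two tables are joined, the join key is the partition key, and there is a filter condition.

This splits into two sub-scenarios:

1. The filter condition is on the partition key

A query like this: select * from t1 inner join t2 using(id) where t1.id = xxx;

Partitioned tables are recommended here. Because the filter is an equality condition on the partition key, the optimizer can narrow the search to a single partition and examine far fewer rows, which is an ideal fit for partitioning.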
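
As a rough illustration of why an equality filter on the partition key narrows the search to one partition, the sketch below mimics how a HASH-partitioned table maps a key to a partition. It assumes MySQL's documented rule MOD(expr, number_of_partitions) for non-negative integer keys and the default partition names p0, p1, and so on; the function names are made up for this example.

```python
NUM_PARTITIONS = 1000  # t1 is declared with PARTITIONS 1000


def hash_partition(id_value, num_partitions=NUM_PARTITIONS):
    """Partition name for a non-negative integer key under PARTITION BY HASH.

    Assumes the rule MOD(expr, num_partitions) and default names p0, p1, ...
    """
    return f"p{id_value % num_partitions}"


def partitions_to_read(id_filter=None):
    """Partitions a query must read: one with an id filter, all without."""
    if id_filter is None:
        return [f"p{n}" for n in range(NUM_PARTITIONS)]
    return [hash_partition(id_filter)]


print(partitions_to_read(19172))  # ['p172']
print(len(partitions_to_read()))  # 1000
```

With the filter id = 19172 only partition p172 needs to be read; without a filter on id, all 1000 partitions are candidates.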

Again using tables t1 and t1_no_pt as a simple example:

Joining two partitioned tables with a filter on the partition key takes 0.01 seconds:

localhost:ytt>select count(*) from t1 a inner join t1 b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
|       81 |
+----------+
1 row in set (0.01 sec)

Joining two regular tables with the same condition takes 0.55 seconds, many times slower than the partitioned join:

localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
|       81 |
+----------+
1 row in set (0.55 sec)

Joining the partitioned table with the regular table takes 0.32 seconds, between the other two:

localhost:ytt>select count(*) from t1 a inner join t1_no_pt b using(id) where a.id = 19172;
+----------+
| count(*) |
+----------+
|       81 |
+----------+
1 row in set (0.32 sec)

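Comparing the execution plans of the two-partitioned-table join and the two-regular-table join makes the difference even clearer. The partitioned join has an estimated cost of 381.90 and an estimated 280 joined rows, with each table scan estimated at 529 rows. The regular-table join has an estimated cost of 249264389.78 and an estimated 249125777 joined rows, with each table scan estimated at 499125 rows. The gain from partitioning here is very clear:

localhost:ytt>explain format=tree select count(*) from t1 a inner join t1 b using(id) where a.id = 19172\G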
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Inner hash join (no condition)  (cost=381.90 rows=280)
        -> Filter: (b.id = 19172)  (cost=1.02 rows=53)
            -> Table scan on b  (cost=1.02 rows=529)
        -> Hash
            -> Filter: (a.id = 19172)  (cost=53.65 rows=53)
                -> Table scan on a  (cost=53.65 rows=529)

1 row in set (0.00 sec)

localhost:ytt>explain format=tree select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.id = 19172\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Inner hash join (no condition)  (cost=249264389.78 rows=249125777)
        -> Filter: (b.id = 19172)  (cost=1.87 rows=49913)
            -> Table scan on b  (cost=1.87 rows=499125)
        -> Hash
            -> Filter: (a.id = 19172)  (cost=50257.25 rows=49913)
                -> Table scan on a  (cost=50257.25 rows=499125)

1 row in set (0.00 sec)

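2. The filter condition is not on the partition key

A query like this: select * from t1 inner join t2 using(id) where t1.r1 = xxx;

In this scenario, partitioning not only fails to improve performance, it makes performance drop sharply.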
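
The sketch below illustrates one way to see why a filter on a non-partition column cannot narrow the search: partition placement depends only on id, so rows with a given r1 value end up scattered across nearly every partition. The data is hypothetical (50,000 random rows with r1 between 1 and 10), not the article's data set, and the MOD-based placement rule is the same assumption as for HASH partitioning on a non-negative integer key.

```python
import random

NUM_PARTITIONS = 1000  # matches PARTITIONS 1000 on t1

random.seed(0)  # fixed seed so repeated runs see the same data

# Hypothetical rows (id, r1): 50,000 rows with r1 drawn from 1..10.
rows = [(random.randrange(1_000_000), random.randint(1, 10)) for _ in range(50_000)]


def partitions_touched(predicate):
    """Partition numbers holding at least one row that satisfies predicate."""
    return {row_id % NUM_PARTITIONS for row_id, r1 in rows if predicate(row_id, r1)}


some_id = rows[0][0]
by_partition_key = partitions_touched(lambda row_id, r1: row_id == some_id)
by_other_column = partitions_touched(lambda row_id, r1: r1 == 10)

print(len(by_partition_key))  # 1: all rows with the same id share a partition
print(len(by_other_column))   # close to 1000 for this random data
```

A filter on id always lands in exactly one partition, while a filter on r1 touches almost all of them, so every partition still has to be visited.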

Again using tables t1 and t1_no_pt: joining two partitioned tables takes 6.16 seconds:

localhost:ytt>select count(*) from t1 a inner join t1 b using(id) where a.r1 = 10;
+----------+
| count(*) |
+----------+
|    50552 |
+----------+
1 row in set (6.16 sec)

Joining two regular tables takes 0.70 seconds, much faster than the partitioned tables:

localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(id) where a.r1 = 10;
+----------+
| count(*) |
+----------+
|    50552 |
+----------+
1 row in set (0.70 sec)

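Scenario 3: two tables are joined, the join key is not the partition key, but the filter condition is on the partition key.

In this scenario, partitioning again brings no performance gain.

Joining two partitioned tables performs poorly, taking 6.05 seconds: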

localhost:ytt>select count(*) from t1 a inner join t1 b using(r1) where a.id = 19172;
+----------+
| count(*) |
+----------+
|   225868 |
+----------+
1 row in set (6.05 sec)

Joining two regular tables performs much better, taking 0.54 seconds:

localhost:ytt>select count(*) from t1_no_pt a inner join t1_no_pt b using(r1) where a.id = 19172;
+----------+
| count(*) |
+----------+
|   225868 |
+----------+
1 row in set (0.54 sec)

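Since the filter condition is on the partition key, consider joining the partitioned table with a regular table.

Rewrite the earlier SQL so that the partitioned table is filtered first and the filtered rows are then joined with the regular table. This performs somewhat better than joining two regular tables, taking 0.39 seconds: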

localhost:ytt>select count(*) from (select  * from t1 a where a.id = 19172) t inner join t1_no_pt b using(r1);
+----------+
| count(*) |
+----------+
|   225868 |
+----------+
1 row in set (0.39 sec)

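Scenario 4: partitioned tables are joined and the join key is also the partition key, but the two tables differ in partitioning algorithm or in number of partitions.

Table t2 has the same structure and the same number of rows as t1, but a different number of partitions: t1 has 1000 partitions, while t2 has only 50:

localhost:ytt>show create table t2\G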
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (
  `id` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `log_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 
/*!50100 PARTITION BY HASH (`id`)
PARTITIONS 50 */
1 row in set (0.01 sec)
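
To show what a difference in partition counts means for the same key, the sketch below computes where one id lands in each table and which t1 partitions share keys with a single t2 partition. It assumes the MOD-based placement rule for HASH partitioning on a non-negative integer key, and the helper name is hypothetical. It only illustrates the key-to-partition mapping; it is not a statement about how the optimizer executes the join.

```python
T1_PARTITIONS = 1000  # PARTITIONS 1000 on t1
T2_PARTITIONS = 50    # PARTITIONS 50 on t2


def t1_partitions_matching(t2_partition):
    """t1 partition numbers whose ids can also fall in the given t2 partition.

    Because 50 divides 1000, an id in t1 partition p is in t2 partition p % 50.
    """
    return [p for p in range(T1_PARTITIONS) if p % T2_PARTITIONS == t2_partition]


example_id = 19172
print(example_id % T1_PARTITIONS, example_id % T2_PARTITIONS)  # 172 22

matching = t1_partitions_matching(22)
print(len(matching))  # 20
print(matching[:3])   # [22, 72, 122]
```

The same id sits in partition 172 of t1 but partition 22 of t2, and each t2 partition corresponds to 20 different t1 partitions, so the partitions of the two tables do not line up one to one.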

With this setup, joining the two partitioned tables takes 6.43 seconds:

localhost:ytt>select count(*) from t1 a inner join t2 b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (6.43 sec)

Likewise, joining the two regular tables takes 1.98 seconds, faster than the partitioned join:

localhost:ytt>select count(*) from t1_no_pt a inner join t2_no_pt b using(id);
+----------+
| count(*) |
+----------+
|  1014068 |
+----------+
1 row in set (1.98 sec)

The reasons for these performance differences were partly covered in earlier articles and are not repeated here.