Strange query performance result: different number of expressions in the 'IN' clause in Greenplum 5.0

Posted 2023-03-31

[Published]: 2019-10-31 06:45:17
[Question]:

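I noticed a strange result when using an IN clause in Greenplum 5.0.

When the IN clause contains more than 25 expressions, the query is noticeably faster than when it contains exactly 25. Why does this happen?

I ran EXPLAIN on both queries, with the new optimizer and with the old one, and the output was the same. The SQL and the EXPLAIN results are below.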

Query 1: 26 expressions

sql:

select * from table1 
where column1 in ('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26')

Query time: 0.8s ~ 0.9s

EXPLAIN:

Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..481.59 rows=2021 width=1069)
  ->  Table Scan on table1 (cost=0.00..475.60 rows=253 width=1069)
        Filter: column1 = ANY ('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25'::text[])
Settings:  optimizer=on
Optimizer status: PQO version 2.42.0

EXPLAIN ANALYZE:

Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..481.53 rows=2003 width=1064)
  Rows out:  0 rows at destination with 52 ms to end, start offset by 0.477 ms.
  ->  Table Scan on table1 (cost=0.00..475.63 rows=251 width=1064)
        Filter: column1 = ANY ('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26'::text[])
        Rows out:  0 rows (seg0) with 51 ms to end, start offset by -358627 ms.
Slice statistics:
  (slice0)    Executor memory: 437K bytes.
  (slice1)    Executor memory: 259K bytes avg x 8 workers, 281K bytes max (seg7).
Statement statistics:
  Memory used: 262144K bytes
Settings:  optimizer=on
Optimizer status: PQO version 2.42.0
Total runtime: 53.107 ms

Query 2: 25 expressions

sql:

select * from table1 
where column1 in ('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25')

Query time: 1.2s ~ 1.5s

EXPLAIN:

Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..481.59 rows=2021 width=1069)
  ->  Table Scan on table1 (cost=0.00..475.60 rows=253 width=1069)
        Filter: column1 = ANY ('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25'::text[])
Settings:  optimizer=on
Optimizer status: PQO version 2.42.0

EXPLAIN ANALYZE:

Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..481.53 rows=2003 width=1064)
  Rows out:  0 rows at destination with 60 ms to end, start offset by 0.517 ms.
  ->  Table Scan on table1 (cost=0.00..475.63 rows=251 width=1064)
        Filter: column1 = ANY ('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25'::text[])
        Rows out:  0 rows (seg0) with 59 ms to end, start offset by -155783 ms.
Slice statistics:
  (slice0)    Executor memory: 437K bytes.
  (slice1)    Executor memory: 191K bytes avg x 8 workers, 191K bytes max (seg0).
Statement statistics:
  Memory used: 262144K bytes
Settings:  optimizer=on
Optimizer status: PQO version 2.42.0
Total runtime: 60.584 ms

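Greenplum runs on 3 VMs: 1 master and 2 segment hosts, with 4 data directories on each segment host.

table1 has 500,000 rows and 50 columns. The primary key, which is also the distribution key, is a different column of type uuid. column1 is neither the distribution key nor the primary key; it is just one of the natural keys.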
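The question says the plans were compared under both the new and the old optimizer. A minimal sketch of how that comparison is typically run in one session is shown here, assuming the session-level `optimizer` setting that appears in the `Settings:` line of the plans above. `table1` and `column1` are the names from the question; nothing here is a new measurement.

```sql
-- Sketch only: table1 and column1 are the names used in the question.

-- Plan with the new optimizer (reported as "optimizer=on" in the Settings line).
SET optimizer = on;
EXPLAIN
SELECT * FROM table1
WHERE column1 IN ('1','2','3','4','5','6','7','8','9','10','11','12','13',
                  '14','15','16','17','18','19','20','21','22','23','24','25');

-- Plan the same statement with the old (legacy) optimizer.
SET optimizer = off;
EXPLAIN
SELECT * FROM table1
WHERE column1 IN ('1','2','3','4','5','6','7','8','9','10','11','12','13',
                  '14','15','16','17','18','19','20','21','22','23','24','25');

-- Return to the server default.
RESET optimizer;
```

The `Optimizer status:` line at the bottom of each plan shows which optimizer actually produced it, so it is worth checking that line before comparing the two outputs.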

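[Discussion]:

[Answer 1]:

You can run EXPLAIN ANALYZE to see exactly where the plan spends its time. Please share the output here.

[Comments]:

The runtime is about 60 ms. I don't think there is any difference at the database level.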
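The roughly 60 ms reported by EXPLAIN ANALYZE and the 0.8s to 1.5s query times in the question are measured at different points. One way to see both numbers side by side, assuming the queries are run from a psql session, is sketched below. `table1` and `column1` are the names from the question.

```sql
-- Sketch for a psql session; table1 and column1 come from the question.

-- Toggle psql's client-side timing display ("Time: ... ms" after each statement).
\timing

-- Executor-side time: read "Total runtime" at the bottom of the plan output.
EXPLAIN ANALYZE
SELECT * FROM table1
WHERE column1 IN ('1','2','3','4','5','6','7','8','9','10','11','12','13',
                  '14','15','16','17','18','19','20','21','22','23','24','25');

-- End-to-end time as seen by the client: read the "Time:" line.
SELECT * FROM table1
WHERE column1 IN ('1','2','3','4','5','6','7','8','9','10','11','12','13',
                  '14','15','16','17','18','19','20','21','22','23','24','25');
```

Repeating the same pair with the 26-value list gives four numbers to compare. If the client-side `Time:` differs between the two lists while `Total runtime` does not, the gap is being spent outside plan execution, for example in planning or on the client side. That is a hypothesis to check, not a conclusion drawn from the data in the question.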
