Strange query performance result: different number of expressions in an 'IN clause' in Greenplum 5.0

Posted: 2019-10-31 06:45:17

Question:

I noticed a strange result when using an 'IN clause' in Greenplum 5.0.

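When the number of expressions in the 'IN clause' is greater than 25, the query is noticeably faster than when the number is 25 or fewer. Why does this happen?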

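I ran EXPLAIN on the queries with both the new and the legacy optimizer, and the output was the same. Here are the SQL statements and the EXPLAIN results.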

Query 1: 26 expressions in the IN list

SQL:

select * from table1 
where column1 in ('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26')

Query time: 0.8 s to 0.9 s

EXPLAIN:

Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..481.59 rows=2021 width=1069)
  ->  Table Scan on table1 (cost=0.00..475.60 rows=253 width=1069)
        Filter: column1 = ANY ('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25'::text[])
Settings:  optimizer=on
Optimizer status: PQO version 2.42.0

EXPLAIN ANALYZE:

Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..481.53 rows=2003 width=1064)
  Rows out:  0 rows at destination with 52 ms to end, start offset by 0.477 ms.
  ->  Table Scan on table1 (cost=0.00..475.63 rows=251 width=1064)
        Filter: column1 = ANY ('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26'::text[])
        Rows out:  0 rows (seg0) with 51 ms to end, start offset by -358627 ms.
Slice statistics:
  (slice0)    Executor memory: 437K bytes.
  (slice1)    Executor memory: 259K bytes avg x 8 workers, 281K bytes max (seg7).
Statement statistics:
  Memory used: 262144K bytes
Settings:  optimizer=on
Optimizer status: PQO version 2.42.0
Total runtime: 53.107 ms

Query 2: 25 expressions in the IN list

SQL:

select * from table1 
where column1 in ('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25')

Query time: 1.2 s to 1.5 s

EXPLAIN:

Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..481.59 rows=2021 width=1069)
  ->  Table Scan on table1 (cost=0.00..475.60 rows=253 width=1069)
        Filter: column1 = ANY ('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25'::text[])
Settings:  optimizer=on
Optimizer status: PQO version 2.42.0

EXPLAIN ANALYZE:

Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..481.53 rows=2003 width=1064)
  Rows out:  0 rows at destination with 60 ms to end, start offset by 0.517 ms.
  ->  Table Scan on table1 (cost=0.00..475.63 rows=251 width=1064)
        Filter: column1 = ANY ('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25'::text[])
        Rows out:  0 rows (seg0) with 59 ms to end, start offset by -155783 ms.
Slice statistics:
  (slice0)    Executor memory: 437K bytes.
  (slice1)    Executor memory: 191K bytes avg x 8 workers, 191K bytes max (seg0).
Statement statistics:
  Memory used: 262144K bytes
Settings:  optimizer=on
Optimizer status: PQO version 2.42.0
Total runtime: 60.584 ms

Greenplum runs on 3 VMs: 1 master host and 2 segment hosts, and each segment host has 4 data directories.

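table1 has 500,000 rows and 50 columns. The primary key and distribution key is a different column, which holds UUIDs. column1 is neither the distribution key nor the primary key; it is just one of the natural keys.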
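The layout above lines up with the plans: 2 segment hosts with 4 data directories each give the 8 segments shown in "Gather Motion 8:1". A small Python sketch of that arithmetic, assuming one primary segment per data directory and an even spread of rows over the UUID distribution key (neither is confirmed in the question), relates the table size to the plan's row estimates:

```python
# Cluster layout and plan figures as described in the question.
SEGMENT_HOSTS = 2
DATA_DIRS_PER_HOST = 4      # assumption: one primary segment per data directory
TABLE_ROWS = 500_000
ESTIMATED_MATCHES = 2003    # Gather Motion row estimate from EXPLAIN ANALYZE

# Total primary segments; should match "segments: 8" / "Gather Motion 8:1".
segments = SEGMENT_HOSTS * DATA_DIRS_PER_HOST

# Assumption: the UUID distribution key spreads rows evenly across segments.
rows_per_segment = TABLE_ROWS / segments

# Average estimated matches per segment (the Table Scan node shows rows=251).
matches_per_segment = ESTIMATED_MATCHES / segments

# Fraction of the table the optimizer expects the IN list to match.
selectivity = ESTIMATED_MATCHES / TABLE_ROWS

print(segments, rows_per_segment, matches_per_segment, f"{selectivity:.2%}")
```

The per-segment estimate of about 250 is consistent with the rows=251 on the Table Scan node, and the optimizer expects the IN list to match roughly 0.4% of the table.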

Answer 1:

You can run EXPLAIN ANALYZE to see exactly where the plan spends its time. Please share the output here.

Comments:

The runtime is about 60 ms. I don't think there is any difference at the database level.
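The comment's point can be checked with a little arithmetic on the numbers posted in the question. This Python sketch compares the client-observed query time with the "Total runtime" from EXPLAIN ANALYZE. The PostgreSQL 8.3 documentation, which Greenplum 5 is based on, says Total runtime does not include parsing, rewriting, or planning time; assuming Greenplum keeps that behavior, whatever is left over was spent outside the executor (for example in planning, dispatch, or the client), and these figures alone cannot say which.

```python
from __future__ import annotations

# Numbers copied from the question (all in milliseconds).
# wall_ms: client-observed query time as a (low, high) range.
# runtime_ms: "Total runtime" reported by EXPLAIN ANALYZE.
QUERIES = {
    "query1_26_values": {"wall_ms": (800.0, 900.0), "runtime_ms": 53.107},
    "query2_25_values": {"wall_ms": (1200.0, 1500.0), "runtime_ms": 60.584},
}


def outside_executor_ms(name: str) -> tuple[float, float]:
    """Range of wall-clock time not covered by the executor's Total runtime."""
    q = QUERIES[name]
    low, high = q["wall_ms"]
    return (low - q["runtime_ms"], high - q["runtime_ms"])


def executor_share(name: str) -> tuple[float, float]:
    """Range of the fraction of wall-clock time spent in the executor."""
    q = QUERIES[name]
    low, high = q["wall_ms"]
    # A longer wall time gives a smaller share, so divide by `high` first.
    return (q["runtime_ms"] / high, q["runtime_ms"] / low)


q1, q2 = QUERIES["query1_26_values"], QUERIES["query2_25_values"]

# How much slower query 2 is than query 1: executor time vs. wall-clock time.
executor_gap_ms = q2["runtime_ms"] - q1["runtime_ms"]
wall_gap_ms = (q2["wall_ms"][0] - q1["wall_ms"][1], q2["wall_ms"][1] - q1["wall_ms"][0])

if __name__ == "__main__":
    for name in QUERIES:
        lo_out, hi_out = outside_executor_ms(name)
        lo_share, hi_share = executor_share(name)
        print(f"{name}: {lo_out:.0f}-{hi_out:.0f} ms outside the executor "
              f"(executor = {lo_share:.1%}-{hi_share:.1%} of wall time)")
    print(f"executor gap: {executor_gap_ms:.3f} ms, "
          f"wall-clock gap: {wall_gap_ms[0]:.0f}-{wall_gap_ms[1]:.0f} ms")
```

About 93% to 96% of the observed time falls outside the executor, and query 2's extra executor time (about 7.5 ms) is far smaller than its extra wall-clock time (300 ms to 700 ms). That is consistent with the comment that the difference does not come from executing the scan itself.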
