Postgresql 查询的过滤条件中的列上的字符串操作如何影响它选择的计划

Posted 2023-04-15

技术标签:

【中文标题】Postgresql 查询的过滤条件中的列上的字符串操作如何影响它选择的计划【英文标题】：How does a string operation on a column in a filter condition of a Postgresql query have on the plan it chooses 【发布时间】：2021-08-12 05:24:41 【问题描述】：

我正在优化查询，运气不好我尝试了一些方法，它改进了查询，但我无法解释原因。以下是性能较差的查询

with ctedata1 as(
select
    sum(total_visit_count) as total_visit_count,
    sum(sh_visit_count) as sh_visit_count,
    sum(ec_visit_count) as ec_visit_count,
    sum(total_like_count) as total_like_count,
    sum(sh_like_count) as sh_like_count,
    sum(ec_like_count) as ec_like_count,
    sum(total_order_count) as total_order_count,
    sum(sh_order_count) as sh_order_count,
    sum(ec_order_count) as ec_order_count,
    sum(total_sales_amount) as total_sales_amount,
    sum(sh_sales_amount) as sh_sales_amount,
    sum(ec_sales_amount) as ec_sales_amount,
    sum(ec_order_online_count) as ec_order_online_count,
    sum(ec_sales_online_amount) as ec_sales_online_amount,
    sum(ec_order_in_store_count) as ec_order_in_store_count,
    sum(ec_sales_in_store_amount) as ec_sales_in_store_amount,
    table2.im_name,
    table2.brand as kpibrand,
    table2.id_region as kpiregion
from
    table2
where
    deleted_at is null
    and id_region = any('1')
group by
    im_name,
    kpiregion,
    kpibrand ),
ctedata2 as (
select
    ctedata1.*,
    rank() over (partition by (kpiregion,
    kpibrand)
order by
    coalesce(ctedata1.total_sales_amount, 0) desc) rank,
    count(*) over (partition by (kpiregion,
    kpibrand)) as total_count
from
    ctedata1 )
select
    table1.id_pf_item,
    table1.product_id,
    table1.color_code,
    table1.l1_code,
    table1.local_title as product_name,
    table1.id_region,
    table1.gender,
    case
        when table1.created_at is null then '1970/01/01 00:00:00'
        else table1.created_at
    end as created_at,
    (
    select
        count(distinct id_outfit)
    from
        table3
    left join table4 on
        table3.id_item = table4.id_item
        and table4.deleted_at is null
    where
        table3.deleted_at is null
        and table3.id_pf_item = table1.id_pf_item) as outfit_count,
    count(*) over() as total_matched,
    case
        when table1.v8_im_name = '' then table1.im_name
        else table1.v8_im_name
    end as im_name,
    case
        when table1.id_region != 1 then null
        else
        case
            when table1.sales_start_at is null then '1970/01/01 00:00:00'
            else table1.sales_start_at
        end
    end as sales_start_date,
    table1.category_ids,
    array_to_string(table1.intermediate_category_ids, ','),
    table1.image_url,
    table1.brand,
    table1.pdp_url,
    coalesce(ctedata2.total_visit_count, 0) as total_visit_count,
    coalesce(ctedata2.sh_visit_count, 0) as sh_visit_count,
    coalesce(ctedata2.ec_visit_count, 0) as ec_visit_count,
    coalesce(ctedata2.total_like_count, 0) as total_like_count,
    coalesce(ctedata2.sh_like_count, 0) as sh_like_count,
    coalesce(ctedata2.ec_like_count, 0) as ec_like_count,
    coalesce(ctedata2.total_order_count, 0) as total_order_count,
    coalesce(ctedata2.sh_order_count, 0) as sh_order_count,
    coalesce(ctedata2.ec_order_count, 0) as ec_order_count,
    coalesce(ctedata2.total_sales_amount, 0) as total_sales_amount,
    coalesce(ctedata2.sh_sales_amount, 0) as sh_sales_amount,
    coalesce(ctedata2.ec_sales_amount, 0) as ec_sales_amount,
    coalesce(ctedata2.ec_order_online_count, 0) as ec_order_online_count,
    coalesce(ctedata2.ec_sales_online_amount, 0) as ec_sales_online_amount,
    coalesce(ctedata2.ec_order_in_store_count, 0) as ec_order_in_store_count,
    coalesce(ctedata2.ec_sales_in_store_amount, 0) as ec_sales_in_store_amount,
    ctedata2.rank,
    ctedata2.total_count,
    table1.department,
    table1.seasons
from
    table1
left join ctedata2 on
    table1.im_name = ctedata2.im_name
    and table1.brand = ctedata2.kpibrand
where
    table1.deleted_at is null
    and table1.id_region = any('1')
    and lower(table1.brand) = any('"brand1","brand2"')
    and 'season1' = any(lower(seasons::text)::text[])
    and table1.department = 'Department1'
order by
    total_sales_amount desc offset 0
limit 100

上述查询的解释输出是

QUERY PLAN                                                                                                                                                                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=172326.55..173435.38 rows=1 width=952) (actual time=85664.201..85665.970 rows=100 loops=1)                                                                                                                                        
  CTE ctedata1                                                                                                                                                                                                                                  
    ->  GroupAggregate  (cost=0.42..80478.71 rows=43468 width=530) (actual time=0.063..708.069 rows=73121 loops=1)                                                                                                                             
          Group Key: table2.im_name, table2.id_region, table2.brand                                                                                                                                                    
          ->  Index Scan using udx_table2_im_name_id_region_brand_target_date_key on table2  (cost=0.42..59699.18 rows=391708 width=146) (actual time=0.029..308.582 rows=391779 loops=1)                                      
                Filter: ((deleted_at IS NULL) AND (id_region = ANY ('1'::integer[])))                                                                                                                                                        
                Rows Removed by Filter: 20415                                                                                                                                                                                                  
  CTE ctedata2                                                                                                                                                                                                                             
    ->  WindowAgg  (cost=16104.06..17842.78 rows=43468 width=628) (actual time=1012.994..1082.057 rows=73121 loops=1)                                                                                                                          
          ->  WindowAgg  (cost=16104.06..17082.09 rows=43468 width=620) (actual time=945.755..1014.656 rows=73121 loops=1)                                                                                                                     
                ->  Sort  (cost=16104.06..16212.73 rows=43468 width=612) (actual time=945.747..963.254 rows=73121 loops=1)                                                                                                                     
                      Sort Key: ctedata1.kpiregion, ctedata1.kpibrand, (COALESCE(ctedata1.total_sales_amount, '0'::numeric)) DESC                                                                                                                 
                      Sort Method: external merge  Disk: 6536kB                                                                                                                                                                                
                      ->  CTE Scan on ctedata1  (cost=0.00..869.36 rows=43468 width=612) (actual time=0.069..824.841 rows=73121 loops=1)                                                                                                        
  ->  Result  (cost=74005.05..75113.88 rows=1 width=952) (actual time=85664.199..85665.950 rows=100 loops=1)                                                                                                                                   
        ->  Sort  (cost=74005.05..74005.05 rows=1 width=944) (actual time=85664.072..85664.089 rows=100 loops=1)                                                                                                                               
              Sort Key: (COALESCE(ctedata2.total_sales_amount, '0'::numeric)) DESC                                                                                                                                                         
              Sort Method: top-N heapsort  Memory: 76kB                                                                                                                                                                                        
              ->  WindowAgg  (cost=10960.95..74005.04 rows=1 width=944) (actual time=85658.049..85661.393 rows=3151 loops=1)                                                                                                                   
                    ->  Nested Loop Left Join  (cost=10960.95..74005.02 rows=1 width=927) (actual time=1075.219..85643.595 rows=3151 loops=1)                                                                                                  
                          Join Filter: (((table1.im_name)::text = ctedata2.im_name) AND ((table1.brand)::text = ctedata2.kpibrand))                                                                                                  
                          Rows Removed by Join Filter: 230402986                                                                                                                                                                               
                          ->  Bitmap Heap Scan on table1  (cost=10960.95..72483.64 rows=1 width=399) (actual time=45.466..278.376 rows=3151 loops=1)                                                                                          
                                Recheck Cond: (id_region = ANY ('1'::integer[]))                                                                                                                                                             
                                Filter: ((deleted_at IS NULL) AND (department = 'Department1'::text) AND (lower((brand)::text) = ANY ('brand1, brand2'::text[])) AND ('season1'::text = ANY ((lower((seasons)::text))::text[])))
                                Rows Removed by Filter: 106335                                                                                                                                                                                 
                                Heap Blocks: exact=42899                                                                                                                                                                                       
                                ->  Bitmap Index Scan on table1_im_name_id_region_key  (cost=0.00..10960.94 rows=110619 width=0) (actual time=38.307..38.307 rows=109486 loops=1)                                                             
                                      Index Cond: (id_region = ANY ('1'::integer[]))                                                                                                                                                         
                          ->  CTE Scan on ctedata2  (cost=0.00..869.36 rows=43468 width=592) (actual time=0.325..21.721 rows=73121 loops=3151)                                                                                             
        SubPlan 3                                                                                                                                                                                                                              
          ->  Aggregate  (cost=1108.80..1108.81 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=100)                                                                                                                                    
                ->  Nested Loop Left Join  (cost=5.57..1108.57 rows=93 width=4) (actual time=0.007..0.016 rows=3 loops=100)                                                                                                                    
                      ->  Bitmap Heap Scan on table3  (cost=5.15..350.95 rows=93 width=4) (actual time=0.005..0.008 rows=3 loops=100)                                                                                                            
                            Recheck Cond: (id_pf_item = table1.id_pf_item)                                                                                                                                                                    
                            Filter: (deleted_at IS NULL)                                                                                                                                                                                       
                            Heap Blocks: exact=107                                                                                                                                                                                             
                            ->  Bitmap Index Scan on idx_id_pf_item  (cost=0.00..5.12 rows=93 width=0) (actual time=0.003..0.003 rows=3 loops=100)                                                                                             
                                  Index Cond: (id_pf_item = table1.id_pf_item)                                                                                                                                                                
                      ->  Index Scan using index_table4_id_item on table4  (cost=0.42..8.14 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=303)                                                                        
                            Index Cond: (table3.id_item = id_item)                                                                                                                                                                               
                            Filter: (deleted_at IS NULL)                                                                                                                                                                                       
                            Rows Removed by Filter: 0                                                                                                                                                                                          
Planning time: 1.023 ms                                                                                                                                                                                                                        
Execution time: 85669.512 ms

我变了

    and lower(table1.brand) = any('"brand1","brand2"')

在查询中

    and table1.brand = any('"Brand1","Brand2"')

计划改为

QUERY PLAN                                                                                                                                                                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=173137.44..188661.06 rows=14 width=952) (actual time=1444.123..1445.653 rows=100 loops=1)                                                                                                                                        
  CTE ctedata1                                                                                                                                                                                                                                 
    ->  GroupAggregate  (cost=0.42..80478.71 rows=43468 width=530) (actual time=0.040..769.982 rows=73121 loops=1)                                                                                                                            
          Group Key: table2.im_name, table2.id_region, table2.brand                                                                                                                                                   
          ->  Index Scan using udx_table2_item_im_name_id_region_brand_target_date_key on table2  (cost=0.42..59699.18 rows=391708 width=146) (actual time=0.021..350.774 rows=391779 loops=1)                                     
                Filter: ((deleted_at IS NULL) AND (id_region = ANY ('1'::integer[])))                                                                                                                                                       
                Rows Removed by Filter: 20415                                                                                                                                                                                                 
  CTE ctedata2                                                                                                                                                                                                                            
    ->  WindowAgg  (cost=16104.06..17842.78 rows=43468 width=628) (actual time=1088.905..1153.749 rows=73121 loops=1)                                                                                                                         
          ->  WindowAgg  (cost=16104.06..17082.09 rows=43468 width=620) (actual time=1020.017..1089.117 rows=73121 loops=1)                                                                                                                   
                ->  Sort  (cost=16104.06..16212.73 rows=43468 width=612) (actual time=1020.011..1037.170 rows=73121 loops=1)                                                                                                                  
                      Sort Key: ctedata1.kpiregion, ctedata1.kpibrand, (COALESCE(ctedata1.total_sales_amount, '0'::numeric)) DESC                                                                                                                
                      Sort Method: external merge  Disk: 6536kB                                                                                                                                                                               
                      ->  CTE Scan on ctedata1  (cost=0.00..869.36 rows=43468 width=612) (actual time=0.044..891.653 rows=73121 loops=1)                                                                                                       
  ->  Result  (cost=74815.94..90339.56 rows=14 width=952) (actual time=1444.121..1445.635 rows=100 loops=1)                                                                                                                                   
        ->  Sort  (cost=74815.94..74815.98 rows=14 width=944) (actual time=1444.053..1444.065 rows=100 loops=1)                                                                                                                               
              Sort Key: (COALESCE(ctedata2.total_sales_amount, '0'::numeric)) DESC                                                                                                                                                        
              Sort Method: top-N heapsort  Memory: 76kB                                                                                                                                                                                       
              ->  WindowAgg  (cost=72207.31..74815.68 rows=14 width=944) (actual time=1439.128..1441.885 rows=3151 loops=1)                                                                                                                   
                    ->  Hash Right Join  (cost=72207.31..74815.40 rows=14 width=927) (actual time=1307.531..1437.246 rows=3151 loops=1)                                                                                                       
                          Hash Cond: ((ctedata2.im_name = (table1.im_name)::text) AND (ctedata2.kpibrand = (table1.brand)::text))                                                                                                   
                          ->  CTE Scan on ctedata2  (cost=0.00..869.36 rows=43468 width=592) (actual time=1088.911..1209.646 rows=73121 loops=1)                                                                                          
                          ->  Hash  (cost=72207.10..72207.10 rows=14 width=399) (actual time=216.850..216.850 rows=3151 loops=1)                                                                                                              
                                Buckets: 4096 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1249kB                                                                                                                              
                                ->  Bitmap Heap Scan on table1  (cost=10960.95..72207.10 rows=14 width=399) (actual time=46.434..214.246 rows=3151 loops=1)                                                                                  
                                      Recheck Cond: (id_region = ANY ('1'::integer[]))                                                                                                                                                      
                                      Filter: ((deleted_at IS NULL) AND (department = 'Department1'::text) AND ((brand)::text = ANY ('Brand1, Brand2'::text[])) AND ('season1'::text = ANY ((lower((seasons)::text))::text[])))
                                      Rows Removed by Filter: 106335                                                                                                                                                                          
                                      Heap Blocks: exact=42899                                                                                                                                                                                
                                      ->  Bitmap Index Scan on table1_im_name_id_region_key  (cost=0.00..10960.94 rows=110619 width=0) (actual time=34.849..34.849 rows=109486 loops=1)                                                      
                                            Index Cond: (id_region = ANY ('1'::integer[]))                                                                                                                                                  
        SubPlan 3                                                                                                                                                                                                                             
          ->  Aggregate  (cost=1108.80..1108.81 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=100)                                                                                                                                   
                ->  Nested Loop Left Join  (cost=5.57..1108.57 rows=93 width=4) (actual time=0.006..0.014 rows=3 loops=100)                                                                                                                   
                      ->  Bitmap Heap Scan on table3  (cost=5.15..350.95 rows=93 width=4) (actual time=0.004..0.006 rows=3 loops=100)                                                                                                           
                            Recheck Cond: (id_pf_item = table1.id_pf_item)                                                                                                                                                                   
                            Filter: (deleted_at IS NULL)                                                                                                                                                                                      
                            Heap Blocks: exact=107                                                                                                                                                                                            
                            ->  Bitmap Index Scan on idx_id_pf_item  (cost=0.00..5.12 rows=93 width=0) (actual time=0.003..0.003 rows=3 loops=100)                                                                                            
                                  Index Cond: (id_pf_item = table1.id_pf_item)                                                                                                                                                               
                      ->  Index Scan using index_table4_id_item on table4  (cost=0.42..8.14 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=303)                                                                       
                            Index Cond: (table3.id_item = id_item)                                                                                                                                                                              
                            Filter: (deleted_at IS NULL)                                                                                                                                                                                      
                            Rows Removed by Filter: 0                                                                                                                                                                                         
Planning time: 0.760 ms                                                                                                                                                                                                                       
Execution time: 1448.848 ms

我的观察

table1 left join ctedata2 的 join 策略在避免 lower() 函数后发生变化。策略从嵌套循环左连接变为散列右连接。

ctedata2 上的 CTE Scan 节点仅在性能更好的查询中执行一次。

Postgres 版本 9.6

请帮助我理解这种行为。如果需要，我会提供更多信息。

【问题讨论】：

【参考方案1】：

几乎不值得深入研究一个几乎过时的版本的内部工作原理。这些时间和精力可能最好花在升级上。

但问题很简单。估计您对 table1 的扫描非常糟糕，尽管在更好的计划中减少了 14 倍。

->  Bitmap Heap Scan on table1  (cost=10960.95..72483.64 rows=1 width=399) (actual time=45.466..278.376 rows=3151 loops=1)

->  Bitmap Heap Scan on table1  (cost=10960.95..72207.10 rows=14 width=399) (actual time=46.434..214.246 rows=3151 loops=1)

您对 lower() 的使用，显然是没有理由的，肯定会导致估计不佳。并且将字符串动态转换为数组当然也无济于事。如果它首先存储为一个真实数组，那么统计系统可以得到它并生成更合理的估计。

【讨论】：

以上是关于Postgresql 查询的过滤条件中的列上的字符串操作如何影响它选择的计划的主要内容，如果未能解决你的问题，请参考以下文章