PostgreSQL Calls All Data For Group By Limit Operation

Posted 2023-04-15

【Posted】: 2019-10-11 15:26:05 【Question】:

I have a query like the following:

SELECT 
    MAX(m.org_id) as orgId,
    MAX(m.org_name) as orgName,
    MAX(m.app_id) as appId,
    MAX(r.country_or_region) as country, 
    MAX(r.local_spend_currency) as currency, 
    SUM(r.local_spend_amount) as spend,
    SUM(r.impressions) as impressions
    ...
FROM report r  
LEFT JOIN metadata m 
    ON m.org_id = r.org_id
    AND m.campaign_id = r.campaign_id
    AND m.ad_group_id = r.ad_group_id 
WHERE (r.report_date BETWEEN '2019-01-01' AND '2019-10-10') 
    AND r.org_id = 1
GROUP BY r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text  
OFFSET 0
LIMIT 20

EXPLAIN ANALYZE output:

"Limit  (cost=1308.04..1308.14 rows=1 width=562) (actual time=267486.538..267487.067 rows=20 loops=1)"
"  ->  GroupAggregate  (cost=1308.04..1308.14 rows=1 width=562) (actual time=267486.537..267487.061 rows=20 loops=1)"
"        Group Key: r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text"
"        ->  Sort  (cost=1308.04..1308.05 rows=1 width=221) (actual time=267486.429..267486.536 rows=567 loops=1)"
"              Sort Key: r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text"
"              Sort Method: external merge  Disk: 667552kB"
"              ->  Nested Loop  (cost=1.13..1308.03 rows=1 width=221) (actual time=0.029..235158.692 rows=2742789 loops=1)"
"                    ->  Nested Loop Semi Join  (cost=0.44..89.76 rows=1 width=127) (actual time=0.016..8.967 rows=1506 loops=1)"
"                          Join Filter: (m.org_id = (479360))"
"                          ->  Nested Loop  (cost=0.44..89.05 rows=46 width=123) (actual time=0.013..4.491 rows=1506 loops=1)"
"                                ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)"
"                                      Group Key: 479360"
"                                      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)"
"                                ->  Index Scan using pmx_org_cmp_adg on metadata m  (cost=0.41..88.55 rows=46 width=119) (actual time=0.008..1.947 rows=1506 loops=1)"
"                                      Index Cond: (org_id = (479360))"
"                          ->  Materialize  (cost=0.00..0.03 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1506)"
"                                ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)"
"                    ->  Index Scan using report_unx on search_term_report r  (cost=0.69..1218.26 rows=1 width=118) (actual time=51.983..155.421 rows=1821 loops=1506)"
"                          Index Cond: ((org_id = m.org_id) AND (report_date >= '2019-07-01'::date) AND (report_date <= '2019-10-10'::date) AND (campaign_id = m.campaign_id) AND (ad_group_id = m.ad_group_id))"
"Planning Time: 0.988 ms"
"Execution Time: 267937.889 ms"

I have indexes on the metadata and report tables: metadata(org_id, campaign_id, ad_group_id) and report(org_id, report_date, campaign_id, ad_group_id).

I just want to fetch 20 random items using LIMIT. Why does PostgreSQL take so long to return them? How can I improve this?

【Comments】:

【Answer 1】:

You want 20 groups. But to build those groups, and to be sure that no row is missing from any of them, PostgreSQL has to read all of the raw data first.
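A minimal, self-contained sketch of why LIMIT cannot short-circuit the aggregation. The inline `toy_report` rows are made-up illustrative data, not taken from the question's tables.

```sql
-- Toy data (hypothetical): rows for the same keyword are scattered
-- through the input, so a group is only complete once every row is seen.
WITH toy_report(keyword, impressions) AS (
    VALUES ('a', 10), ('b', 5), ('a', 7), ('c', 1), ('b', 2)
)
SELECT keyword,
       SUM(impressions) AS impressions
FROM toy_report
GROUP BY keyword      -- every input row must be assigned to its group first
ORDER BY keyword
LIMIT 2;              -- applied only to the finished groups
-- Result: ('a', 17), ('b', 7)
```

The plan in the question shows the same effect: the Sort node takes in all 2,742,789 joined rows before GroupAggregate can emit its first group, even though only 567 sorted rows are then read to produce the 20 groups.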

【Comments】:

【Answer 2】:

When you say "random items", I assume you mean "random reports", since you don't have an items table.

with r as (select * from report WHERE report_date BETWEEN '2019-01-01' AND '2019-10-10' AND org_id = 1 order by random() limit 20)
select <whatever> from r left join <whatever>

You may need to adjust the aggregation in the result. Does each record in "metadata" belong to exactly one record in "report"?
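One way this answer's idea might be filled in with the columns from the question is sketched below. The CTE name `sampled` and the choice of output columns are assumptions made for illustration; the table and column names are taken from the question's query.

```sql
-- Sketch only: sample 20 report rows first, then join and aggregate.
WITH sampled AS (
    SELECT *
    FROM report
    WHERE report_date BETWEEN '2019-01-01' AND '2019-10-10'
      AND org_id = 1
    ORDER BY random()   -- still reads every matching row to pick the sample
    LIMIT 20
)
SELECT
    MAX(m.org_id)            AS orgId,
    MAX(m.org_name)          AS orgName,
    MAX(r.country_or_region) AS country,
    SUM(r.local_spend_amount) AS spend,        -- covers sampled rows only
    SUM(r.impressions)        AS impressions   -- covers sampled rows only
FROM sampled r
LEFT JOIN metadata m
    ON m.org_id = r.org_id
    AND m.campaign_id = r.campaign_id
    AND m.ad_group_id = r.ad_group_id
GROUP BY r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text;
```

Note the change in meaning: this returns at most 20 groups, and the sums cover only the sampled rows rather than complete group totals. If a report row matches more than one metadata row, the join duplicates it and inflates the sums, which is why the cardinality between the two tables matters. If any 20 rows are acceptable rather than a random sample, dropping `ORDER BY random()` lets the scan stop as soon as 20 rows are found.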

【Comments】:
