PostgreSQL Calls All Data For Group By Limit Operation

Posted: 2019-10-11 15:26:05

[Question]:

I have a query like the following:

SELECT 
    MAX(m.org_id) as orgId,
    MAX(m.org_name) as orgName,
    MAX(m.app_id) as appId,
    MAX(r.country_or_region) as country, 
    MAX(r.local_spend_currency) as currency, 
    SUM(r.local_spend_amount) as spend,
    SUM(r.impressions) as impressions
    ...
FROM report r  
LEFT JOIN metadata m 
    ON m.org_id = r.org_id
    AND m.campaign_id = r.campaign_id
    AND m.ad_group_id = r.ad_group_id 
WHERE (r.report_date BETWEEN '2019-01-01' AND '2019-10-10') 
    AND r.org_id = 1
GROUP BY r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text  
OFFSET 0
LIMIT 20

EXPLAIN ANALYZE output:

"Limit  (cost=1308.04..1308.14 rows=1 width=562) (actual time=267486.538..267487.067 rows=20 loops=1)"
"  ->  GroupAggregate  (cost=1308.04..1308.14 rows=1 width=562) (actual time=267486.537..267487.061 rows=20 loops=1)"
"        Group Key: r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text"
"        ->  Sort  (cost=1308.04..1308.05 rows=1 width=221) (actual time=267486.429..267486.536 rows=567 loops=1)"
"              Sort Key: r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text"
"              Sort Method: external merge  Disk: 667552kB"
"              ->  Nested Loop  (cost=1.13..1308.03 rows=1 width=221) (actual time=0.029..235158.692 rows=2742789 loops=1)"
"                    ->  Nested Loop Semi Join  (cost=0.44..89.76 rows=1 width=127) (actual time=0.016..8.967 rows=1506 loops=1)"
"                          Join Filter: (m.org_id = (479360))"
"                          ->  Nested Loop  (cost=0.44..89.05 rows=46 width=123) (actual time=0.013..4.491 rows=1506 loops=1)"
"                                ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)"
"                                      Group Key: 479360"
"                                      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)"
"                                ->  Index Scan using pmx_org_cmp_adg on metadata m  (cost=0.41..88.55 rows=46 width=119) (actual time=0.008..1.947 rows=1506 loops=1)"
"                                      Index Cond: (org_id = (479360))"
"                          ->  Materialize  (cost=0.00..0.03 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1506)"
"                                ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)"
"                    ->  Index Scan using report_unx on search_term_report r  (cost=0.69..1218.26 rows=1 width=118) (actual time=51.983..155.421 rows=1821 loops=1506)"
"                          Index Cond: ((org_id = m.org_id) AND (report_date >= '2019-07-01'::date) AND (report_date <= '2019-10-10'::date) AND (campaign_id = m.campaign_id) AND (ad_group_id = m.ad_group_id))"
"Planning Time: 0.988 ms"
"Execution Time: 267937.889 ms"

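I have indexes on the metadata and report tables: metadata(org_id, campaign_id, ad_group_id) and report(org_id, report_date, campaign_id, ad_group_id).

I only want to fetch 20 arbitrary items using LIMIT. Why does PostgreSQL take so long to return them, and how can I improve it?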

[Comments]:

[Answer 1]:

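You want 20 groups. But to build those groups, and to be sure that no row is missing from any of them, PostgreSQL has to fetch all of the source rows first. The plan shows this: the Sort node consumes all 2,742,789 joined rows (spilling 667552kB to disk) before GroupAggregate can return the first group.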
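One way to work with this constraint, offered only as a sketch: pick a handful of group keys from the first 20 matching rows, then aggregate just the rows that belong to those groups. The CTE name `picked` and the subquery alias `first_rows` are hypothetical, the metadata join and the remaining aggregates from the question are left out for brevity, and whether this is actually faster depends on the available indexes and the plan PostgreSQL chooses.

```sql
WITH picked AS (
    -- Take the group keys of the first 20 matching rows (no sort needed here).
    -- Several rows may share a key, so this can yield fewer than 20 groups.
    SELECT DISTINCT country_or_region, ad_group_id, keyword_id, keyword, text
    FROM (
        SELECT country_or_region, ad_group_id, keyword_id, keyword, text
        FROM report
        WHERE report_date BETWEEN '2019-01-01' AND '2019-10-10'
          AND org_id = 1
        LIMIT 20
    ) AS first_rows
)
SELECT r.country_or_region         AS country,
       r.ad_group_id,
       r.keyword_id,
       r.keyword,
       r.text,
       SUM(r.local_spend_amount)   AS spend,
       SUM(r.impressions)          AS impressions
FROM report r
-- Plain equality join: a group whose key contains NULL will not match itself.
JOIN picked p USING (country_or_region, ad_group_id, keyword_id, keyword, text)
WHERE r.report_date BETWEEN '2019-01-01' AND '2019-10-10'
  AND r.org_id = 1
GROUP BY r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text;
```

Each returned group is still complete, because every row with a picked key is aggregated; only the number of groups is cut down before the expensive work starts.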

[Comments]:

[Answer 2]:

When you say "random items", I assume you mean "random reports", since you don't have an items table.

-- Columns inside the CTE are unqualified: the name r is not in scope until the CTE has been defined.
with r as (select * from report WHERE report_date BETWEEN '2019-01-01' AND '2019-10-10' AND org_id = 1 order by random() limit 20)
select <whatever> from r left join <whatever>

You may need to adjust the aggregation of the results. Does each record in "metadata" belong to only one record in "report"?

[Comments]:
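A related check on the metadata side, sketched under the assumption that the join columns are exactly those in the question's ON clause: if any key combination occurs more than once in metadata, every matching report row is repeated by the join and the SUM() values come out inflated.

```sql
-- List join keys that appear more than once in metadata.
-- An empty result means each report row matches at most one metadata row.
SELECT org_id, campaign_id, ad_group_id, COUNT(*) AS matches
FROM metadata
GROUP BY org_id, campaign_id, ad_group_id
HAVING COUNT(*) > 1
LIMIT 10;
```

If this returns rows, the sums would need to be computed on report alone (for example in a subquery) before joining to metadata.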
