PostgreSQL Calls All Data For Group By Limit Operation
【Posted】: 2019-10-11 15:26:05 【Question】: I have a query like the following:
SELECT
    MAX(m.org_id) AS orgId,
    MAX(m.org_name) AS orgName,
    MAX(m.app_id) AS appId,
    MAX(r.country_or_region) AS country,
    MAX(r.local_spend_currency) AS currency,
    SUM(r.local_spend_amount) AS spend,
    SUM(r.impressions) AS impressions
    ...
FROM report r
LEFT JOIN metadata m
    ON m.org_id = r.org_id
    AND m.campaign_id = r.campaign_id
    AND m.ad_group_id = r.ad_group_id
WHERE (r.report_date BETWEEN '2019-01-01' AND '2019-10-10')
    AND r.org_id = 1
GROUP BY r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text
OFFSET 0
LIMIT 20
EXPLAIN ANALYZE output:
Limit (cost=1308.04..1308.14 rows=1 width=562) (actual time=267486.538..267487.067 rows=20 loops=1)
  -> GroupAggregate (cost=1308.04..1308.14 rows=1 width=562) (actual time=267486.537..267487.061 rows=20 loops=1)
        Group Key: r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text
        -> Sort (cost=1308.04..1308.05 rows=1 width=221) (actual time=267486.429..267486.536 rows=567 loops=1)
              Sort Key: r.country_or_region, r.ad_group_id, r.keyword_id, r.keyword, r.text
              Sort Method: external merge Disk: 667552kB
              -> Nested Loop (cost=1.13..1308.03 rows=1 width=221) (actual time=0.029..235158.692 rows=2742789 loops=1)
                    -> Nested Loop Semi Join (cost=0.44..89.76 rows=1 width=127) (actual time=0.016..8.967 rows=1506 loops=1)
                          Join Filter: (m.org_id = (479360))
                          -> Nested Loop (cost=0.44..89.05 rows=46 width=123) (actual time=0.013..4.491 rows=1506 loops=1)
                                -> HashAggregate (cost=0.02..0.03 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
                                      Group Key: 479360
                                      -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
                                -> Index Scan using pmx_org_cmp_adg on metadata m (cost=0.41..88.55 rows=46 width=119) (actual time=0.008..1.947 rows=1506 loops=1)
                                      Index Cond: (org_id = (479360))
                          -> Materialize (cost=0.00..0.03 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1506)
                                -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                    -> Index Scan using report_unx on search_term_report r (cost=0.69..1218.26 rows=1 width=118) (actual time=51.983..155.421 rows=1821 loops=1506)
                          Index Cond: ((org_id = m.org_id) AND (report_date >= '2019-07-01'::date) AND (report_date <= '2019-10-10'::date) AND (campaign_id = m.campaign_id) AND (ad_group_id = m.ad_group_id))
Planning Time: 0.988 ms
Execution Time: 267937.889 ms
I have indexes on the metadata and report tables, for example: metadata(org_id, campaign_id, ad_group_id); report(org_id, report_date, campaign_id, ad_group_id)
I only want to fetch 20 arbitrary items using LIMIT. Why does PostgreSQL take so long to return them, and how can I improve it?
【Question comments】:
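【Answer 1】: You want 20 groups. But to build those groups, and to be sure that nothing is missing from any of them, the database has to fetch all of the raw data first. LIMIT is applied only after the groups have been built.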
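A minimal sketch of why the limit cannot simply be applied before the grouping. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the tiny two-column `report` table and its rows are made up for illustration; only the GROUP BY / LIMIT ordering is the point.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (keyword TEXT, impressions INTEGER)")
conn.executemany(
    "INSERT INTO report VALUES (?, ?)",
    [("a", 1), ("b", 10), ("a", 2), ("b", 20), ("a", 3)],  # made-up rows
)

# Aggregate every row first, then keep one group: the total is complete.
full = conn.execute(
    "SELECT keyword, SUM(impressions) FROM report "
    "GROUP BY keyword ORDER BY keyword LIMIT 1"
).fetchall()

# Cut the raw rows first, then aggregate: the same group is now incomplete.
truncated = conn.execute(
    "SELECT keyword, SUM(impressions) FROM "
    "(SELECT * FROM report ORDER BY rowid LIMIT 2) "
    "GROUP BY keyword ORDER BY keyword LIMIT 1"
).fetchall()

print(full)       # group "a" summed over all of its rows
print(truncated)  # group "a" summed over only the rows that survived the cut
```

Group "a" has rows scattered through the table, so its sum is only correct once every row has been read.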
【Comments】:
【Answer 2】: When you say "random items", I assume you mean "random reports", since you don't have an items table.
with r as (select * from report where report_date between '2019-01-01' and '2019-10-10' and org_id = 1 order by random() limit 20)
select <whatever> from r left join <whatever>
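You may need to adjust the aggregates, because they will now be computed over only the 20 sampled rows. Does each record in "metadata" belong to only one record in "report"?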
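If the aggregates should still cover whole groups, one possible adjustment (not something this answer proposes, just a sketch) is to sample group keys rather than raw rows and then aggregate only the rows belonging to those keys. The example uses Python's sqlite3 module as a stand-in for PostgreSQL; the reduced `report` schema, the rows, and the `picked` CTE name are hypothetical. Whether this is actually faster on the real tables depends on the indexes and has not been measured here.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report (ad_group_id INTEGER, keyword TEXT, impressions INTEGER)"
)
conn.executemany(
    "INSERT INTO report VALUES (?, ?, ?)",
    [(1, "a", 1), (1, "b", 10), (1, "a", 2),
     (2, "a", 5), (1, "b", 20), (2, "a", 7)],  # made-up rows
)

# Step 1 (CTE "picked"): choose 2 random group keys.
# Step 2: aggregate all rows that belong to the chosen keys.
sampled = conn.execute(
    """
    WITH picked AS (
        SELECT ad_group_id, keyword
        FROM (SELECT DISTINCT ad_group_id, keyword FROM report) AS g
        ORDER BY random()
        LIMIT 2
    )
    SELECT r.ad_group_id, r.keyword, SUM(r.impressions)
    FROM report r
    JOIN picked p
      ON p.ad_group_id = r.ad_group_id AND p.keyword = r.keyword
    GROUP BY r.ad_group_id, r.keyword
    """
).fetchall()

for row in sampled:
    print(row)  # each sampled group carries its complete total
```

Picking the distinct keys still has to look at the key columns of every matching row, so this changes what gets sorted and aggregated rather than removing the scan.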
【Comments】: