Getting percentage value for grouped by values
【Title】: Getting percentage value for grouped by values
【Posted】: 2021-06-21 16:18:08
【Question】: I have a table, and I am using GROUP BY to get the grouped count for each country:
Select country, count(country)
from `xyz.xyz.call_stats`
where fan_out = 1 and
last_channel_uri not like 'sip:confctl-d%' and
call_end_time between '2021-05-17 00:00:00' and '2021-05-18 00:00:00'
group by country
It gives me values in the following format:
Row country f0_
1 DZ 4941
2 NO 30737
3 IS 436
4 IT 32086
5 GF 11
6 CZ 9156
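What I need is the percentage for each country. For example, for country DZ, what percentage of the total for that time period is 4941, and likewise for every other country. My expected result is: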
Row country f0_ f1_
1 DZ 4941 10%
2 NO 30737 25.7%
3 IS 436 2%
4 IT 32086 29.6%
5 GF 11 0.04%
6 CZ 9156 3.67%
P.S.: I want to ignore the condition fan_out = 1 when calculating the total used for the percentage, but keep it when doing the GROUP BY.
【Answer 1】: You can use window functions. I prefer a ratio between 0 and 1:
Select country, count(*),
count(*) * 1.0 / sum(count(*)) over () as ratio
from `xyz.xyz.call_stats`
where fan_out = 1 and
last_channel_uri not like 'sip:confctl-d%' and
call_end_time between '2021-05-17 00:00:00' and '2021-05-18 00:00:00'
group by country;
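To see why `sum(count(*)) over ()` yields the grand total, here is a minimal sketch that runs the same expression against a toy table. It uses Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption made only so the example is runnable locally); the table contents are hypothetical and the extra filters from the query above are left out.

```python
import sqlite3

# Hypothetical toy data standing in for the call_stats table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_stats (country TEXT, fan_out INTEGER)")
conn.executemany(
    "INSERT INTO call_stats VALUES (?, ?)",
    [("DZ", 1)] * 2 + [("NO", 1)] * 5 + [("IS", 1)] * 1 + [("IT", 1)] * 2,
)

# count(*) is evaluated once per group. The window function then sums
# those per-group counts over all output rows, which is the grand total.
rows = conn.execute(
    """
    SELECT country,
           count(*) AS calls,
           count(*) * 1.0 / sum(count(*)) OVER () AS ratio
    FROM call_stats
    WHERE fan_out = 1
    GROUP BY country
    ORDER BY country
    """
).fetchall()

ratios = {country: ratio for country, calls, ratio in rows}
for country, calls, ratio in rows:
    print(f"{country}: {calls} calls, ratio {ratio:.2f}")
```

Because the window is applied after grouping, the ratios always add up to 1 over the rows that survive the WHERE clause.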
If you want a value between 0 and 100, you can multiply by 100.
Edit:
If you want the percentage based on a specific condition, use countif():
-- numerator: rows with fan_out = 1; denominator: all rows that pass the where clause
Select country, countif(fan_out = 1),
countif(fan_out = 1) * 1.0 / sum(count(*)) over () as ratio
from `xyz.xyz.call_stats`
where last_channel_uri not like 'sip:confctl-d%' and
call_end_time between '2021-05-17 00:00:00' and '2021-05-18 00:00:00'
group by country;
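The idea of moving the condition out of the WHERE clause and into the aggregate can be tried on toy data. This is only a sketch: it uses Python's sqlite3 module rather than BigQuery, and since SQLite has no countif(), `sum(fan_out = 1)` is swapped in as an equivalent for this data. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_stats (country TEXT, fan_out INTEGER)")
# Hypothetical rows: 10 calls in total, 6 of them with fan_out = 1.
conn.executemany(
    "INSERT INTO call_stats VALUES (?, ?)",
    [("DZ", 1), ("DZ", 0),
     ("NO", 1), ("NO", 1), ("NO", 1), ("NO", 0),
     ("IT", 1), ("IT", 1), ("IT", 0),
     ("IS", 0)],
)

# Stand-in for countif(fan_out = 1): the comparison is 1 or 0 per row,
# so summing it counts the matching rows. The denominator still counts
# every row, because fan_out is no longer filtered in WHERE.
rows = conn.execute(
    """
    SELECT country,
           sum(fan_out = 1) AS matching,
           sum(fan_out = 1) * 1.0 / sum(count(*)) OVER () AS ratio
    FROM call_stats
    GROUP BY country
    ORDER BY country
    """
).fetchall()

result = {country: (matching, ratio) for country, matching, ratio in rows}
for country, (matching, ratio) in result.items():
    print(f"{country}: {matching} matching calls, ratio {ratio:.2f}")
```

Note that a country with no fan_out = 1 rows (IS here) still appears in the output, with a count of 0, because its rows are no longer removed before grouping.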
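【Comments】:
So what should I do if I want to ignore the fan_out = 1 condition when calculating the total for the percentage, but keep it when doing the grouping? I know this was not part of the question, but it would be great if you could help. I can also edit the question if you like.
【Answer 2】: I was able to solve this with the following query: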
WITH
T1 AS (
Select country, count(country) as NumA
from `xyz.xyz.call_stats`
where fan_out = 1 and
last_channel_uri not like 'sip:confctl-d%' and
call_end_time between '2021-05-17 00:00:00' and '2021-05-18 00:00:00'
group by country),
T2 AS (
Select count(*) as NumB
from `xyz.xyz.call_stats`
where last_channel_uri not like 'sip:confctl-d%' and
call_end_time between '2021-05-17 00:00:00' and '2021-05-18 00:00:00')
SELECT country, CAST(T1.NumA AS FLOAT64) / CAST(T2.NumB AS FLOAT64) * 100 as PctMetTgt
FROM T1, T2
order by PctMetTgt DESC
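The two-CTE shape can be checked on toy data as well. The sketch below again relies on Python's sqlite3 module as a local stand-in for BigQuery, with hypothetical rows, lower-case names of my own choosing, and REAL in place of FLOAT64; the date and URI filters are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_stats (country TEXT, fan_out INTEGER)")
# Hypothetical rows: 10 calls in total, 6 of them with fan_out = 1.
conn.executemany(
    "INSERT INTO call_stats VALUES (?, ?)",
    [("DZ", 1), ("DZ", 0),
     ("NO", 1), ("NO", 1), ("NO", 1), ("NO", 0),
     ("IT", 1), ("IT", 1), ("IT", 0),
     ("IS", 0)],
)

# t1 holds the filtered per-country counts, t2 holds the single
# unfiltered total. Cross joining a one-row table attaches that total
# to every country row.
rows = conn.execute(
    """
    WITH
    t1 AS (SELECT country, count(country) AS num_a
           FROM call_stats
           WHERE fan_out = 1
           GROUP BY country),
    t2 AS (SELECT count(*) AS num_b FROM call_stats)
    SELECT country, CAST(num_a AS REAL) / num_b * 100 AS pct
    FROM t1 CROSS JOIN t2
    ORDER BY pct DESC, country
    """
).fetchall()

pct = dict(rows)
for country, value in rows:
    print(f"{country}: {value:.1f}%")
```

One difference from a conditional-aggregate approach is visible here: a country with no fan_out = 1 rows (IS in this data) is filtered out inside t1 and so does not appear in the result at all.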