SQL: summarize subtotal depending on specific column and date
Posted: 2021-08-25 14:39:12
Question: Please, I need someone fluent in SQL to help me with this. I have a simple table (simplified from the real one). Given the columns name through value, I need to calculate the keyval column as follows:
name | period | type | value | keyval | formula | RULE |
---|---|---|---|---|---|---|
n1 | 202105 | ppto | 123 | 1087 | =123+876+88 | If type='ppto' and if period between march to december then Sum value from current row to 2 preceding rows of type=Ppto and put in keyval column |
n1 | 202104 | ppto | 876 | 975 | =876+88+11 | If type='ppto' and if period between march to december then Sum value from current row to 2 preceding rows of type=Ppto and put in keyval column |
n1 | 202103 | ppto | 88 | 209 | =88+11+110 | If type='ppto' and if period between march to december then Sum value from current row to 2 preceding rows of type=Ppto and put in keyval column |
n1 | 202102 | ppto | 11 | 134 | =11+110+13 | If type='ppto' and if period = february then Sum value from current row to 1 preceding rows of type=Ppto plus value from december of the last year of type=real and put in keyval column |
n1 | 202101 | ppto | 110 | 166 | =110+13+28 | If type='ppto' and if period = january then Sum value from row type=Ppto plus values from december and november of the last year of type=real and put in keyval column |
n1 | 202012 | ppto | 82 | 238 | =82+55+101 | If type='ppto' and if period between march to december then Sum value from current row to 2 preceding rows of type=Ppto and put in keyval column |
n1 | 202011 | ppto | 55 | 258 | =55+101+102 | If type='ppto' and if period between march to december then Sum value from current row to 2 preceding rows of type=Ppto and put in keyval column |
n1 | 202010 | ppto | 101 | - | =101+102+null | null because there are not enough 3 values to sum (current to 2 preceding from type=ppto and period from month january to december) |
n1 | 202009 | ppto | 102 | - | =102+null+null | null because there are not enough 3 values to sum (current to 2 preceding from type=ppto and period from month january to december) |
n1 | 202012 | real | 13 | 135 | =13+28+94 | If type='real' then Sum values from current row to 2 preceding rows of type=real and put in keyval column |
n1 | 202011 | real | 28 | 160 | =28+94+38 | If type='real' then Sum values from current row to 2 preceding rows of type=real and put in keyval column |
n1 | 202010 | real | 94 | - | =94+38+null | null because there are not enough 3 values to sum (current to 2 preceding from type=real and from month january to december) |
n1 | 202009 | real | 38 | - | =38+null+null | null because there are not enough 3 values to sum (current to 2 preceding from type=real and from month january to december) |
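
The rules in the table can be cross-checked with a small reference implementation. The sketch below is not part of the original thread. The names `months_back` and `keyval` are hypothetical, and "preceding rows" is interpreted as "preceding calendar months", which is equivalent for the gap-free sample data but is an assumption in general. One thing it surfaces: the formula for 202101, 110+13+28, evaluates to 151, whereas the table lists 166.

```python
from typing import Dict, Optional, Tuple

# Sample rows from the question: (name, period as YYYYMM, type, value).
ROWS = [
    ("n1", 202105, "ppto", 123),
    ("n1", 202104, "ppto", 876),
    ("n1", 202103, "ppto", 88),
    ("n1", 202102, "ppto", 11),
    ("n1", 202101, "ppto", 110),
    ("n1", 202012, "ppto", 82),
    ("n1", 202011, "ppto", 55),
    ("n1", 202010, "ppto", 101),
    ("n1", 202009, "ppto", 102),
    ("n1", 202012, "real", 13),
    ("n1", 202011, "real", 28),
    ("n1", 202010, "real", 94),
    ("n1", 202009, "real", 38),
]

Key = Tuple[str, int, str]
LOOKUP: Dict[Key, int] = {(n, p, t): v for n, p, t, v in ROWS}


def months_back(period: int, k: int) -> int:
    """Return the YYYYMM period that lies k calendar months before `period`."""
    year, month = divmod(period, 100)
    index = year * 12 + (month - 1) - k
    return (index // 12) * 100 + index % 12 + 1


def keyval(name: str, period: int, typ: str) -> Optional[int]:
    """Sum three consecutive months, or return None if any term is missing."""
    month = period % 100
    # Which type supplies the current month, month-1 and month-2 terms.
    if typ == "ppto" and month == 2:
        sources = ("ppto", "ppto", "real")   # December comes from 'real'
    elif typ == "ppto" and month == 1:
        sources = ("ppto", "real", "real")   # December and November from 'real'
    else:
        sources = (typ, typ, typ)
    terms = [
        LOOKUP.get((name, months_back(period, k), src))
        for k, src in enumerate(sources)
    ]
    if any(term is None for term in terms):
        return None
    return sum(terms)


if __name__ == "__main__":
    for name, period, typ, value in ROWS:
        print(name, period, typ, value, keyval(name, period, typ))
```

Rows for which fewer than three terms exist (202010 and 202009 for both types) come out as `None`, which corresponds to the `-` entries in the keyval column.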
This is my best attempt at a solution, but I think it is very inefficient, and it does not return null where it should:
with b as (
  select cast(substr(cast(period as string),1,4) as int64) as ano, p.* from mytable p
),
ppto as (
  select b.* from b where type='ppto'
),
real as (
  select sum(value) over (order by period desc rows between current row and 2 following) as keyval, b.* from b where type='real'
),
both_sets as (
  select p, r12, r11
  from ppto p
  left join real r12 on p.name = r12.name and r12.ano = p.ano-1 and cast(substr(cast(r12.period as string),5) as int64) = 12
  left join real r11 on p.name = r11.name and r11.ano = p.ano-1 and cast(substr(cast(r11.period as string),5) as int64) = 11
),
cat as (
  select (case
    when p.type='ppto' and cast(substr(cast(p.period as string),5) as int64) > 2 then sum(p.value) over (order by p.period desc rows between current row and 2 following)
    when p.type='ppto' and cast(substr(cast(p.period as string),5) as int64) = 2 then sum(p.value) over (order by p.period desc rows between current row and 1 following) + r12.value
    when p.type='ppto' and cast(substr(cast(p.period as string),5) as int64) = 1 then p.value + r12.value + r11.value
    else 0 end) keyval, p.value, p.period, p.name, p.type
  from both_sets u
)
select * from cat
union all
select keyval, value, period, name, type from real
order by type, period desc
The result is this:
name | period | type | value | keyval |
---|---|---|---|---|
n1 | 202105 | ppto | 123 | 1087 |
n1 | 202104 | ppto | 876 | 975 |
n1 | 202103 | ppto | 88 | 209 |
n1 | 202102 | ppto | 11 | 134 |
n1 | 202101 | ppto | 110 | 166 |
n1 | 202012 | ppto | 82 | 238 |
n1 | 202011 | ppto | 55 | 258 |
n1 | 202010 | ppto | 101 | 203 |
n1 | 202009 | ppto | 102 | 102 |
n1 | 202012 | real | 13 | 135 |
n1 | 202011 | real | 28 | 160 |
n1 | 202010 | real | 94 | 132 |
n1 | 202009 | real | 38 | 38 |
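As you can see, these are not the values I need: the rows that do not have three values to sum (keyval 203, 102, 132 and 38) should be null.
How can I achieve this? Thank you very much for your time and help.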
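Comments:
Are you using Postgres or BigQuery? Please tag only the database you are really using. Hint: you are performing several different operations, each on a different subset of the data, so you may want to write several separate queries, each with a different WHERE clause but all returning the same set of columns, and then combine the result sets with UNION.
Answer 1: This is my best new attempt...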
with b as (
  select cast(substr(cast(period as string),1,4) as int64) as ano, p.* from `tc-sc-bi-bigdata-fdp-dev.stg_cor_cor_fdp_fnrp_dev.tmp_prueba` p
),
ppto as (
  select b.* from b where type='ppto'
),
real as (
  select (case when NTH_VALUE(value,3) OVER (PARTITION BY type ORDER BY period DESC rows between current row and 2 following) is not null then sum(value) over (order by period desc rows between current row and 2 following) end) as keyval, b.* from b where type='real'
),
unido as (
  select p, r12, r11
  from ppto p
  left join real r12 on p.name = r12.name and r12.ano = p.ano-1 and cast(substr(cast(r12.period as string),5) as int64) = 12
  left join real r11 on p.name = r11.name and r11.ano = p.ano-1 and cast(substr(cast(r11.period as string),5) as int64) = 11
),
cat as (
  select (case
    when p.type='ppto' and cast(substr(cast(p.period as string),5) as int64) > 2 and NTH_VALUE(p.value,3) OVER (order by p.period desc rows between current row and 2 following) is not null then sum(p.value) over (order by p.period desc rows between current row and 2 following)
    when p.type='ppto' and cast(substr(cast(p.period as string),5) as int64) = 2 and NTH_VALUE(p.value,2) OVER (order by p.period desc rows between current row and 1 following) is not null then sum(p.value) over (order by p.period desc rows between current row and 1 following) + r12.value
    when p.type='ppto' and cast(substr(cast(p.period as string),5) as int64) = 1 then p.value + r12.value + r11.value
    else null end) keyval, p.value, p.period, p.name, p.type
  from unido u
)
select * from cat
union all
select keyval, value, period, name, type from real
order by type, period desc
Answer 2: Consider the approach below.
select * except(month),
case
when type = 'real' or (type = 'ppto' and month between 3 and 12) then
if(count(value) over recent3months < 3, null, sum(value) over recent3months)
when type = 'ppto' and month = 2 then
sum(value) over recent2months + sum(if(type = 'real', value, 0)) over recent3rdmonth
when type = 'ppto' and month = 1 then
value + sum(if(type = 'real', value, 0)) over recent2ndand3rdmonth
end as keyval
from `project.dataset.mytable`, unnest([period - 100 * div(period, 100)]) month
window
recent3months as (partition by name, type order by period desc range between current row and 2 following),
recent2months as (partition by name, type order by period desc range between current row and 1 following),
recent3rdmonth as (partition by name order by period desc range between 90 following and 90 following),
recent2ndand3rdmonth as (partition by name order by period desc range between 89 following and 90 following)
If applied to the sample data in your question, the output is: (output image not preserved)
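
The window offsets in the query above rely on period being a YYYYMM integer: stepping from February or January back into the previous year subtracts 100 and adds up to 11, so December of the previous year is 90 below February and 89 below January. The sketch below is not from the original answer and its helper names are hypothetical. It only reproduces that integer arithmetic so the offsets can be checked by hand.

```python
from typing import Tuple


def month_of(period: int) -> int:
    """Mirror of `period - 100 * div(period, 100)` for positive YYYYMM values."""
    return period - 100 * (period // 100)


def desc_range(period: int, lo_following: int, hi_following: int) -> Tuple[int, int]:
    """Inclusive (low, high) period values covered by
    ORDER BY period DESC RANGE BETWEEN lo FOLLOWING AND hi FOLLOWING.
    With descending order, 'following' means smaller period values."""
    return (period - hi_following, period - lo_following)


if __name__ == "__main__":
    # February: '90 following' lands exactly on December of the previous year.
    print(desc_range(202102, 90, 90))   # (202012, 202012)
    # January: '89 following .. 90 following' covers December and November.
    print(desc_range(202101, 89, 90))   # (202011, 202012)
    # The 3-month window is also a numeric range, so it does not cross years.
    print(desc_range(202101, 0, 2))     # (202099, 202101)
```

One consequence worth noting: because `recent3months` is a numeric range, a `real` row in January or February would see fewer than three rows in its window and therefore get null. The sample data has no `real` rows in 2021, so this does not show up there.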
Comments:
Glad it worked for you. Please also consider voting up the answer! :o)
Thanks @mikhail-berlyant, awesome! Outstanding! Thank you again!
Answer 3: Try LEAD for BigQuery:
select
period,
type,
value,
value + ifnull(LEAD(value) OVER (PARTITION BY type ORDER BY period DESC), 0) + ifnull(LEAD(value, 2) OVER (PARTITION BY type ORDER BY period DESC), 0) as keyval
from mytable
order by type asc, period desc
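
To see what this LEAD expression returns for the sample data, the sketch below emulates it in Python. It is not part of the original answer, and `lead_keyval` is a hypothetical helper. It treats a missing following row as 0, as `ifnull(LEAD(...), 0)` does, and takes the values of one type already sorted by period descending.

```python
from typing import List


def lead_keyval(values_desc: List[int]) -> List[int]:
    """Emulate value + ifnull(LEAD(value), 0) + ifnull(LEAD(value, 2), 0)
    over rows of a single type ordered by period DESC."""
    result = []
    for i, value in enumerate(values_desc):
        lead1 = values_desc[i + 1] if i + 1 < len(values_desc) else 0
        lead2 = values_desc[i + 2] if i + 2 < len(values_desc) else 0
        result.append(value + lead1 + lead2)
    return result


# Sample values from the question, ordered by period descending.
PPTO = [123, 876, 88, 11, 110, 82, 55, 101, 102]   # 202105 .. 202009
REAL = [13, 28, 94, 38]                            # 202012 .. 202009

if __name__ == "__main__":
    print(lead_keyval(PPTO))
    print(lead_keyval(REAL))
```

Under these assumptions the last two rows of each type come out as partial sums (203 and 102 for ppto, 132 and 38 for real) rather than null, and the January and February ppto rows are summed from ppto values only, so the special rules for those months would still need to be added.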
Comments:
Thanks @sergey-geron