Division by zero error while calculating growth percentage from two different tables

【Title】: division by zero error while calculating growth percentage from two different tables 【Posted】: 2020-09-12 22:10:20 【Question】:

The table I currently have looks like this (the data comes from two different tables, 19921231 and 19930331):

The table I want to create looks like this (with a fifth column added):

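Goal: determine the deposit growth rate for each bank, i.e. compare the deposits a bank held in the previous quarter (e.g. 19921231) with its deposits in the most recent quarter (e.g. 19930331), then express the increase or decrease as a percentage.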
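As a worked instance of that calculation, here is a one-line sketch. The figures 100 and 120 are made up for illustration and are not taken from the FDIC data.

```sql
-- Hypothetical figures: deposits of 100 in the previous quarter, 120 in the latest one.
-- growth % = 100 * (latest - previous) / previous
SELECT 100 * (120 - 100) / 100 AS growth_pct;  -- 20.0, i.e. deposits grew by 20%
```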

This is the code I have written so far:

select 
AL.repdte as `Date`, AL.cert, AL.name, AL.dep as `Deposits`
FROM usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as AL

UNION ALL

select 
AL.repdte as `Date`, AL.cert, AL.name, AL.dep as `Deposits`
FROM usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as AL

An answer to this question suggested the following code, but for some reason I get NULL in the output:

select al19930331.repdte as `Date`, al19930331.cert, al19930331.name,
       al19930331.dep as Deposits_1993,
       al19921231.dep as Deposits_1992,
       (al19930331.dep - al19921231.dep) / al19921231.dep as grow_rate
from usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as al19930331 left join
     usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as al19921231
     on al19930331.cert = al19921231.cert and
        al19930331.name = al19921231.name and
        al19921231.repdte = date_add(al19930331.repdte, interval 1 year);

To isolate the NULL problem, I simplified the query, and the NULLs went away. The simpler query works, and I can see the deposits for both quarters:

select al19930331.repdte as `Date`, al19930331.cert, al19930331.name,
       al19930331.dep as Deposits_1993,
       al19921231.dep as Deposits_1992,
       (al19930331.dep - al19921231.dep) / al19921231.dep as grow_rate
from usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as al19930331 left join
     usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as al19921231
     on al19930331.cert = al19921231.cert and
        al19930331.name = al19921231.name

Once I had confirmed that retrieving the deposit data for the two quarters works, I decided to remove the last line of this code:

select al19930331.repdte as `Date`, al19930331.cert, al19930331.name,
       al19930331.dep as Deposits_1993,
       al19921231.dep as Deposits_1992,
       (al19930331.dep - al19921231.dep) / al19921231.dep as grow_rate
from usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as al19930331 left join
     usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as al19921231
     on al19930331.cert = al19921231.cert and
        al19930331.name = al19921231.name and
        al19921231.repdte = date_add(al19930331.repdte, interval 1 year);

to get this:

select al19930331.repdte as `Date`, al19930331.cert, al19930331.name,
       al19930331.dep as Deposits_1993,
       al19921231.dep as Deposits_1992,
       (al19930331.dep - al19921231.dep) / al19921231.dep as grow_rate
from usa_fdic_call_reports_1993.All_Reports_19930331_Assets_and_Liabilities as al19930331 left join
     usa_fdic_call_reports_1992.All_Reports_19921231_Assets_and_Liabilities as al19921231
     on al19930331.cert = al19921231.cert and
        al19930331.name = al19921231.name

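This mostly works, but it exposes a "division by zero" error. So the question now is: how do I get rid of the division error, so that I can add the last line of code back?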

【Answer 1】:

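To fix the division-by-zero error, you can do a safe division. BigQuery has a function for this, which returns NaN or an infinity instead of raising an error when the divisor is zero: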

   IEEE_DIVIDE((al19930331.dep - al19921231.dep), al19921231.dep) as grow_rate

However, I prefer NULLIF(), which is a standard SQL function and makes the result NULL when the divisor is zero:

   (al19930331.dep - al19921231.dep) / NULLIF(al19921231.dep, 0) as grow_rate
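To compare the options side by side, here is a small self-contained sketch in BigQuery standard SQL. The `deposits` CTE and its three rows are hypothetical stand-ins for the joined quarterly tables. `SAFE_DIVIDE` is not mentioned in the answer; it is included because it is BigQuery's built-in equivalent of the NULLIF() pattern.

```sql
-- Hypothetical inline data standing in for the joined 1993 and 1992 tables.
WITH deposits AS (
  SELECT 1 AS cert, 120.0 AS dep_1993, 100.0 AS dep_1992 UNION ALL
  SELECT 2, 50.0, 0.0 UNION ALL   -- prior-quarter deposits are zero
  SELECT 3, 0.0, 0.0              -- both quarters are zero
)
SELECT
  cert,
  -- NULLIF turns a zero divisor into NULL, so the result is NULL instead of an error.
  (dep_1993 - dep_1992) / NULLIF(dep_1992, 0) AS rate_nullif,
  -- SAFE_DIVIDE also returns NULL when the divisor is zero.
  SAFE_DIVIDE(dep_1993 - dep_1992, dep_1992)  AS rate_safe,
  -- IEEE_DIVIDE returns +inf for cert 2 (50 / 0) and NaN for cert 3 (0 / 0).
  IEEE_DIVIDE(dep_1993 - dep_1992, dep_1992)  AS rate_ieee
FROM deposits
ORDER BY cert;
```

For cert 1 all three columns give 0.2. The choice mainly matters downstream: NULL is skipped by aggregates such as AVG, whereas NaN and infinity propagate through them.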

【Comments】:

That worked. Thanks, Gordon. FYI: I could not get a working query when I kept the last line, al19921231.repdte = date_add(al19930331.repdte, interval 1 year); it returned NULL in two of the three columns.
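A likely explanation for those NULLs, assuming repdte is a DATE column (the post does not show its type): adding one year to the 1993-03-31 report date gives 1994-03-31, which never equals 1992-12-31. The LEFT JOIN therefore finds no matching 1992 row, and every column derived from the 1992 table comes back NULL. The sketch below checks the date arithmetic on literal dates:

```sql
-- Literal dates stand in for al19930331.repdte and al19921231.repdte.
SELECT
  DATE_ADD(DATE '1993-03-31', INTERVAL 1 YEAR)  AS plus_one_year,       -- 1994-03-31
  DATE_SUB(DATE '1993-03-31', INTERVAL 3 MONTH) AS minus_three_months,  -- 1992-12-31
  -- The join condition from the question, evaluated on those dates:
  DATE '1992-12-31' = DATE_ADD(DATE '1993-03-31', INTERVAL 1 YEAR) AS original_condition;  -- false
```

Since each table already holds a single quarter, the date condition can simply be left out, as in the simplified query from the question. If a date check is still wanted, comparing against DATE_SUB(al19930331.repdte, INTERVAL 3 MONTH) would match the previous quarter under the same assumption.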
