Need to create a view in Impala that un-pivots, pivots, and unions the results together

【Posted】: 2020-11-24 12:02:54 【Question】:

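This is my first attempt at doing something substantial in SQL, and I'm having trouble replicating something I can do easily in Alteryx.

Essentially, I need some basic data wrangling to create a summary of a table I have already created in Impala/Hive. The base table needs to be broken down into smaller tables (un-pivoted and pivoted), which are then unioned together to create an aggregated table.

The table looks like this: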

Run_Code | ID | ColB | ColC | ColD | ColE | ColF | ColG | TaxExpense | RetainedExpense | IncomeExpense | Year
-----------------------------------------------------------------------------------------------------------------
run1     | 21 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789      |  1234.56789   | Year1 
run1     | 22 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789      |  1234.56789   | Year2
run1     | 23 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789      |  1234.56789   | Year3
run1     | 24 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234 | 1234.56789 | 1234.56789      |  1234.56789   | Year4

Currently the following is done in Alteryx: only TaxExpense is selected, then un-pivoted, and then brought back as a column.

Run_Code | ID | ColB | ColC | ColD |     Name    | Year1 | Year2 | Year3 | Year4
-----------------------------------------------------------------------------------------------------------------
run1     | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | Tax Expense | (sum) | (sum) | (sum) | (sum)

The same is done for RetainedExpense and IncomeExpense.

Run_Code | ID | ColB | ColC | ColD |      Name      | Year1 | Year2 | Year3 | Year4
-----------------------------------------------------------------------------------------------------------------
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)

The final, desired result is as follows:

Run_Code | ID | ColB | ColC | ColD |      Name       | Year1 | Year2 | Year3 | Year4
-----------------------------------------------------------------------------------------------------------------
run1     | 21 | 1234 | 1234 | 1234 | TaxExpense      | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | TaxExpense      | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | TaxExpense      | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | TaxExpense      | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | RetainedExpense | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | IncomeExpense   | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | IncomeExpense   | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | IncomeExpense   | (sum) | (sum) | (sum) | (sum)
run1     | 21 | 1234 | 1234 | 1234 | IncomeExpense   | (sum) | (sum) | (sum) | (sum)

Any help creating the SQL to solve the above would be appreciated.

【Discussion】:

【Answer 1】:

Hmm... If I understand correctly, you can unpivot and re-aggregate:

select Run_Code, ID, ColB, ColC, ColD, name,
       -- each CASE returns the numeric expense; the literals must match the stored Year values
       sum(case when year = 'Year1' then expense end) as year_1,
       sum(case when year = 'Year2' then expense end) as year_2,
       sum(case when year = 'Year3' then expense end) as year_3,
       sum(case when year = 'Year4' then expense end) as year_4
from ((select Run_Code, ID, ColB, ColC, ColD, 'TaxExpense' as name, TaxExpense as expense, year
       from t
      ) union all
      (select Run_Code, ID, ColB, ColC, ColD, 'RetainedExpense' as name, RetainedExpense as expense, year
       from t
      ) union all
      (select Run_Code, ID, ColB, ColC, ColD, 'IncomeExpense' as name, IncomeExpense as expense, year
       from t
      )
     ) u
group by Run_Code, ID, ColB, ColC, ColD, name
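
The unpivot-then-aggregate approach can be tried out locally before running it on the cluster. The following is only a sketch under stated assumptions: SQLite (via Python's standard `sqlite3` module) stands in for Impala, the table name `base_table` and the view name `expense_summary` are hypothetical, the sample values are made up, and ColE to ColG plus Year3/Year4 are left out for brevity. It also selects `name` so the label shows up as its own column.

```python
import sqlite3

# Assumption: SQLite is a stand-in for Impala; the SQL itself is plain
# UNION ALL plus conditional aggregation, wrapped in a view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_table (            -- hypothetical name for the base table
    Run_Code TEXT, ID INTEGER, ColB INTEGER, ColC INTEGER, ColD INTEGER,
    TaxExpense REAL, RetainedExpense REAL, IncomeExpense REAL, Year TEXT
);
INSERT INTO base_table VALUES        -- made-up sample values
    ('run1', 21, 1, 2, 3, 10.0, 20.0, 30.0, 'Year1'),
    ('run1', 21, 1, 2, 3, 11.0, 21.0, 31.0, 'Year2'),
    ('run1', 21, 1, 2, 3,  1.5,  2.5,  3.5, 'Year2');

CREATE VIEW expense_summary AS       -- hypothetical view name
SELECT Run_Code, ID, ColB, ColC, ColD, name,
       SUM(CASE WHEN year = 'Year1' THEN expense END) AS year_1,
       SUM(CASE WHEN year = 'Year2' THEN expense END) AS year_2
FROM (
    -- unpivot: one branch per expense column, each tagged with its label
    SELECT Run_Code, ID, ColB, ColC, ColD,
           'TaxExpense' AS name, TaxExpense AS expense, Year AS year
    FROM base_table
    UNION ALL
    SELECT Run_Code, ID, ColB, ColC, ColD,
           'RetainedExpense', RetainedExpense, Year
    FROM base_table
    UNION ALL
    SELECT Run_Code, ID, ColB, ColC, ColD,
           'IncomeExpense', IncomeExpense, Year
    FROM base_table
) u
-- pivot: one output row per key columns + label, one column per year
GROUP BY Run_Code, ID, ColB, ColC, ColD, name;
""")

rows = conn.execute(
    "SELECT name, year_1, year_2 FROM expense_summary ORDER BY name"
).fetchall()
for row in rows:
    print(row)
```

Note that because ID is part of the GROUP BY, rows only collapse into a single line per label when the same ID occurs for several years; if every year has its own ID, each output row will have just one year column filled.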

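【Comments】:

I get the following error: AnalysisException: SUM requires a numeric argument: sum(CASE WHEN year = 'YEAR1' THEN Name END). When I try MAX instead of SUM, the Name field does not appear as its own column; instead it gets populated under the year columns Year_1, Year_2, and so on.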
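
Judging from the error text, the CASE expression in that attempt returns the string label (`THEN Name END`) rather than the numeric expense, which would explain both symptoms: SUM rejects a string, and MAX happily returns the label inside the year column. A small sketch of the difference, assuming SQLite as a stand-in for Impala (SQLite does not raise Impala's AnalysisException, so only the MAX behaviour is reproduced) and a hypothetical, already unpivoted table `unpivoted` with made-up values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unpivoted (name TEXT, expense REAL, year TEXT);  -- hypothetical
INSERT INTO unpivoted VALUES
    ('TaxExpense', 10.0, 'Year1'),
    ('TaxExpense', 12.0, 'Year2');
""")

# Symptom from the comment: the CASE returns the label, so MAX puts the
# label text under the year column.
wrong = conn.execute("""
    SELECT MAX(CASE WHEN year = 'Year1' THEN name END) AS year_1
    FROM unpivoted
    GROUP BY name
""").fetchall()

# Intended shape: select the label as its own column and aggregate the
# numeric expense inside the CASE.
right = conn.execute("""
    SELECT name,
           SUM(CASE WHEN year = 'Year1' THEN expense END) AS year_1,
           SUM(CASE WHEN year = 'Year2' THEN expense END) AS year_2
    FROM unpivoted
    GROUP BY name
""").fetchall()

print(wrong)  # [('TaxExpense',)]
print(right)  # [('TaxExpense', 10.0, 12.0)]
```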
