Teradata SQL: Max (greatest), 2nd and 3rd greatest column names

Posted: 2015-08-05 00:36:49

Question:

I have a table in Teradata with 6 columns, like this:

ID   Feature1   Feature2   Feature3  Feature4  Feature5
1     12          15          1         22       350
2     121         0.9         999      756       879
...

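For each row I need the names of the columns holding the greatest, 2nd greatest and 3rd greatest values, so the output should look like this: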

ID   Greatest    2nd_Greatest   3rd_Greatest
1    Feature5     Feature4         Feature2
2    Feature3     Feature5         Feature4

Can anyone help?

Thanks!

Comments:

1. See normalization.

Answer 1:

You can do this with one giant case statement, and it gets even more complicated if any of the values is NULL. That would be the fastest method, though.
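For illustration, here is a minimal sketch of the CASE approach for the greatest column only. It runs against SQLite through Python's sqlite3 module rather than Teradata, which is an assumption made purely so the example can be executed anywhere. The table name tab is a placeholder, and the sample rows are the two from the question.

```python
import sqlite3

# In-memory SQLite stands in for Teradata (assumption); "tab" is a placeholder name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id INTEGER, feature1 REAL, feature2 REAL, "
    "feature3 REAL, feature4 REAL, feature5 REAL)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 12, 15, 1, 22, 350), (2, 121, 0.9, 999, 756, 879)],
)

# Each branch only needs to beat the columns that come after it, because
# reaching the branch already means no earlier column was the maximum.
# A NULL in any column makes the comparisons evaluate to NULL, so the row
# falls through to ELSE; handling that is what makes the real thing longer.
GREATEST_SQL = """
SELECT id,
       CASE
         WHEN feature1 >= feature2 AND feature1 >= feature3
          AND feature1 >= feature4 AND feature1 >= feature5 THEN 'Feature1'
         WHEN feature2 >= feature3 AND feature2 >= feature4
          AND feature2 >= feature5 THEN 'Feature2'
         WHEN feature3 >= feature4 AND feature3 >= feature5 THEN 'Feature3'
         WHEN feature4 >= feature5 THEN 'Feature4'
         ELSE 'Feature5'
       END AS greatest
FROM tab
ORDER BY id
"""

greatest = conn.execute(GREATEST_SQL).fetchall()
print(greatest)  # [(1, 'Feature5'), (2, 'Feature3')]
```

Getting the 2nd and 3rd greatest the same way needs a branch for every possible ordering, which is why the unpivot approach is easier to write.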

The simplest method is probably to unpivot the data and re-aggregate:

select id,
       max(case when seqnum = 1 then feature end) as greatest_feature,
       max(case when seqnum = 2 then feature end) as greatest_feature2,
       max(case when seqnum = 3 then feature end) as greatest_feature3,
       max(case when seqnum = 1 then which end) as which_1,
       max(case when seqnum = 2 then which end) as which_2,
       max(case when seqnum = 3 then which end) as which_3
from (-- rank the unpivoted values within each id, largest first
      select id, feature, which,
             row_number() over (partition by id order by feature desc) as seqnum
      from (-- unpivot: one row per (id, column); "tab" stands in for the real table name
            (select id, feature1 as feature, 'feature1' as which from tab) union all
            (select id, feature2 as feature, 'feature2' as which from tab) union all
            (select id, feature3 as feature, 'feature3' as which from tab) union all
            (select id, feature4 as feature, 'feature4' as which from tab) union all
            (select id, feature5 as feature, 'feature5' as which from tab)
           ) u
      ) r
group by id;
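To check the unpivot-and-rank idea against the sample rows from the question, here is a small sketch that runs the same pattern on SQLite through Python's sqlite3 module. SQLite instead of Teradata is an assumption made for runnability (window functions need SQLite 3.25 or later), and tab is a placeholder table name. Only the column names are returned, since that is what the question asks for.

```python
import sqlite3

# In-memory SQLite stands in for Teradata (assumption); "tab" is a placeholder name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id INTEGER, feature1 REAL, feature2 REAL, "
    "feature3 REAL, feature4 REAL, feature5 REAL)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 12, 15, 1, 22, 350), (2, 121, 0.9, 999, 756, 879)],
)

FEATURES = ["feature1", "feature2", "feature3", "feature4", "feature5"]

# Unpivot: one SELECT per column, glued together with UNION ALL.
unpivot_sql = " UNION ALL ".join(
    f"SELECT id, {col} AS feature, '{col}' AS which FROM tab" for col in FEATURES
)

# Rank the values within each id, then pivot the top three names back into columns.
top3_sql = f"""
SELECT id,
       MAX(CASE WHEN seqnum = 1 THEN which END) AS which_1,
       MAX(CASE WHEN seqnum = 2 THEN which END) AS which_2,
       MAX(CASE WHEN seqnum = 3 THEN which END) AS which_3
FROM (SELECT id, which,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY feature DESC) AS seqnum
      FROM ({unpivot_sql}) AS u) AS r
GROUP BY id
ORDER BY id
"""

top3 = conn.execute(top3_sql).fetchall()
for row in top3:
    print(row)
```

With tied values, ROW_NUMBER picks an arbitrary order among the ties, so which of the tied columns is reported first is not guaranteed.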

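Comments:

Wow - that's some serious SQL! It got me halfway - I was able to get the ID and the values of the greatest, 2nd greatest and 3rd greatest features. However, I need the column names, as in the second table in the question.

@julian_b . . . You can also use the column name instead of the value if you like.

@julian_b The reason this query is so horrible is that your schema is denormalized. Fix that, and your problem will magically go away.

@Strawberry, yes, you're right. However, this is a modelling dataset - I feed it straight into an R model - which is why the table structure is so awful. At this point, it's one giant spreadsheet.

But surely you can feed it from a view just as easily as from a table. So you have a normalized table, from which you build a denormalized view - or something like that.

Answer 2: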

Refining Gordon's query:

Instead of scanning the source table once per UNION branch, you can create a list of features and cross join to it:

SELECT t.id, f.feature, 
   CASE f.feature
      WHEN 'feature1' THEN t.feature1
      WHEN 'feature2' THEN t.feature2    
      WHEN 'feature3' THEN t.feature3
      WHEN 'feature4' THEN t.feature4
      WHEN 'feature5' THEN t.feature5
   END AS val
FROM tab AS t CROSS JOIN 
 (
   SELECT * FROM (SELECT 'feature1' AS feature) AS dt 
   UNION ALL
   SELECT * FROM (SELECT 'feature2' AS feature) AS dt 
   UNION ALL
   SELECT * FROM (SELECT 'feature3' AS feature) AS dt 
   UNION ALL
   SELECT * FROM (SELECT 'feature4' AS feature) AS dt 
   UNION ALL
   SELECT * FROM (SELECT 'feature5' AS feature) AS dt 
  ) AS f

You can create the list on the fly with UNION, as above, or as a real table.
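As a rough sketch of the "real table" variant, the example below stores the feature names in a lookup table and cross joins it to the source table, so the source is read once. It runs on SQLite via Python's sqlite3 module instead of Teradata (an assumption for runnability). The table names tab and feature_list are hypothetical, and the final top-three selection is done in Python only to keep the SQL focused on the cross join.

```python
import sqlite3

# In-memory SQLite stands in for Teradata (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id INTEGER, feature1 REAL, feature2 REAL, "
    "feature3 REAL, feature4 REAL, feature5 REAL)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 12, 15, 1, 22, 350), (2, 121, 0.9, 999, 756, 879)],
)

# Hypothetical lookup table holding one row per feature column name.
conn.execute("CREATE TABLE feature_list (feature TEXT)")
conn.executemany(
    "INSERT INTO feature_list VALUES (?)",
    [(f"feature{i}",) for i in range(1, 6)],
)

# Cross join: every source row is paired with every feature name,
# and the CASE picks the matching column value.
rows = conn.execute(
    """
    SELECT t.id, f.feature,
           CASE f.feature
              WHEN 'feature1' THEN t.feature1
              WHEN 'feature2' THEN t.feature2
              WHEN 'feature3' THEN t.feature3
              WHEN 'feature4' THEN t.feature4
              WHEN 'feature5' THEN t.feature5
           END AS val
    FROM tab AS t CROSS JOIN feature_list AS f
    ORDER BY t.id, val DESC
    """
).fetchall()

# Rows arrive sorted by id, largest value first: keep the first three names per id.
top3 = {}
for row_id, feature, _val in rows:
    names = top3.setdefault(row_id, [])
    if len(names) < 3:
        names.append(feature)

print(top3)
```

In a database the ranking would normally stay in SQL, for example with ROW_NUMBER as in the first answer.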

Starting with TD14.10 there's also a TD_UNPIVOT table operator (but still no PIVOT):

SELECT *
FROM TD_UNPIVOT
 (
   ON (SELECT id, feature1, feature2, feature3, feature4, feature5 FROM tab)
   USING
      VALUE_COLUMNS('val')
      UNPIVOT_COLUMN('feature')
      COLUMN_LIST('feature1', 'feature2', 'feature3', 'feature4', 'feature5')
 ) AS dt

Also starting with TD14.10, LAST_VALUE can be used together with ROW_NUMBER to find the n-th greatest value, which avoids the final aggregation:

SELECT id, 
   feature AS "Greatest",
   LAST_VALUE(feature)
   OVER (PARTITION BY id ORDER BY val DESC
         ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS "2nd_Greatest",
   LAST_VALUE(feature)
   OVER (PARTITION BY id ORDER BY val DESC
         ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING) AS "3rd_Greatest"
FROM TD_UNPIVOT
 (
   ON (SELECT id, feature1, feature2, feature3, feature4, feature5 FROM tab)
   USING
      VALUE_COLUMNS('val')
      UNPIVOT_COLUMN('feature')
      COLUMN_LIST('feature1', 'feature2', 'feature3', 'feature4', 'feature5')
 ) AS dt
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY val DESC) = 1;
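The frame trick in the query above can be tried outside Teradata. The sketch below is an approximation on SQLite via Python's sqlite3 module (an assumption for runnability, needing SQLite 3.25 or later). SQLite has neither TD_UNPIVOT nor QUALIFY, so the unpivoted rows are loaded into a hypothetical long_form table and the QUALIFY filter becomes an outer WHERE on the row number.

```python
import sqlite3

# In-memory SQLite stands in for Teradata (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table holding the already unpivoted data: (id, feature, val).
conn.execute("CREATE TABLE long_form (id INTEGER, feature TEXT, val REAL)")
wide = [(1, 12, 15, 1, 22, 350), (2, 121, 0.9, 999, 756, 879)]
long_rows = [(r[0], f"feature{i}", r[i]) for r in wide for i in range(1, 6)]
conn.executemany("INSERT INTO long_form VALUES (?, ?, ?)", long_rows)

# A one-row frame "n FOLLOWING" looks n rows ahead in the descending order,
# so on the top row it yields the 2nd and 3rd greatest feature names.
# The outer WHERE rn = 1 plays the role of QUALIFY.
frame_sql = """
SELECT id, greatest, second_greatest, third_greatest
FROM (SELECT id,
             feature AS greatest,
             LAST_VALUE(feature) OVER (PARTITION BY id ORDER BY val DESC
                 ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS second_greatest,
             LAST_VALUE(feature) OVER (PARTITION BY id ORDER BY val DESC
                 ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING) AS third_greatest,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY val DESC) AS rn
      FROM long_form) AS ranked
WHERE rn = 1
ORDER BY id
"""

result = conn.execute(frame_sql).fetchall()
for row in result:
    print(row)
```

If a row has fewer than three non-missing values, the frame is empty and the corresponding column comes back as NULL.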

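Comments:

Thank you. I went with Gordon's answer for my current version - I only saw this after all my code had gone into production. Thanks - I'll refactor based on your suggestions later.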
