Unable to get multiple rows returned from a SELECT to summarize correctly by a specific column

Posted 2023-03-24

【Asked】: 2020-09-22 20:00:34

【Question】:

I have an Oracle table that looks like this:

     test_time              test_name    test_type     test_location    test_value
     -----------------      ---------    ---------     -------------    ----------
     09/22/20 12:00:05         A            RT             Albany           200
     09/22/20 12:00:05         A            RT             Chicago          500
     09/22/20 12:00:05         B            RT             Albany           400
     09/22/20 12:00:05         B            RT             Chicago          300
     09/22/20 12:00:05         A            WPL            Albany           1500
     09/22/20 12:00:05         A            WPL            Chicago          2300
     09/22/20 12:00:05         B            WPL            Albany           2100
     09/22/20 12:00:05         B            WPL            Chicago          1900
     09/22/20 12:05:47         A            RT             Albany           300
     09/22/20 12:05:47         A            RT             Chicago          400
     09/22/20 12:05:47         B            RT             Albany           600
     09/22/20 12:05:47         B            RT             Chicago          500
     09/22/20 12:05:47         A            WPL            Albany           1700
     09/22/20 12:05:47         A            WPL            Chicago          2000
     09/22/20 12:05:47         B            WPL            Albany           1800
     09/22/20 12:05:47         B            WPL            Chicago          2400

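I want to run a SELECT against this table that shows, for a specific test_type ("RT" in this case), the average value for each location over the last 11 minutes, summarized by test_name. The 11-minute window ensures that I retrieve rows from at least two iterations of the script, which inserts records every five minutes.

I would like the result of the SELECT statement against this table to look like this: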

     test_name      albany_avg_val     chicago_avg_val  
     ---------      --------------     ---------------  
      A                 250                450         
      B                 500                400

(Note: the "albany_avg_val" for test_name "A" is the average of the "test_value" values from the two iterations of test_name "A" / test_type "RT" / test_location "Albany" that ran at 12:00 and 12:05.)

The SELECT statement I have built so far looks like this:

SELECT
   test_name,
   CASE test_location
      WHEN 'Albany'
         THEN ROUND(AVG( test_value ),0) albany_avg_val
      WHEN 'Chicago'
         THEN ROUND(AVG( test_value ),0) chicago_avg_val
   END
FROM
   test_table
WHERE
   test_type = 'RT' AND test_time > sysdate - interval '11' minute;

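...but it does not work as expected. Can someone help me figure out what I might be missing?

【Comments】:

Can there be more values of test_location, and would you then want more columns in the result? Or will there only ever be two, Albany and Chicago?

@MartinHeralecký: Other values (cities) may be added in the near future; up to eight cities may be used for testing.

【Answer 1】: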

I think you want:

select
    test_name,
    -- case without else yields null for other locations; avg() ignores nulls
    round(avg(case when test_location = 'Albany'  then test_value end)) albany_avg_val,
    round(avg(case when test_location = 'Chicago' then test_value end)) chicago_avg_val
from test_table
where
    test_type = 'RT'
    and test_location in ('Albany', 'Chicago')
    and test_time > sysdate - 11 / 24 / 60   -- 11 minutes, expressed in days
group by test_name

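That is:

Use group by!

Move the case expression inside the aggregate function avg()

Each column needs its own expression; a single conditional expression cannot generate two columns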
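The three points above can be checked against the RT rows from the question. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle, an assumption made only so the example runs anywhere. The time filter is left out because the sample timestamps are fixed.

```python
import sqlite3

# RT rows from the question's sample table: (test_name, test_location, test_value)
rows = [
    ("A", "Albany", 200), ("A", "Chicago", 500),
    ("B", "Albany", 400), ("B", "Chicago", 300),
    ("A", "Albany", 300), ("A", "Chicago", 400),
    ("B", "Albany", 600), ("B", "Chicago", 500),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (test_name TEXT, test_location TEXT, test_value INTEGER)"
)
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?)", rows)

# CASE without ELSE yields NULL for non-matching rows, and AVG skips NULLs,
# so each output column averages only its own city's values.
query = """
SELECT test_name,
       ROUND(AVG(CASE WHEN test_location = 'Albany'  THEN test_value END)) AS albany_avg_val,
       ROUND(AVG(CASE WHEN test_location = 'Chicago' THEN test_value END)) AS chicago_avg_val
FROM test_table
GROUP BY test_name
ORDER BY test_name
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

SQLite returns floating-point values from AVG and ROUND, so the averages print with a decimal point; the grouping logic is the same as in the Oracle query.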

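Also:

Pre-filtering in the where clause makes the query more efficient

It is safer to use "numeric" date arithmetic with sysdate (a date); if you want interval arithmetic, use systimestamp instead

0 is the default precision of round()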
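The expression `11 / 24 / 60` works because subtracting a number from an Oracle date subtracts that many days, so the fraction has to equal 11 minutes expressed in days. A quick check of that arithmetic, sketched in Python rather than SQL (the variable names are illustrative and the fixed timestamp is an assumed stand-in for sysdate):

```python
from datetime import datetime, timedelta

# 11 minutes as a fraction of a day: divide by 24 hours, then by 60 minutes.
fraction_of_day = 11 / 24 / 60
window = timedelta(days=fraction_of_day)

# Stand-in for sysdate; any fixed timestamp works for the check.
now = datetime(2020, 9, 22, 12, 10, 0)
cutoff = now - window

print(window)
print(cutoff)
```

With "now" at 12:10:00 the cutoff lands at 11:59:00 (to within floating-point rounding), so both the 12:00:05 and the 12:05:47 iterations in the sample data fall inside the window.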

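【Comments】:

This is a good way to solve the problem. Our team chose to go with a different answer. Thanks!

【Answer 2】:

It looks like you need conditional aggregation: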

SELECT
      test_name,
      -- note: ROUND here is applied to each test_value before averaging
      AVG(CASE
          WHEN test_location='Albany'
          THEN ROUND( test_value ) END) AS albany_avg_val,
      AVG(CASE
          WHEN test_location='Chicago'
          THEN ROUND( test_value ) END) AS chicago_avg_val
 FROM test_table
WHERE test_type = 'RT'
  AND test_time > sysdate - interval '11' minute
GROUP BY test_name;

The second argument (0) of the ROUND() function is redundant.

【Comments】:

【Answer 3】:

Please try something like this:

SELECT
   test_name,
   ROUND(AVG(CASE WHEN test_location = 'Albany'
                  THEN test_value
                  ELSE NULL END), 0) albany_avg_val,
   ROUND(AVG(CASE WHEN test_location = 'Chicago'
                  THEN test_value
                  ELSE NULL END), 0) chicago_avg_val
FROM
   test_table
WHERE
   test_type = 'RT' AND test_time > sysdate - interval '11' minute
GROUP BY test_name;

【Comments】:

【Answer 4】:

The pivot clause was designed for exactly this kind of thing. The following query aggregates all test_type values:

select *
from (select test_name, test_location, test_type, test_value from test_table)
pivot(
  avg(test_value)
  -- the literals must match the stored test_location values exactly
  for test_location in ('Albany' as Albany,'Chicago' as Chicago)
);

Result:

TEST_NAME TEST_TYPE     ALBANY    CHICAGO
--------- --------- ---------- ----------
A         RT               250        450
B         RT               500        400
A         WPL             1600       2150
B         WPL             1950       2150
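The result above has one row per (test_name, test_type) pair because pivot implicitly groups by every column of the subquery that is not referenced inside the pivot clause. As a rough illustration of that behaviour outside the database, here is a plain-Python sketch over the question's sample values. The `pivot_avg` helper is hypothetical and exists only for this example; it is not an Oracle or library API.

```python
from collections import defaultdict
from statistics import mean

# (test_name, test_type, test_location, test_value) from the sample table
rows = [
    ("A", "RT", "Albany", 200), ("A", "RT", "Chicago", 500),
    ("B", "RT", "Albany", 400), ("B", "RT", "Chicago", 300),
    ("A", "WPL", "Albany", 1500), ("A", "WPL", "Chicago", 2300),
    ("B", "WPL", "Albany", 2100), ("B", "WPL", "Chicago", 1900),
    ("A", "RT", "Albany", 300), ("A", "RT", "Chicago", 400),
    ("B", "RT", "Albany", 600), ("B", "RT", "Chicago", 500),
    ("A", "WPL", "Albany", 1700), ("A", "WPL", "Chicago", 2000),
    ("B", "WPL", "Albany", 1800), ("B", "WPL", "Chicago", 2400),
]


def pivot_avg(rows, locations):
    """Average test_value per (test_name, test_type), one column per location."""
    buckets = defaultdict(list)
    for name, ttype, location, value in rows:
        buckets[(name, ttype, location)].append(value)

    # The implicit grouping key: every column that is not pivoted or aggregated.
    groups = sorted({(name, ttype) for name, ttype, _, _ in rows},
                    key=lambda g: (g[1], g[0]))

    table = []
    for name, ttype in groups:
        # A location with no rows becomes None, like a NULL cell in pivot output.
        cells = [
            mean(buckets[(name, ttype, loc)]) if (name, ttype, loc) in buckets else None
            for loc in locations
        ]
        table.append((name, ttype, *cells))
    return table


for row in pivot_avg(rows, ["Albany", "Chicago"]):
    print(row)
```

Dropping test_type from the input (or filtering on it first) collapses the grouping key to test_name alone, which is what the RT-only variant of the query does.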

Or, if you only want the RT rows:

select *
from (select test_name, test_location, test_value from test_table where test_type='RT')
pivot(
  avg(test_value)
  -- the literals must match the stored test_location values exactly
  for test_location in ('Albany' as Albany,'Chicago' as Chicago)
);

Result:

TEST_NAME     ALBANY    CHICAGO
--------- ---------- ----------
B                500        400
A                250        450

Full test case with sample data:

with test_table(test_time,test_name,test_type,test_location,test_value) as (
select to_date('09/22/20 12:00:05','mm/dd/yy hh24:mi:ss'), 'A', 'RT ', 'Albany ', 200  from dual union all
select to_date('09/22/20 12:00:05','mm/dd/yy hh24:mi:ss'), 'A', 'RT ', 'Chicago', 500  from dual union all
select to_date('09/22/20 12:00:05','mm/dd/yy hh24:mi:ss'), 'B', 'RT ', 'Albany ', 400  from dual union all
select to_date('09/22/20 12:00:05','mm/dd/yy hh24:mi:ss'), 'B', 'RT ', 'Chicago', 300  from dual union all
select to_date('09/22/20 12:00:05','mm/dd/yy hh24:mi:ss'), 'A', 'WPL', 'Albany ', 1500 from dual union all
select to_date('09/22/20 12:00:05','mm/dd/yy hh24:mi:ss'), 'A', 'WPL', 'Chicago', 2300 from dual union all
select to_date('09/22/20 12:00:05','mm/dd/yy hh24:mi:ss'), 'B', 'WPL', 'Albany ', 2100 from dual union all
select to_date('09/22/20 12:00:05','mm/dd/yy hh24:mi:ss'), 'B', 'WPL', 'Chicago', 1900 from dual union all
select to_date('09/22/20 12:05:47','mm/dd/yy hh24:mi:ss'), 'A', 'RT ', 'Albany ', 300  from dual union all
select to_date('09/22/20 12:05:47','mm/dd/yy hh24:mi:ss'), 'A', 'RT ', 'Chicago', 400  from dual union all
select to_date('09/22/20 12:05:47','mm/dd/yy hh24:mi:ss'), 'B', 'RT ', 'Albany ', 600  from dual union all
select to_date('09/22/20 12:05:47','mm/dd/yy hh24:mi:ss'), 'B', 'RT ', 'Chicago', 500  from dual union all
select to_date('09/22/20 12:05:47','mm/dd/yy hh24:mi:ss'), 'A', 'WPL', 'Albany ', 1700 from dual union all
select to_date('09/22/20 12:05:47','mm/dd/yy hh24:mi:ss'), 'A', 'WPL', 'Chicago', 2000 from dual union all
select to_date('09/22/20 12:05:47','mm/dd/yy hh24:mi:ss'), 'B', 'WPL', 'Albany ', 1800 from dual union all
select to_date('09/22/20 12:05:47','mm/dd/yy hh24:mi:ss'), 'B', 'WPL', 'Chicago', 2400 from dual 
)
select *
from (select test_name, test_location, test_type, test_value from test_table)
pivot(
  avg(test_value)
  for test_location in ('Albany ' as Albany, 'Chicago' as Chicago)
);

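【Comments】:

After reviewing and testing the solutions with my team, the "pivot" approach came out on top. We had not used "pivot" in an Oracle query before, and we felt this was a good use case to build on. We only needed to add the "test_time" condition (the last 11 minutes). Thanks!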
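The comment above mentions adding the test_time condition, and an earlier comment says up to eight cities may eventually be tested. One way to keep the pivot in-list in step with the city list is to generate the statement text. The sketch below does that in Python. Everything in it is an assumption for illustration: `build_pivot_query` is a hypothetical helper, the generated SQL is only a guess at how the pieces from this thread combine and has not been run against Oracle, and city names are interpolated into the text rather than bound, so they are restricted to plain alphabetic names.

```python
def build_pivot_query(cities, test_type="RT", minutes=11):
    """Return Oracle pivot SQL text averaging test_value per city for recent rows."""
    for city in cities:
        # Guard the string interpolation: only plain alphabetic names are accepted.
        if not city.isalpha():
            raise ValueError(f"unsupported city name: {city!r}")
    if not test_type.isalnum():
        raise ValueError(f"unsupported test_type: {test_type!r}")

    # One "'City' as city_avg_val" entry per city becomes one output column.
    in_list = ", ".join(f"'{city}' as {city.lower()}_avg_val" for city in cities)

    return (
        "select *\n"
        "from (select test_name, test_location, test_value\n"
        "      from test_table\n"
        f"      where test_type = '{test_type}'\n"
        f"        and test_time > sysdate - {int(minutes)} / 24 / 60)\n"
        "pivot(\n"
        "  avg(test_value)\n"
        f"  for test_location in ({in_list})\n"
        ")"
    )


print(build_pivot_query(["Albany", "Chicago"]))
```

Adding a city then means appending one name to the list. Names containing spaces or punctuation are rejected by this sketch and would need a separate column-alias rule.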
