How do you get the MAX() and SUM() of values between STRING values?

【Posted】2020-02-28 13:32:51 【Question】:

My data looks like this:

metric_date      location  id    value
20/02/07 13:00   ATL       A      34
20/02/07 13:05   ATL       B      12
20/02/07 13:10   ATL       B      02
20/02/07 13:15   ATL       A      15
20/02/07 13:20   ATL       A      00         
20/02/07 13:25   ATL       A      00
20/02/07 13:30   ATL       A      12
20/02/07 13:35   ATL       B      12
20/02/07 13:40   ATL       A      23
20/02/07 13:45   ATL       B      03
20/02/07 13:50   ATL       A      00
20/02/07 13:55   ATL       A      00

For each section between the zero-value rows, I need to find max(value) and the SUM(value) of the rows where 'id' is 'B', in order to compute SUM()/MAX() = success_rate.

I tried:

SELECT 
      CASE
       WHEN DATE(metric_date) = lag(DATE(metric_date), 1) OVER (ORDER BY DATE(metric_date)) 
            AND building = lag(building, 1) OVER (ORDER BY date)
       THEN 1
      END AS work_period
    , CASE
        WHEN LAG(value, 1) OVER (ORDER BY date) = 0
             AND LEAD(value, 1) OVER (ORDER BY date) > 0
        THEN LAG(work_period, 1) + 1
        WHEN LAG(SUM(metric_value), 1) OVER (ORDER BY metric_date) > 0
        THEN LAG(work_period, 1)
       END section

I need a result like this:

location  section   max   sum   success_rate
ATL         1       34    14    0.4118
ATL         2       23    15    0.6522

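【Comments】:

Edit your question and show the result you want. And post what you have tried...

"between the zero-value columns": "between" requires a sort order. But I cannot see any column in your sample data that could be used to order the rows so that the term "between" becomes meaningful.

Also: which max and sum? How do you get a max of "34"? That value is not even in the data.

【Answer 1】: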

This is a Gaps and Islands problem (the article is written for SQL Server, but it applies equally to PostgreSQL).

The following should solve your problem:

SELECT  Location,
        MAX(Value) AS Max,
        SUM(CASE WHEN id = 'B' THEN Value END) AS Sum,
        1.0 * SUM(CASE WHEN id = 'B' THEN Value END) / MAX(Value) AS SuccessRate
FROM    (   SELECT  *,
                    ROW_NUMBER() OVER(PARTITION BY Location, CASE WHEN Value = 0 THEN 1 ELSE 0 END ORDER BY metric_date) - 
                        ROW_NUMBER() OVER(PARTITION BY Location ORDER BY metric_date) AS GroupingSet
            FROM    T
        ) AS t
WHERE   Value <> 0
GROUP BY Location, GroupingSet;

The key is to generate a field to group by that identifies the islands. This can be done by assigning two row numbers to each row:

SELECT  *,
        ROW_NUMBER() OVER(PARTITION BY Location, CASE WHEN Value = 0 THEN 1 ELSE 0 END ORDER BY metric_date) AS RowNumInSubset,
        ROW_NUMBER() OVER(PARTITION BY Location ORDER BY metric_date) AS RowNumInSet
FROM    T
ORDER BY metric_date;

This produces the following:

metric_date         location    id      value   RowNumInSubset  RowNumInSet 
----------------------------------------------------------------------------
2020-02-07 13:00    ATL         A       34          1               1       
2020-02-07 13:05    ATL         B       12          2               2       
2020-02-07 13:10    ATL         B       2           3               3       
2020-02-07 13:15    ATL         A       15          4               4       
2020-02-07 13:20    ATL         A       0           1               5       
2020-02-07 13:25    ATL         A       0           2               6       
2020-02-07 13:30    ATL         A       12          5               7       
2020-02-07 13:35    ATL         B       12          6               8       
2020-02-07 13:40    ATL         A       23          7               9       
2020-02-07 13:45    ATL         B       3           8               10      
2020-02-07 13:50    ATL         A       0           3               11      
2020-02-07 13:55    ATL         A       0           4               12      

Then, by subtracting RowNumInSet from RowNumInSubset, you get a value that is constant within each island:

metric_date         location    id      value   RowNumInSubset  RowNumInSet GroupingSet
------------------------------------------------------------------------------------
2020-02-07 13:00    ATL         A       34          1               1           0
2020-02-07 13:05    ATL         B       12          2               2           0
2020-02-07 13:10    ATL         B       2           3               3           0
2020-02-07 13:15    ATL         A       15          4               4           0
------------------------------------------------------------------------------------
2020-02-07 13:20    ATL         A       0           1               5           -4
2020-02-07 13:25    ATL         A       0           2               6           -4
------------------------------------------------------------------------------------
2020-02-07 13:30    ATL         A       12          5               7           -2
2020-02-07 13:35    ATL         B       12          6               8           -2
2020-02-07 13:40    ATL         A       23          7               9           -2
2020-02-07 13:45    ATL         B       3           8               10          -2
------------------------------------------------------------------------------------
2020-02-07 13:50    ATL         A       0           3               11          -8
2020-02-07 13:55    ATL         A       0           4               12          -8
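
The same bookkeeping can be reproduced outside the database. The following is a minimal Python sketch (not part of the original answer) that mimics the two ROW_NUMBER() calls for a single location. The `values` list is the sample data in metric_date order, and the helper name `grouping_sets` is a hypothetical name chosen for illustration.

```python
# Sample values for one location, already ordered by metric_date.
values = [34, 12, 2, 15, 0, 0, 12, 12, 23, 3, 0, 0]

def grouping_sets(values):
    """Return RowNumInSubset - RowNumInSet for every row (hypothetical helper)."""
    # One counter per partition: zero rows (True) and non-zero rows (False).
    counters = {True: 0, False: 0}
    result = []
    for row_num_in_set, value in enumerate(values, start=1):
        is_zero = (value == 0)
        counters[is_zero] += 1  # ROW_NUMBER() within the zero / non-zero partition
        result.append(counters[is_zero] - row_num_in_set)
    return result

print(grouping_sets(values))
# prints [0, 0, 0, 0, -4, -4, -2, -2, -2, -2, -8, -8]
```

For non-zero rows the difference equals minus the number of zero rows seen so far, which is why it stays constant inside an island and changes only after a run of zeros.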

Finally, you can remove the rows where value = 0, since these are only break points:

metric_date         location    id      value   RowNumInSubset  RowNumInSet GroupingSet
------------------------------------------------------------------------------------
2020-02-07 13:00    ATL         A       34          1               1           0
2020-02-07 13:05    ATL         B       12          2               2           0
2020-02-07 13:10    ATL         B       2           3               3           0
2020-02-07 13:15    ATL         A       15          4               4           0
------------------------------------------------------------------------------------
2020-02-07 13:30    ATL         A       12          5               7           -2
2020-02-07 13:35    ATL         B       12          6               8           -2
2020-02-07 13:40    ATL         A       23          7               9           -2
2020-02-07 13:45    ATL         B       3           8               10          -2

You can then perform the aggregation on each group.
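
As an illustration of that last step, here is a small Python sketch (an addition, not the answer's code) that aggregates the non-zero rows by their GroupingSet value. The `rows` tuples are copied from the sample data, and the function name `aggregate` is an assumption made for this example.

```python
from collections import defaultdict

# (id, value, GroupingSet) for the non-zero rows of the sample data.
rows = [
    ("A", 34, 0), ("B", 12, 0), ("B", 2, 0), ("A", 15, 0),
    ("A", 12, -2), ("B", 12, -2), ("A", 23, -2), ("B", 3, -2),
]

def aggregate(rows):
    """Return {GroupingSet: (max, sum of 'B' values, success_rate)}."""
    groups = defaultdict(list)
    for id_, value, grouping_set in rows:
        groups[grouping_set].append((id_, value))

    summary = {}
    for key, members in groups.items():
        max_value = max(v for _, v in members)
        sum_b = sum(v for i, v in members if i == "B")
        summary[key] = (max_value, sum_b, round(sum_b / max_value, 4))
    return summary

print(aggregate(rows))
# prints {0: (34, 14, 0.4118), -2: (23, 15, 0.6522)}
```

One difference from the SQL: for an island without any 'B' row, this sketch yields a sum of 0, whereas SUM(CASE ... END) in SQL would return NULL.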

Example on DB<>Fiddle

【Answer 2】:

Based on several assumptions about details the question leaves unspecified, this query produces exactly the result you want:

SELECT min(location) AS location
     , row_number() OVER (ORDER BY grp) AS section
     , max(value) AS max
     , sum(value) FILTER (WHERE id = 'B') AS sum
     , round(sum(value) FILTER (WHERE id = 'B')
           / max(value)::numeric, 4) AS success_rate
FROM (
   SELECT *, count(*) FILTER (WHERE value = 0) OVER (ORDER BY metric_date) AS grp
   FROM   tbl
   ) sub
WHERE  value <> 0
GROUP  BY grp;

db<>fiddle here

Notably, this does not group by location, although doing so might make sense...

Detailed explanations can be found in many related answers:
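
The `grp` column in the subquery above is a running count of zero-valued rows in metric_date order, so all rows between two runs of zeros share the same count. A minimal Python sketch of that idea follows (an illustration, not the answer's code). The `values` list is the sample data for the single location, and `grp`, `islands` and `sections` are names chosen for this example.

```python
from itertools import accumulate

# Sample values, already ordered by metric_date.
values = [34, 12, 2, 15, 0, 0, 12, 12, 23, 3, 0, 0]

# Running count of zero rows up to and including the current row, mirroring
# count(*) FILTER (WHERE value = 0) OVER (ORDER BY metric_date).
grp = list(accumulate(1 if v == 0 else 0 for v in values))

# Keep only the non-zero rows, as WHERE value <> 0 does.
islands = [(g, v) for g, v in zip(grp, values) if v != 0]

# Number the distinct grp values 1, 2, ... like row_number() OVER (ORDER BY grp).
sections = {g: n for n, g in enumerate(sorted({g for g, _ in islands}), start=1)}

print(grp)       # prints [0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 3, 4]
print(sections)  # prints {0: 1, 2: 2}
```

The zero rows themselves get distinct counts, but they are filtered out before grouping, so only the two islands remain.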

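Counting null values between dates
How to group timestamps into islands (based on arbitrary gap)?
Select longest continuous sequence

For best performance, consider a procedural solution in this particular case (typically, set-based solutions are faster), because this can be solved with a single sequential scan over the table. Like: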

GROUP BY and aggregate sequential numeric values
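
To make the single-pass idea concrete, here is a hedged Python sketch. It is not the implementation from the linked answer; it only shows how one ordered scan is enough. It assumes the rows arrive sorted by metric_date and belong to one location, and the function name `success_rates` is hypothetical.

```python
def success_rates(rows):
    """rows: iterable of (location, id, value), ordered by metric_date.

    Returns a list of (location, section, max, sum_b, success_rate).
    """
    results = []
    state = {"location": None, "max": None, "sum_b": 0}

    def close_island():
        # Emit the current island, if any, and reset the accumulators.
        if state["max"] is not None:
            rate = round(state["sum_b"] / state["max"], 4)
            results.append((state["location"], len(results) + 1,
                            state["max"], state["sum_b"], rate))
        state["max"], state["sum_b"] = None, 0

    for location, id_, value in rows:
        if value == 0:          # a zero row is a break point
            close_island()
            continue
        state["location"] = location
        state["max"] = value if state["max"] is None else max(state["max"], value)
        if id_ == "B":
            state["sum_b"] += value

    close_island()              # handle data that does not end with a zero row
    return results


sample = [
    ("ATL", "A", 34), ("ATL", "B", 12), ("ATL", "B", 2), ("ATL", "A", 15),
    ("ATL", "A", 0), ("ATL", "A", 0),
    ("ATL", "A", 12), ("ATL", "B", 12), ("ATL", "A", 23), ("ATL", "B", 3),
    ("ATL", "A", 0), ("ATL", "A", 0),
]
print(success_rates(sample))
# prints [('ATL', 1, 34, 14, 0.4118), ('ATL', 2, 23, 15, 0.6522)]
```

Each row is visited once and only a few accumulators are kept, which is the property that makes a procedural scan attractive here.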
