How do you get the MAX() and SUM() of values between STRING values?
Asked: 2020-02-28 13:32:51
Question: My data looks like this:
metric_date location id value
20/02/07 13:00 ATL A 34
20/02/07 13:05 ATL B 12
20/02/07 13:10 ATL B 02
20/02/07 13:15 ATL A 15
20/02/07 13:20 ATL A 00
20/02/07 13:25 ATL A 00
20/02/07 13:30 ATL A 12
20/02/07 13:35 ATL B 12
20/02/07 13:40 ATL A 23
20/02/07 13:45 ATL B 03
20/02/07 13:50 ATL A 00
20/02/07 13:55 ATL A 00
I need to find MAX(value), and SUM(value) where id is 'B', for each section between the zero-value rows, in order to compute SUM()/MAX() = success_rate.
I tried:
SELECT
CASE
WHEN DATE(metric_date) = lag(DATE(metric_date), 1) OVER (ORDER BY DATE(metric_date))
AND building = lag(building, 1) OVER (ORDER BY date)
THEN 1
END AS work_period
, CASE
WHEN LAG(value, 1) OVER (ORDER BY date) = 0
AND LEAD(value, 1) OVER (ORDER BY date) > 0
THEN LAG(work_period, 1) + 1
WHEN LAG(SUM(metric_value), 1) OVER (ORDER BY metric_date) > 0
THEN LAG(work_period, 1)
END section
I need a result like this:
location section max sum success_rate
ATL 1 34 14 0.4118
ATL 2 23 15 0.6522
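As a sanity check on the expected output, the two ratios can be recomputed by hand. In the first section (13:00 to 13:15) the maximum is 34 and the B rows sum to 12 + 2 = 14; in the second (13:30 to 13:45) the maximum is 23 and the B rows sum to 12 + 3 = 15. A small Python sketch of that arithmetic (the variable names are only for illustration):

```python
# (max over all rows, sum over rows with id = 'B') per section,
# read off the sample data by hand.
sections = {1: (34, 12 + 2), 2: (23, 12 + 3)}

success_rate = {
    section: round(b_sum / max_value, 4)
    for section, (max_value, b_sum) in sections.items()
}
print(success_rate)
```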
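Comments:
Edit your question and show the results you want.
And post what you have tried...
"Between the zero-value columns": "between" requires a sort order. But I can't see any column in your sample data that could be used to sort the rows so that the term "between" would make sense.
Also: the max and sum of what?
How do you get a max of "34"? That value isn't even in the data.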
Answer 1:
This is a Gaps and Islands problem (the article is written for SQL Server, but applies equally to PostgreSQL).
The following should solve your problem:
SELECT Location,
MAX(Value) AS Max,
SUM(CASE WHEN id = 'B' THEN Value END) AS Sum,
1.0 * SUM(CASE WHEN id = 'B' THEN Value END) / MAX(Value) AS SuccessRate
FROM ( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Location, CASE WHEN Value = 0 THEN 1 ELSE 0 END ORDER BY metric_date) -
ROW_NUMBER() OVER(PARTITION BY Location ORDER BY metric_date) AS GroupingSet
FROM T
) AS t
WHERE Value <> 0
GROUP BY Location, GroupingSet;
The key is to generate a field to group by that identifies the islands, which can be done by assigning two row numbers to each row:
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Location, CASE WHEN Value = 0 THEN 1 ELSE 0 END ORDER BY metric_date) AS RowNumInSubset,
ROW_NUMBER() OVER(PARTITION BY Location ORDER BY metric_date) AS RowNumInSet
FROM T
ORDER BY metric_date;
This produces the following:
metric_date location id value RowNumInSubset RowNumInSet
----------------------------------------------------------------------------
2020-02-07 13:00 ATL A 34 1 1
2020-02-07 13:05 ATL B 12 2 2
2020-02-07 13:10 ATL B 2 3 3
2020-02-07 13:15 ATL A 15 4 4
2020-02-07 13:20 ATL A 0 1 5
2020-02-07 13:25 ATL A 0 2 6
2020-02-07 13:30 ATL A 12 5 7
2020-02-07 13:35 ATL B 12 6 8
2020-02-07 13:40 ATL A 23 7 9
2020-02-07 13:45 ATL B 3 8 10
2020-02-07 13:50 ATL A 0 3 11
2020-02-07 13:55 ATL A 0 4 12
Then, by subtracting RowNumInSet from RowNumInSubset, you get a value that is constant within each of your islands:
metric_date location id value RowNumInSubset RowNumInSet GroupingSet
------------------------------------------------------------------------------------
2020-02-07 13:00 ATL A 34 1 1 0
2020-02-07 13:05 ATL B 12 2 2 0
2020-02-07 13:10 ATL B 2 3 3 0
2020-02-07 13:15 ATL A 15 4 4 0
------------------------------------------------------------------------------------
2020-02-07 13:20 ATL A 0 1 5 -4
2020-02-07 13:25 ATL A 0 2 6 -4
------------------------------------------------------------------------------------
2020-02-07 13:30 ATL A 12 5 7 -2
2020-02-07 13:35 ATL B 12 6 8 -2
2020-02-07 13:40 ATL A 23 7 9 -2
2020-02-07 13:45 ATL B 3 8 10 -2
------------------------------------------------------------------------------------
2020-02-07 13:50 ATL A 0 3 11 -8
2020-02-07 13:55 ATL A 0 4 12 -8
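To see why the difference stays constant within an island, the two ROW_NUMBER() calls can be imitated in plain Python. This is only an illustrative sketch: it assumes a single location and rows already sorted by metric_date, and the variable names are made up for the example.

```python
from itertools import count

# The value column from the sample data, already ordered by metric_date.
values = [34, 12, 2, 15, 0, 0, 12, 12, 23, 3, 0, 0]

# One counter per subset (zero rows / non-zero rows), mirroring
# PARTITION BY Location, CASE WHEN Value = 0 THEN 1 ELSE 0 END.
subset_counters = {True: count(1), False: count(1)}

grouping_set = []
for row_num_in_set, value in enumerate(values, start=1):
    row_num_in_subset = next(subset_counters[value == 0])
    grouping_set.append(row_num_in_subset - row_num_in_set)

print(grouping_set)
```

Within a run of non-zero rows both counters advance by one per row, so their difference does not change; it only shifts when rows of the other subset are skipped.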
Finally, you can remove the rows where value = 0, since these are only break points:
metric_date location id value RowNumInSubset RowNumInSet GroupingSet
------------------------------------------------------------------------------------
2020-02-07 13:00 ATL A 34 1 1 0
2020-02-07 13:05 ATL B 12 2 2 0
2020-02-07 13:10 ATL B 2 3 3 0
2020-02-07 13:15 ATL A 15 4 4 0
------------------------------------------------------------------------------------
2020-02-07 13:30 ATL A 12 5 7 -2
2020-02-07 13:35 ATL B 12 6 8 -2
2020-02-07 13:40 ATL A 23 7 9 -2
2020-02-07 13:45 ATL B 3 8 10 -2
You can then perform the aggregation on each group.
Example on DB<>Fiddle
Answer 2:
Based on a few assumptions that the question leaves unspecified, this query produces exactly your desired result:
SELECT min(location) AS location
, row_number() OVER (ORDER BY grp) AS section
, max(value) AS max
, sum(value) FILTER (WHERE id = 'B') AS sum
, round(sum(value) FILTER (WHERE id = 'B')
/ max(value)::numeric, 4) AS success_rate
FROM (
SELECT *, count(*) FILTER (WHERE value = 0) OVER (ORDER BY metric_date) AS grp
FROM tbl
) sub
WHERE value <> 0
GROUP BY grp;
db<>fiddle here
In particular, it does not group by location, although doing so would probably make sense...
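The grouping key in this query is a running count of the zero rows seen so far, so all non-zero rows between two runs of zeros share one grp value. A rough Python equivalent, assuming the rows are already sorted by metric_date and that metric_date is unique (the variable names are only for illustration):

```python
from itertools import accumulate

# The value column from the sample data, already ordered by metric_date.
values = [34, 12, 2, 15, 0, 0, 12, 12, 23, 3, 0, 0]

# Running count of zero rows up to and including the current row, like
# count(*) FILTER (WHERE value = 0) OVER (ORDER BY metric_date).
grp = list(accumulate(1 if v == 0 else 0 for v in values))

# Keep the key only for non-zero rows, as WHERE value <> 0 does.
non_zero_grp = [g for g, v in zip(grp, values) if v != 0]
print(non_zero_grp)
```

The surviving keys here are 0 and 2; row_number() OVER (ORDER BY grp) in the query then renumbers them as sections 1 and 2.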
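Detailed explanation in many related answers:
Counting null values between dates
How to group timestamps into islands (based on arbitrary gap)?
Select longest continuous sequence
For best performance, consider a procedural solution in this particular case (typically, set-based solutions are faster), because this can be solved with a single sequential scan over the table. Like:
GROUP BY and aggregate sequential numeric values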
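The single-scan idea can be sketched outside the database in plain Python. This is not the PL/pgSQL function the linked answer describes, only an illustration of the control flow; Row and sections are hypothetical names, and the sketch assumes the rows arrive ordered by metric_date and belong to one location.

```python
from typing import Iterable, Iterator, NamedTuple


class Row(NamedTuple):
    location: str
    id: str
    value: int


def sections(rows: Iterable[Row]) -> Iterator[tuple]:
    """Yield (location, section, max, sum_b, success_rate) in one pass.

    Assumes rows arrive ordered by metric_date and belong to one location.
    """
    section = 0
    current = None  # [location, max, sum_b] of the open section, if any
    for row in rows:
        if row.value == 0:
            if current is not None:  # a zero row closes the open section
                section += 1
                loc, mx, sum_b = current
                yield (loc, section, mx, sum_b, round(sum_b / mx, 4))
                current = None
            continue
        if current is None:
            current = [row.location, row.value, 0]
        current[1] = max(current[1], row.value)
        if row.id == "B":
            current[2] += row.value
    if current is not None:  # data may end without a trailing zero row
        section += 1
        loc, mx, sum_b = current
        yield (loc, section, mx, sum_b, round(sum_b / mx, 4))


sample = [Row("ATL", i, v) for i, v in [
    ("A", 34), ("B", 12), ("B", 2), ("A", 15), ("A", 0), ("A", 0),
    ("A", 12), ("B", 12), ("A", 23), ("B", 3), ("A", 0), ("A", 0),
]]
for result in sections(sample):
    print(result)
```

Each row is visited once and only the open section's running maximum and B sum are kept in memory, which is what makes a single sequential scan sufficient.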