Calculate the difference between min and max for each column only if higher than 0

Posted 2023-03-17

【Title】: Calculate difference between min and max for each column only if higher than 0 【Posted】: 2021-06-01 23:20:56 【Question】:

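I need to calculate the difference between odds based on the value in the "updated" column. At the moment I take the odds from the row with the highest updated value and subtract the odds from the row with the lowest updated value. This works fine, but I just realized that some columns are sometimes exactly 0. I would like to know whether the row with the lowest updated value can be chosen per column, considering only odds higher than 0.

The table looks like this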

fixture_id	H_odds	D_odds	A_odds	ev_tstamp	updated
120000	1.40	1.50	1.30	132000	12
120000	1.10	1.10	1.10	132000	11
120000	1.20	0	1.60	132000	10

This is the result I want

fixture_id	H_odds	D_odds	A_odds	ev_tstamp	updated	dif_h	dif_d	dif_a
120000	1.40	1.50	1.30	132000	12	0.2	0.4	-0.3

This is what I am getting now

fixture_id	H_odds	D_odds	A_odds	ev_tstamp	updated	dif_h	dif_d	dif_a
120000	1.40	1.50	1.30	132000	12	0.2	1.5	-0.3

The code I am using

select
   t_max.*,
   (t_max.H_odds - t_min.H_odds) as dif_h,
   (t_max.D_odds - t_min.D_odds) as dif_d,
   (t_max.A_odds - t_min.A_odds) as dif_a
from
(
   select
      fixture_id,
      min(updated) min_updated,
      max(updated) max_updated
  from
      test
  group by
      fixture_id
) as t1
join test as t_min on (t_min.fixture_id = t1.fixture_id and t_min.updated = t1.min_updated)
join test as t_max on (t_max.fixture_id = t1.fixture_id and t_max.updated = t1.max_updated)

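【Comments】:

MIN(CASE WHEN D_odds > 0 THEN updated END). If the value in that row is 0, the CASE returns NULL and the row is ignored by MIN().

So you want the latest odds, plus the difference between the latest and the earliest odds for the same fixture and outcome?

Yes, I want odds of 0 to be skipped when the difference is calculated, so each odds column has to be handled separately, as described in the answer below. The dif_d column of the example tables above shows what I mean.

【Solution 1】:

This solution only works on MySQL 8+.

I would suggest window functions. The following treats each odds column separately, and it makes no assumption about whether the odds increase or decrease with each update: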

select fixture_id, ev_tstamp, max(updated),
       -- latest non-zero odds for each column
       max(case when updated = max_h_update then h_odds end) as max_h,
       max(case when updated = max_d_update then d_odds end) as max_d,
       max(case when updated = max_a_update then a_odds end) as max_a,
       -- latest non-zero odds minus earliest non-zero odds
       (max(case when updated = max_h_update then h_odds end) -
        max(case when updated = min_h_update then h_odds end)
       ) as h_diff,
       (max(case when updated = max_d_update then d_odds end) -
        max(case when updated = min_d_update then d_odds end)
       ) as d_diff,
       (max(case when updated = max_a_update then a_odds end) -
        max(case when updated = min_a_update then a_odds end)
       ) as a_diff
from (select t.*,
             -- the case yields NULL for zero odds, and max()/min() ignore NULL
             max(case when h_odds <> 0 then updated end) over (partition by fixture_id) as max_h_update,
             min(case when h_odds <> 0 then updated end) over (partition by fixture_id) as min_h_update,
             max(case when d_odds <> 0 then updated end) over (partition by fixture_id) as max_d_update,
             min(case when d_odds <> 0 then updated end) over (partition by fixture_id) as min_d_update,
             max(case when a_odds <> 0 then updated end) over (partition by fixture_id) as max_a_update,
             min(case when a_odds <> 0 then updated end) over (partition by fixture_id) as min_a_update
      from test t
     ) t
group by fixture_id, ev_tstamp;
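
Where window functions are not available (they were added in MySQL 8.0), the same per-column idea can probably be expressed with a grouped derived table in place of `over (partition by ...)`. The following is only a sketch under stated assumptions: it runs the SQL through Python's built-in sqlite3 module as a stand-in for MySQL, loads the three sample rows from the question, uses REAL where a real table would likely use DECIMAL, and the aliases `m`, `max_h`, `min_h`, `last_updated` and so on are invented for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (
    fixture_id INT, H_odds REAL, D_odds REAL, A_odds REAL,
    ev_tstamp INT, updated INT
);
INSERT INTO test VALUES
    (120000, 1.40, 1.50, 1.30, 132000, 12),
    (120000, 1.10, 1.10, 1.10, 132000, 11),
    (120000, 1.20, 0,    1.60, 132000, 10);
""")

QUERY = """
SELECT t.fixture_id,
       MAX(t.updated) AS last_updated,
       MAX(CASE WHEN t.updated = m.max_h THEN t.H_odds END)
     - MAX(CASE WHEN t.updated = m.min_h THEN t.H_odds END) AS dif_h,
       MAX(CASE WHEN t.updated = m.max_d THEN t.D_odds END)
     - MAX(CASE WHEN t.updated = m.min_d THEN t.D_odds END) AS dif_d,
       MAX(CASE WHEN t.updated = m.max_a THEN t.A_odds END)
     - MAX(CASE WHEN t.updated = m.min_a THEN t.A_odds END) AS dif_a
FROM test t
JOIN (
    -- per fixture: first and last update at which each column is non-zero;
    -- CASE yields NULL for zero odds, and MIN()/MAX() ignore NULL
    SELECT fixture_id,
           MAX(CASE WHEN H_odds > 0 THEN updated END) AS max_h,
           MIN(CASE WHEN H_odds > 0 THEN updated END) AS min_h,
           MAX(CASE WHEN D_odds > 0 THEN updated END) AS max_d,
           MIN(CASE WHEN D_odds > 0 THEN updated END) AS min_d,
           MAX(CASE WHEN A_odds > 0 THEN updated END) AS max_a,
           MIN(CASE WHEN A_odds > 0 THEN updated END) AS min_a
    FROM test
    GROUP BY fixture_id
) m ON m.fixture_id = t.fixture_id
GROUP BY t.fixture_id
"""

# Round in Python because REAL arithmetic leaves floating-point noise.
result = [
    (fid, upd, round(h, 2), round(d, 2), round(a, 2))
    for fid, upd, h, d, a in conn.execute(QUERY)
]
print(result)
```

With the sample rows, dif_d is taken from updates 12 and 11, because the zero at update 10 is skipped. The SQL inside `QUERY` uses only GROUP BY, CASE, MIN and MAX, so it should carry over to MySQL 5.7, but it has not been checked against a real server here.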

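【Comments】:

Sorry, I forgot to mention that I am using MySQL 5.7. Will PARTITION BY work there? I just tried it and got the error "....... right syntax to use near '(partition ....".

Window functions are not a good substitute (fix) for poor schema design.

【Solution 2】:

Consider the following: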

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table 
(fixture_id INT NOT NULL
,updated INT NOT NULL
,outcome ENUM('Home win','Draw','Away win') NOT NULL
,odds DECIMAL(5,2) NOT NULL
,PRIMARY KEY(fixture_id,outcome,updated)
);

INSERT INTO my_table VALUES
(120,12,'Home win',1.40),
(120,11,'Home win',1.10),
(120,10,'Home win',1.20),
(120,12,'Draw',1.50),
(120,11,'Draw',1.10),
(120,12,'Away win',1.30),
(120,11,'Away win',1.10),
(120,10,'Away win',1.60);

Latest odds:

SELECT x.*
  FROM my_table x
  JOIN
     ( SELECT fixture_id
            , outcome
            , MAX(updated) max_updated
         FROM my_table x
        GROUP 
           BY fixture_id
            , outcome
     ) y
    ON y.fixture_id = x.fixture_id
   AND y.outcome = x.outcome
   AND y.max_updated = x.updated;

Earliest odds:

SELECT x.*
  FROM my_table x
  JOIN
     ( SELECT fixture_id
            , outcome
            , MIN(updated) min_updated
         FROM my_table x
        GROUP 
           BY fixture_id
            , outcome
     ) y
    ON y.fixture_id = x.fixture_id
   AND y.outcome = x.outcome
   AND y.min_updated = x.updated;

Delta:

SELECT a.*
     , a.odds - b.odds delta
  FROM 
     ( SELECT x.*
         FROM my_table x
         JOIN
            ( SELECT fixture_id
                   , outcome
                   , MAX(updated) max_updated
                FROM my_table x
               GROUP 
                  BY fixture_id
                   , outcome
            ) y
           ON y.fixture_id = x.fixture_id
          AND y.outcome = x.outcome
          AND y.max_updated = x.updated
     ) a
 JOIN
    ( SELECT x.*
         FROM my_table x
         JOIN
            ( SELECT fixture_id
                   , outcome
                   , MIN(updated) min_updated
                FROM my_table x
               GROUP 
                  BY fixture_id
                   , outcome
            ) y
           ON y.fixture_id = x.fixture_id
          AND y.outcome = x.outcome
          AND y.min_updated = x.updated
    ) b
   ON b.fixture_id = a.fixture_id
  AND b.outcome = a.outcome;

Result:

    +------------+---------+----------+------+-------+
    | fixture_id | updated | outcome  | odds | delta |
    +------------+---------+----------+------+-------+
    |        120 |      12 | Home win | 1.40 |  0.20 |
    |        120 |      12 | Draw     | 1.50 |  0.40 |
    |        120 |      12 | Away win | 1.30 | -0.30 |
    +------------+---------+----------+------+-------+
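
In the sample data above, the zero Draw odds at update 10 are simply not stored. If such rows were stored in the normalized table, a single `WHERE odds > 0` in the grouping subquery would presumably be enough to skip them for every outcome. The sketch below illustrates that under a few assumptions: Python's built-in sqlite3 module stands in for MySQL, TEXT and REAL replace ENUM and DECIMAL, an extra `(120,10,'Draw',0)` row is added for the demonstration, and the aliases `r`, `hi` and `lo` are made up. It also computes MIN and MAX in one derived table rather than two, which is a design variation and not part of the answer above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (
    fixture_id INT NOT NULL,
    updated INT NOT NULL,
    outcome TEXT NOT NULL,
    odds REAL NOT NULL,
    PRIMARY KEY (fixture_id, outcome, updated)
);
INSERT INTO my_table VALUES
    (120,12,'Home win',1.40),
    (120,11,'Home win',1.10),
    (120,10,'Home win',1.20),
    (120,12,'Draw',1.50),
    (120,11,'Draw',1.10),
    (120,10,'Draw',0),          -- extra zero row, added for this demo only
    (120,12,'Away win',1.30),
    (120,11,'Away win',1.10),
    (120,10,'Away win',1.60);
""")

DELTA = """
SELECT hi.fixture_id, hi.outcome, hi.odds - lo.odds AS delta
  FROM ( SELECT fixture_id
              , outcome
              , MIN(updated) AS min_updated
              , MAX(updated) AS max_updated
           FROM my_table
          WHERE odds > 0            -- zero odds never become min or max
          GROUP BY fixture_id, outcome
       ) r
  JOIN my_table hi
    ON hi.fixture_id = r.fixture_id
   AND hi.outcome = r.outcome
   AND hi.updated = r.max_updated
  JOIN my_table lo
    ON lo.fixture_id = r.fixture_id
   AND lo.outcome = r.outcome
   AND lo.updated = r.min_updated
"""

# Round in Python because REAL arithmetic leaves floating-point noise.
deltas = {
    outcome: round(delta, 2)
    for _fixture_id, outcome, delta in conn.execute(DELTA)
}
print(deltas)
```

The query text stays the same whether there are 3 outcomes or 21, which is the point made in the comments below about the normalized design.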

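【Comments】:

Thanks for the input, but my table structure is different: each row contains the odds for all markets.

I strongly recommend that you change the structure.

I would... it is just that there are another 20 columns for other markets, so it is more convenient to keep it as it is.

On the contrary, changing the design is more convenient and more efficient. What if you suddenly have 21 markets? Then you would have to modify the whole structure and every query. With a normalized design, the query is exactly the same for 20 markets or 21.

Yes, I think you are right, I will look into it.

【Solution 3】:

I only modified the code slightly so that it calculates the difference for one specific group of odds (the average), so it looks like the following. However, it worked only once, taking more than 15 seconds to process; every other time I tried, it failed with a timeout error. To clarify, the market column in my structure corresponds to the "outcome" column in your example.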

explain SELECT a.*
    , a.odds - b.odds delta
FROM 
 ( SELECT x.*
     FROM average_odds x
     JOIN
        ( SELECT fix_id
               , market
               , MAX(updated) max_updated
            FROM average_odds x where odds_type='avg'
           GROUP BY fix_id
               , market
        ) y
       ON y.fix_id = x.fix_id
      AND y.market = x.market
      AND y.max_updated = x.updated
 ) a
JOIN
( SELECT x.*
     FROM average_odds x
     JOIN
        ( SELECT fix_id
               , market
               , MIN(updated) min_updated
            FROM average_odds x where odds_type='avg'
           GROUP BY fix_id
               , market
        ) y
       ON y.fix_id = x.fix_id
      AND y.market = x.market
      AND y.min_updated = x.updated
) b
 ON b.fix_id = a.fix_id
AND b.market = a.market  
ORDER BY `delta` ASC

This is the EXPLAIN output

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	<derived3>	null	all	null	null	null	null	17466	100.00	Using temporary; Using filesort
1	PRIMARY	x	null	ref	fix,fixi,market,updat	fix	4	y.fix_id	596	0.11	Using where
1	PRIMARY	x	null	ref	fix,fixi,market,updat	fix	4	y.fix_id	596	2.27	Using where
1	PRIMARY	<derived5>	null	ref	<auto_key0>	<auto_key0>	31	y.fix_id,y.market,bobi.x.updated	10	100.00	using index
5	DERIVED	x	null	ref	boki	boki	4	const	17466	100.00	Using index condition; Using temporary; Using file...
3	DERIVED	x	null	ref	boki	boki	4	const	17466	100.00	Using index condition; Using temporary; Using file...
