Calculate difference between min and max for each column only if higher than 0

Posted: 2021-06-01 23:20:56

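I need to calculate the difference between odds based on the values in the "updated" column. At the moment I subtract the odds in the row where updated is at its minimum from the odds in the row where updated is at its maximum. It works fine, but I have just realized that some columns sometimes contain exactly 0. Is it possible to pick the minimum updated value per column, considering only rows where that column's odds are higher than 0?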

The table looks like this:

fixture_id  H_odds  D_odds  A_odds  ev_tstamp  updated
120000      1.40    1.50    1.30    132000     12
120000      1.10    1.10    1.10    132000     11
120000      1.20    0       1.60    132000     10

This is what I want returned:

fixture_id  H_odds  D_odds  A_odds  ev_tstamp  updated  dif_h  dif_d  dif_a
120000      1.40    1.50    1.30    132000     12       0.2    0.4    -0.3

This is what I get back right now:

fixture_id  H_odds  D_odds  A_odds  ev_tstamp  updated  dif_h  dif_d  dif_a
120000      1.40    1.50    1.30    132000     12       0.2    1.5    -0.3

The code I am using:

select
   t_max.*,
   (t_max.H_odds - t_min.H_odds) as dif_h,
   (t_max.D_odds - t_min.D_odds) as dif_d,
   (t_max.A_odds - t_min.A_odds) as dif_a
from
(
   select
      fixture_id,
      min(updated) min_updated,
      max(updated) max_updated
  from
      test
  group by
      fixture_id
) as t1
join test as t_min on (t_min.fixture_id = t1.fixture_id and t_min.updated = t1.min_updated)
join test as t_max on (t_max.fixture_id = t1.fixture_id and t_max.updated = t1.max_updated)

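Comments:

MIN(CASE WHEN D_odds > 0 THEN updated END). If the value in that row is 0, the CASE returns NULL, which MIN() ignores.

So you want the latest odds, plus the difference between the latest and the earliest odds for the same fixture and outcome?

Yes, and I want 0 odds to be skipped when calculating the difference, so each odds column has to be treated separately, as described in the answer below. The dif_d column in the example above shows what I mean.

Answer 1: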

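This solution only works in MySQL 8+.

I would suggest window functions. The following treats each odds column separately, and it makes no assumption about whether the odds increase or decrease with each update: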

select fixture_id, ev_tstamp, max(updated),
       -- latest non-zero price per column
       max(case when updated = max_h_update then h_odds end) as max_h,
       max(case when updated = max_d_update then d_odds end) as max_d,
       max(case when updated = max_a_update then a_odds end) as max_a,
       -- latest non-zero price minus earliest non-zero price, per column
       (max(case when updated = max_h_update then h_odds end) -
        max(case when updated = min_h_update then h_odds end)
       ) as h_diff,
       (max(case when updated = max_d_update then d_odds end) -
        max(case when updated = min_d_update then d_odds end)
       ) as d_diff,
       (max(case when updated = max_a_update then a_odds end) -
        max(case when updated = min_a_update then a_odds end)
       ) as a_diff
from (select t.*,
             -- the case expression is NULL for zero odds, so min()/max() skip those rows
             max(case when h_odds <> 0 then updated end) over (partition by fixture_id) as max_h_update,
             min(case when h_odds <> 0 then updated end) over (partition by fixture_id) as min_h_update,
             max(case when d_odds <> 0 then updated end) over (partition by fixture_id) as max_d_update,
             min(case when d_odds <> 0 then updated end) over (partition by fixture_id) as min_d_update,
             max(case when a_odds <> 0 then updated end) over (partition by fixture_id) as max_a_update,
             min(case when a_odds <> 0 then updated end) over (partition by fixture_id) as min_a_update
      from test t
     ) t
group by fixture_id, ev_tstamp;
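
For servers without window functions (anything before MySQL 8.0), the same per-column idea can probably be expressed with conditional aggregation in a derived table, which is what the first comment on the question hints at. The following is only a sketch under a few assumptions: the table and column names are those from the question, (fixture_id, updated) identifies a single row, and the odds columns are DECIMAL rather than FLOAT. The aliases h_min, d_min and a_min are made up for illustration. Unlike the query above, it keeps the row with the highest updated value as the "latest" row even if one of its odds is 0, which matches the question's original query.

```sql
-- Sketch for MySQL 5.7: conditional aggregation instead of window functions.
SELECT t_max.*,
       t_max.H_odds - h_min.H_odds AS dif_h,
       t_max.D_odds - d_min.D_odds AS dif_d,
       t_max.A_odds - a_min.A_odds AS dif_a
FROM (
    SELECT fixture_id,
           MAX(updated) AS max_updated,
           -- CASE yields NULL for zero odds, and MIN() skips NULLs,
           -- so each column gets its own earliest non-zero update
           MIN(CASE WHEN H_odds > 0 THEN updated END) AS min_h_updated,
           MIN(CASE WHEN D_odds > 0 THEN updated END) AS min_d_updated,
           MIN(CASE WHEN A_odds > 0 THEN updated END) AS min_a_updated
    FROM test
    GROUP BY fixture_id
) AS t1
JOIN test AS t_max
  ON t_max.fixture_id = t1.fixture_id AND t_max.updated = t1.max_updated
-- LEFT JOINs keep the fixture even when a column never had a non-zero price;
-- the corresponding difference is then NULL
LEFT JOIN test AS h_min
  ON h_min.fixture_id = t1.fixture_id AND h_min.updated = t1.min_h_updated
LEFT JOIN test AS d_min
  ON d_min.fixture_id = t1.fixture_id AND d_min.updated = t1.min_d_updated
LEFT JOIN test AS a_min
  ON a_min.fixture_id = t1.fixture_id AND a_min.updated = t1.min_a_updated;
```

With the three sample rows from the question, the earliest non-zero updates are 10 for H_odds, 11 for D_odds and 10 for A_odds, so the differences come out as 0.2, 0.4 and -0.3, the values the question asks for.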

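Comments:

Sorry, I forgot to mention that I am using MySQL 5.7. Will partition by work?

I just tried it and got the error "....... the right syntax to use near '(partition ....".

Window functions are not a good substitute (fix) for poor schema design.

Answer 2: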

Consider the following:

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table 
(fixture_id INT NOT NULL
,updated INT NOT NULL
,outcome ENUM('Home win','Draw','Away win') NOT NULL
,odds DECIMAL(5,2) NOT NULL
,PRIMARY KEY(fixture_id,outcome,updated)
);

INSERT INTO my_table VALUES
(120,12,'Home win',1.40),
(120,11,'Home win',1.10),
(120,10,'Home win',1.20),
(120,12,'Draw',1.50),
(120,11,'Draw',1.10),
(120,12,'Away win',1.30),
(120,11,'Away win',1.10),
(120,10,'Away win',1.60);    

Latest odds:

SELECT x.*
  FROM my_table x
  JOIN
     ( SELECT fixture_id
            , outcome
            , MAX(updated) max_updated
         FROM my_table x
        GROUP 
           BY fixture_id
            , outcome
     ) y
    ON y.fixture_id = x.fixture_id
   AND y.outcome = x.outcome
   AND y.max_updated = x.updated;

Earliest odds:

SELECT x.*
  FROM my_table x
  JOIN
     ( SELECT fixture_id
            , outcome
            , MIN(updated) min_updated
         FROM my_table x
        GROUP 
           BY fixture_id
            , outcome
     ) y
    ON y.fixture_id = x.fixture_id
   AND y.outcome = x.outcome
   AND y.min_updated = x.updated;

Delta:

SELECT a.*
     , a.odds - b.odds delta
  FROM 
     ( SELECT x.*
         FROM my_table x
         JOIN
            ( SELECT fixture_id
                   , outcome
                   , MAX(updated) max_updated
                FROM my_table x
               GROUP 
                  BY fixture_id
                   , outcome
            ) y
           ON y.fixture_id = x.fixture_id
          AND y.outcome = x.outcome
          AND y.max_updated = x.updated
     ) a
 JOIN
    ( SELECT x.*
         FROM my_table x
         JOIN
            ( SELECT fixture_id
                   , outcome
                   , MIN(updated) min_updated
                FROM my_table x
               GROUP 
                  BY fixture_id
                   , outcome
            ) y
           ON y.fixture_id = x.fixture_id
          AND y.outcome = x.outcome
          AND y.min_updated = x.updated
    ) b
   ON b.fixture_id = a.fixture_id
  AND b.outcome = a.outcome;

Result:

    +------------+---------+----------+------+-------+
    | fixture_id | updated | outcome  | odds | delta |
    +------------+---------+----------+------+-------+
    |        120 |      12 | Home win | 1.40 |  0.20 |
    |        120 |      12 | Draw     | 1.50 |  0.40 |
    |        120 |      12 | Away win | 1.30 | -0.30 |
    +------------+---------+----------+------+-------+
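
The sample data above simply leaves out the Draw price for update 10. If a normalized table did store such rows with odds of 0, one possible adjustment is to filter them out before taking MIN and MAX. This is a sketch rather than part of the answer: it assumes that 0 means "no price", and the aliases cur and first_price are made up for illustration. It also computes both boundaries in a single grouped subquery instead of two.

```sql
-- Sketch: delta per fixture and outcome, ignoring rows with zero odds.
SELECT cur.fixture_id,
       cur.updated,
       cur.outcome,
       cur.odds,
       cur.odds - first_price.odds AS delta
FROM (
    SELECT fixture_id,
           outcome,
           MIN(updated) AS min_updated,
           MAX(updated) AS max_updated
    FROM my_table
    WHERE odds > 0              -- assumption: 0 means "no price"
    GROUP BY fixture_id, outcome
) AS y
JOIN my_table AS cur            -- row holding the latest non-zero price
  ON cur.fixture_id = y.fixture_id
 AND cur.outcome = y.outcome
 AND cur.updated = y.max_updated
JOIN my_table AS first_price    -- row holding the earliest non-zero price
  ON first_price.fixture_id = y.fixture_id
 AND first_price.outcome = y.outcome
 AND first_price.updated = y.min_updated;
```

Because the primary key is (fixture_id, outcome, updated), each join matches at most one row, so the query returns one row per fixture and outcome.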

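Comments:

Thanks for the input, but my table structure is different: each record contains the odds for all markets.

I strongly suggest you revise the structure.

I would, but there are another 20 columns for other markets, so it is more convenient to keep it as it is.

On the contrary, revising the design is more convenient and more efficient. What if you suddenly have 21 markets? Then you have to modify the whole structure and all the queries. With a normalized design, the query is exactly the same for 20 markets or 21.

Yes, I think you are right here. I will look into it.

Answer 3:

I modified the code slightly so that it calculates the difference only for one specific group of odds (the average), so it now looks like the query below. However, it worked only once, taking more than 15 seconds, and every other time I tried it failed with a timeout error. Just to clarify: the market column in my structure corresponds to the "outcome" column in your example.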

EXPLAIN SELECT a.*
    , a.odds - b.odds delta
FROM 
 ( SELECT x.*
     FROM average_odds x
     JOIN
        ( SELECT fix_id
               , market
               , MAX(updated) max_updated
            -- avg is kept as posted; it needs quotes ('avg') if odds_type is a string column
            FROM average_odds x where odds_type=avg
           GROUP BY fix_id
               , market
        ) y
       ON y.fix_id = x.fix_id
      AND y.market = x.market
      AND y.max_updated = x.updated
 ) a
JOIN
( SELECT x.*
     FROM average_odds x
     JOIN
        ( SELECT fix_id
               , market
               , MIN(updated) min_updated
            FROM average_odds x where odds_type=avg
           GROUP BY fix_id
               , market
        ) y
       ON y.fix_id = x.fix_id
      AND y.market = x.market
      AND y.min_updated = x.updated
) b
 ON b.fix_id = a.fix_id
AND b.market = a.market  
ORDER BY `delta` ASC

This is the EXPLAIN output:

id  select_type  table       partitions  type  possible_keys          key          key_len  ref                               rows   filtered  Extra
1   PRIMARY      <derived3>  NULL        ALL   NULL                   NULL         NULL     NULL                              17466  100.00    Using temporary; Using filesort
1   PRIMARY      x           NULL        ref   fix,fixi,market,updat  fix          4        y.fix_id                          596    0.11      Using where
1   PRIMARY      x           NULL        ref   fix,fixi,market,updat  fix          4        y.fix_id                          596    2.27      Using where
1   PRIMARY      <derived5>  NULL        ref   <auto_key0>            <auto_key0>  31       y.fix_id,y.market,bobi.x.updated  10     100.00    Using index
5   DERIVED      x           NULL        ref   boki                   boki         4        const                             17466  100.00    Using index condition; Using temporary; Using file...
3   DERIVED      x           NULL        ref   boki                   boki         4        const                             17466  100.00    Using index condition; Using temporary; Using file...
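
Reading the plan, both derived tables go through 17466 rows with a temporary table and a filesort, and each join back to average_odds uses only the single-column fix index, at roughly 596 rows per lookup. One thing that might be worth trying is a pair of composite indexes matching the filter, grouping and join columns. This is an untested sketch: the index names are hypothetical, the column names are taken from the query above, and whether the optimizer actually picks these indexes depends on the real table definition and data.

```sql
-- Hypothetical index for the grouped subqueries:
-- filter on odds_type, group by (fix_id, market), take MIN/MAX of updated.
ALTER TABLE average_odds
  ADD INDEX idx_type_fix_market_updated (odds_type, fix_id, market, updated);

-- Hypothetical index for the joins back to average_odds,
-- which look rows up by (fix_id, market, updated).
ALTER TABLE average_odds
  ADD INDEX idx_fix_market_updated (fix_id, market, updated);
```

Running EXPLAIN again after adding them would show whether the key column changes and whether the row estimates drop.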
