Calculate the difference between min and max for each column only if higher than 0
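Posted: 2021-06-01 23:20:56
Question:
I need to calculate the difference between odds based on the value in the "updated" column: I subtract the odds from the row where "updated" is lowest from the odds in the row where "updated" is highest. It works fine, but I just realized that some columns sometimes contain exactly 0, and I'm wondering whether it is possible to pick the minimum based on the "updated" column while considering only values higher than 0.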
The table looks like this:
fixture_id | H_odds | D_odds | A_odds | ev_tstamp | updated |
---|---|---|---|---|---|
120000 | 1.40 | 1.50 | 1.30 | 132000 | 12 |
120000 | 1.10 | 1.10 | 1.10 | 132000 | 11 |
120000 | 1.20 | 0 | 1.60 | 132000 | 10 |
This is what I want returned:
fixture_id | H_odds | D_odds | A_odds | ev_tstamp | updated | dif_h | dif_d | dif_a |
---|---|---|---|---|---|---|---|---|
120000 | 1.40 | 1.50 | 1.30 | 132000 | 12 | 0.2 | 0.4 | -0.3 |
This is what I'm getting now:
fixture_id | H_odds | D_odds | A_odds | ev_tstamp | updated | dif_h | dif_d | dif_a |
---|---|---|---|---|---|---|---|---|
120000 | 1.40 | 1.50 | 1.30 | 132000 | 12 | 0.2 | 1.5 | -0.3 |
The code I'm using:
select
t_max.*,
(t_max.H_odds - t_min.H_odds) as dif_h,
(t_max.D_odds - t_min.D_odds) as dif_d,
(t_max.A_odds - t_min.A_odds) as dif_a
from
(
select
fixture_id,
min(updated) min_updated,
max(updated) max_updated
from
test
group by
fixture_id
) as t1
join test as t_min on (t_min.fixture_id = t1.fixture_id and t_min.updated = t1.min_updated)
join test as t_max on (t_max.fixture_id = t1.fixture_id and t_max.updated = t1.max_updated)
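To see where the 1.5 in dif_d comes from, here is a minimal reproduction of the query above against the sample rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the REAL column types are an assumption. Because both bounds come from MIN(updated)/MAX(updated) over all rows, the row with updated = 10 is used for every column, including D_odds, where that row holds 0.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for MySQL here.
ROWS = [
    (120000, 1.40, 1.50, 1.30, 132000, 12),
    (120000, 1.10, 1.10, 1.10, 132000, 11),
    (120000, 1.20, 0.00, 1.60, 132000, 10),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (fixture_id INT, H_odds REAL, D_odds REAL,"
        " A_odds REAL, ev_tstamp INT, updated INT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


# The question's query: one min/max "updated" per fixture, shared by all columns.
QUERY = """
select t_max.fixture_id,
       (t_max.H_odds - t_min.H_odds) as dif_h,
       (t_max.D_odds - t_min.D_odds) as dif_d,
       (t_max.A_odds - t_min.A_odds) as dif_a
from (select fixture_id, min(updated) min_updated, max(updated) max_updated
      from test group by fixture_id) as t1
join test as t_min on (t_min.fixture_id = t1.fixture_id and t_min.updated = t1.min_updated)
join test as t_max on (t_max.fixture_id = t1.fixture_id and t_max.updated = t1.max_updated)
"""

fixture_id, dif_h, dif_d, dif_a = make_db().execute(QUERY).fetchone()
# Rounded because the odds are binary floating point in this sketch.
print(fixture_id, round(dif_h, 2), round(dif_d, 2), round(dif_a, 2))
```

dif_d is 1.50 - 0 because the earliest row's zero is subtracted. The other two columns already match the desired output.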
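Comments:
Use MIN(CASE WHEN D_odds > 0 THEN updated END). If the value in a row is 0, the CASE returns NULL, and MIN() ignores NULLs, so that row is skipped.
So you want the latest odds, plus the difference between the latest and the earliest odds for the same fixture and outcome?
Yes, I want 0 odds to be skipped when calculating the difference, so each odds column has to be handled separately, as described in the answer below. If you look at the dif_d column in the example table above, you can see what I mean.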
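Building on the MIN(CASE ...) suggestion in the comments, here is a minimal sketch that needs no window functions, so it should also run on MySQL 5.7. A derived table computes a separate earliest and latest "updated" per odds column, ignoring zeros, and scalar subqueries fetch the odds at those bounds. It assumes (fixture_id, updated) identifies a single row; otherwise the scalar subqueries would return more than one row. It is exercised here with Python's sqlite3 module rather than a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (fixture_id INT, H_odds REAL, D_odds REAL,"
    " A_odds REAL, ev_tstamp INT, updated INT)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?, ?)",
    [
        (120000, 1.40, 1.50, 1.30, 132000, 12),
        (120000, 1.10, 1.10, 1.10, 132000, 11),
        (120000, 1.20, 0.00, 1.60, 132000, 10),
    ],
)

# b: per-column bounds. CASE yields NULL for zero odds and MIN()/MAX() skip
# NULLs, so a zero never becomes a bound. t_max: the latest row to display.
QUERY = """
SELECT t_max.fixture_id, t_max.updated,
       (SELECT s.H_odds FROM test s WHERE s.fixture_id = b.fixture_id AND s.updated = b.max_h)
     - (SELECT s.H_odds FROM test s WHERE s.fixture_id = b.fixture_id AND s.updated = b.min_h) AS dif_h,
       (SELECT s.D_odds FROM test s WHERE s.fixture_id = b.fixture_id AND s.updated = b.max_d)
     - (SELECT s.D_odds FROM test s WHERE s.fixture_id = b.fixture_id AND s.updated = b.min_d) AS dif_d,
       (SELECT s.A_odds FROM test s WHERE s.fixture_id = b.fixture_id AND s.updated = b.max_a)
     - (SELECT s.A_odds FROM test s WHERE s.fixture_id = b.fixture_id AND s.updated = b.min_a) AS dif_a
FROM (SELECT fixture_id,
             MAX(updated) AS max_updated,
             MIN(CASE WHEN H_odds > 0 THEN updated END) AS min_h,
             MAX(CASE WHEN H_odds > 0 THEN updated END) AS max_h,
             MIN(CASE WHEN D_odds > 0 THEN updated END) AS min_d,
             MAX(CASE WHEN D_odds > 0 THEN updated END) AS max_d,
             MIN(CASE WHEN A_odds > 0 THEN updated END) AS min_a,
             MAX(CASE WHEN A_odds > 0 THEN updated END) AS max_a
      FROM test
      GROUP BY fixture_id) AS b
JOIN test AS t_max
  ON t_max.fixture_id = b.fixture_id AND t_max.updated = b.max_updated
"""

row = conn.execute(QUERY).fetchone()
print(row[:2], [round(v, 2) for v in row[2:]])
```

Zeros are skipped at both ends, so a 0 in the latest row would also move that column's upper bound back to its latest non-zero update.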
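Answer 1:
This solution only works on MySQL 8+.
I would suggest window functions. The query below treats each odds column separately... and it makes no assumptions about whether the odds increase or decrease with each update: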
select fixture_id, ev_tstamp, max(updated) as last_updated,
       max(case when updated = max_h_update then h_odds end) as max_h,
       max(case when updated = max_d_update then d_odds end) as max_d,
       max(case when updated = max_a_update then a_odds end) as max_a,
       (max(case when updated = max_h_update then h_odds end) -
        max(case when updated = min_h_update then h_odds end)
       ) as h_diff,
       (max(case when updated = max_d_update then d_odds end) -
        max(case when updated = min_d_update then d_odds end)
       ) as d_diff,
       (max(case when updated = max_a_update then a_odds end) -
        max(case when updated = min_a_update then a_odds end)
       ) as a_diff
from (select t.*,
             -- per-column bounds: zero odds yield NULL, which MIN/MAX skip
             max(case when h_odds <> 0 then updated end) over (partition by fixture_id) as max_h_update,
             min(case when h_odds <> 0 then updated end) over (partition by fixture_id) as min_h_update,
             max(case when d_odds <> 0 then updated end) over (partition by fixture_id) as max_d_update,
             min(case when d_odds <> 0 then updated end) over (partition by fixture_id) as min_d_update,
             max(case when a_odds <> 0 then updated end) over (partition by fixture_id) as max_a_update,
             min(case when a_odds <> 0 then updated end) over (partition by fixture_id) as min_a_update
      from test t
     ) t
group by fixture_id, ev_tstamp;
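Here is a quick check of the window-function approach against the sample rows, with the column spelled `updated` and a separate pair of bound aliases per odds column. It runs through Python's sqlite3 module, which assumes the linked SQLite library is 3.25 or newer (the first release with window functions). On a MySQL server, the same SQL needs 8.0+.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (fixture_id INT, H_odds REAL, D_odds REAL,"
    " A_odds REAL, ev_tstamp INT, updated INT)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?, ?)",
    [
        (120000, 1.40, 1.50, 1.30, 132000, 12),
        (120000, 1.10, 1.10, 1.10, 132000, 11),
        (120000, 1.20, 0.00, 1.60, 132000, 10),
    ],
)

# Inner query: per-column earliest/latest "updated", ignoring zero odds.
# Outer query: pick the odds at those bounds and subtract.
QUERY = """
select fixture_id, ev_tstamp, max(updated) as last_updated,
       max(case when updated = max_h_update then h_odds end) as max_h,
       max(case when updated = max_d_update then d_odds end) as max_d,
       max(case when updated = max_a_update then a_odds end) as max_a,
       max(case when updated = max_h_update then h_odds end) -
       max(case when updated = min_h_update then h_odds end) as h_diff,
       max(case when updated = max_d_update then d_odds end) -
       max(case when updated = min_d_update then d_odds end) as d_diff,
       max(case when updated = max_a_update then a_odds end) -
       max(case when updated = min_a_update then a_odds end) as a_diff
from (select t.*,
             max(case when h_odds <> 0 then updated end) over (partition by fixture_id) as max_h_update,
             min(case when h_odds <> 0 then updated end) over (partition by fixture_id) as min_h_update,
             max(case when d_odds <> 0 then updated end) over (partition by fixture_id) as max_d_update,
             min(case when d_odds <> 0 then updated end) over (partition by fixture_id) as min_d_update,
             max(case when a_odds <> 0 then updated end) over (partition by fixture_id) as max_a_update,
             min(case when a_odds <> 0 then updated end) over (partition by fixture_id) as min_a_update
      from test t) t
group by fixture_id, ev_tstamp
"""

row = conn.execute(QUERY).fetchone()
print(row[:3], [round(v, 2) for v in row[3:]])
```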
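Comments:
Sorry, I forgot to mention that I'm using MySQL 5.7. Will PARTITION BY work?
I just tried it and got the error "... right syntax to use near '(partition ...".
Window functions are not a good substitute for (fixing) a bad schema design.
Answer 2:
Consider the following: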
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(fixture_id INT NOT NULL
,updated INT NOT NULL
,outcome ENUM('Home win','Draw','Away win') NOT NULL
,odds DECIMAL(5,2) NOT NULL
,PRIMARY KEY(fixture_id,outcome,updated)
);
INSERT INTO my_table VALUES
(120,12,'Home win',1.40),
(120,11,'Home win',1.10),
(120,10,'Home win',1.20),
(120,12,'Draw',1.50),
(120,11,'Draw',1.10),
(120,12,'Away win',1.30),
(120,11,'Away win',1.10),
(120,10,'Away win',1.60);
Latest odds:
SELECT x.*
FROM my_table x
JOIN
( SELECT fixture_id
, outcome
, MAX(updated) max_updated
FROM my_table x
GROUP
BY fixture_id
, outcome
) y
ON y.fixture_id = x.fixture_id
AND y.outcome = x.outcome
AND y.max_updated = x.updated;
Earliest odds:
SELECT x.*
FROM my_table x
JOIN
( SELECT fixture_id
, outcome
, MIN(updated) min_updated
FROM my_table x
GROUP
BY fixture_id
, outcome
) y
ON y.fixture_id = x.fixture_id
AND y.outcome = x.outcome
AND y.min_updated = x.updated;
Delta:
SELECT a.*
, a.odds - b.odds delta
FROM
( SELECT x.*
FROM my_table x
JOIN
( SELECT fixture_id
, outcome
, MAX(updated) max_updated
FROM my_table x
GROUP
BY fixture_id
, outcome
) y
ON y.fixture_id = x.fixture_id
AND y.outcome = x.outcome
AND y.max_updated = x.updated
) a
JOIN
( SELECT x.*
FROM my_table x
JOIN
( SELECT fixture_id
, outcome
, MIN(updated) min_updated
FROM my_table x
GROUP
BY fixture_id
, outcome
) y
ON y.fixture_id = x.fixture_id
AND y.outcome = x.outcome
AND y.min_updated = x.updated
) b
ON b.fixture_id = a.fixture_id
AND b.outcome = a.outcome;
Result:
+------------+---------+----------+------+-------+
| fixture_id | updated | outcome | odds | delta |
+------------+---------+----------+------+-------+
| 120 | 12 | Home win | 1.40 | 0.20 |
| 120 | 12 | Draw | 1.50 | 0.40 |
| 120 | 12 | Away win | 1.30 | -0.30 |
+------------+---------+----------+------+-------+
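If the existing wide table is kept as the source, one way to fill this normalized layout is to unpivot it with one SELECT per odds column joined by UNION ALL. This is a minimal sketch run in Python's sqlite3 module. SQLite has no ENUM, so a CHECK constraint stands in for it. The sketch assumes zero odds mean "no price" and leaves them out, which is how the sample INSERT above treats the missing Draw row at updated = 10. The delta step uses a compact variant of the query above that takes both bounds from a single GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Wide layout from the question: one row per update, one column per outcome.
conn.execute(
    "CREATE TABLE test (fixture_id INT, H_odds REAL, D_odds REAL,"
    " A_odds REAL, ev_tstamp INT, updated INT)"
)
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?, ?, ?)",
    [
        (120000, 1.40, 1.50, 1.30, 132000, 12),
        (120000, 1.10, 1.10, 1.10, 132000, 11),
        (120000, 1.20, 0.00, 1.60, 132000, 10),
    ],
)

# Long layout from the answer; CHECK replaces MySQL's ENUM in SQLite.
conn.execute(
    """CREATE TABLE my_table (
         fixture_id INT NOT NULL,
         updated    INT NOT NULL,
         outcome    TEXT NOT NULL CHECK (outcome IN ('Home win', 'Draw', 'Away win')),
         odds       REAL NOT NULL,
         PRIMARY KEY (fixture_id, outcome, updated))"""
)

# Unpivot: one SELECT per outcome column; zero odds are not copied over.
conn.execute(
    """INSERT INTO my_table (fixture_id, updated, outcome, odds)
       SELECT fixture_id, updated, 'Home win', H_odds FROM test WHERE H_odds > 0
       UNION ALL
       SELECT fixture_id, updated, 'Draw', D_odds FROM test WHERE D_odds > 0
       UNION ALL
       SELECT fixture_id, updated, 'Away win', A_odds FROM test WHERE A_odds > 0"""
)

# Latest minus earliest odds per fixture and outcome.
DELTA = """
SELECT b.fixture_id, b.outcome, hi.odds - lo.odds AS delta
FROM (SELECT fixture_id, outcome,
             MIN(updated) AS first_updated, MAX(updated) AS last_updated
      FROM my_table
      GROUP BY fixture_id, outcome) AS b
JOIN my_table AS hi
  ON hi.fixture_id = b.fixture_id AND hi.outcome = b.outcome
 AND hi.updated = b.last_updated
JOIN my_table AS lo
  ON lo.fixture_id = b.fixture_id AND lo.outcome = b.outcome
 AND lo.updated = b.first_updated
"""

deltas = {outcome: round(d, 2) for _, outcome, d in conn.execute(DELTA)}
print(deltas)
```

With the zeros dropped at load time, the delta query needs no per-column CASE logic at all. Adding a 21st market only adds rows, not columns.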
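Comments:
Thanks for the input, but my table structure is different: each record contains the odds for all markets.
I strongly suggest you revise the structure.
I'll do that... it's just that there are another 20 columns for other markets, so it's more convenient to keep it as it is.
On the contrary, revising the design is more convenient and more efficient. What if you suddenly have 21 markets? Then you have to modify the whole structure and every query. With a normalized design, the query is exactly the same for 20 markets or 21.
Yes, I think you're right here; I'll look into it.
Answer 3:
I only slightly modified the code so that it calculates the difference only for a specific group of odds (the averages), so it looks like the following. However, it worked only once, with a processing time of more than 15 seconds; the other times I tried it, it failed with a timeout error. Just to clarify: in my structure, the market column corresponds to the "outcome" column in your example.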
explain SELECT a.*
, a.odds - b.odds delta
FROM
( SELECT x.*
FROM average_odds x
JOIN
( SELECT fix_id
, market
, MAX(updated) min_updated
FROM average_odds x where odds_type=avg
GROUP BY fix_id
, market
) y
ON y.fix_id = x.fix_id
AND y.market = x.market
AND y.min_updated = x.updated
) a
JOIN
( SELECT x.*
FROM average_odds x
JOIN
( SELECT fix_id
, market
, MIN(updated) min_updated
FROM average_odds x where odds_type=avg
GROUP BY fix_id
, market
) y
ON y.fix_id = x.fix_id
AND y.market = x.market
AND y.min_updated = x.updated
) b
ON b.fix_id = a.fix_id
AND b.market = a.market
ORDER BY `delta` ASC
Here is the EXPLAIN output:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived3> | null | all | null | null | null | null | 17466 | 100.00 | Using temporary; Using filesort |
1 | PRIMARY | x | null | ref | fix,fixi,market,updat | fix | 4 | y.fix_id | 596 | 0.11 | Using where |
1 | PRIMARY | x | null | ref | fix,fixi,market,updat | fix | 4 | y.fix_id | 596 | 2.27 | Using where |
1 | PRIMARY | <derived5> | null | ref | <auto_key0> | <auto_key0> | 31 | y.fix_id,y.market,bobi.x.updated | 10 | 100.00 | using index |
5 | DERIVED | x | null | ref | boki | boki | 4 | const | 17466 | 100.00 | Using index condition; Using temporary; Using file... |
3 | DERIVED | x | null | ref | boki | boki | 4 | const | 17466 | 100.00 | Using index condition; Using temporary; Using file... |
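In this plan, both join-backs to x go through the single-column `fix` index and keep only a small fraction of the rows they examine (filtered 0.11% and 2.27%), and both derived tables need a temporary table and a filesort. One possible direction is a composite index that matches the filter, the grouping, and the join keys, combined with a single GROUP BY for both bounds. Note also that the posted query filters odds_type only inside the grouping subqueries. As a result, rows of other odds types that share the same fix_id, market, and updated would also be joined. The sketch below is only an illustration under assumptions and has not been measured on the real data. It uses a hypothetical schema with only the columns the query touches and a hypothetical index name (idx_type_fix_market_upd). It also treats 'avg' as a string literal, which is an assumption about the real odds_type column. Confirm any effect with EXPLAIN on the MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical subset of average_odds; odds_type assumed to be a text label.
conn.execute(
    "CREATE TABLE average_odds (fix_id INT, market TEXT, odds_type TEXT,"
    " updated INT, odds REAL)"
)
# Hypothetical composite index: filter column first, then group/join keys.
conn.execute(
    "CREATE INDEX idx_type_fix_market_upd"
    " ON average_odds (odds_type, fix_id, market, updated)"
)
conn.executemany(
    "INSERT INTO average_odds VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Home win", "avg", 10, 1.20),
        (1, "Home win", "avg", 12, 1.40),
        (1, "Home win", "max", 12, 1.55),  # same timestamp, other odds_type
        (1, "Draw", "avg", 11, 1.10),
        (1, "Draw", "avg", 12, 1.50),
    ],
)

# odds_type is filtered in the grouping subquery AND in both join-backs,
# so the 'max' row at updated = 12 is never paired with the 'avg' bounds.
DELTA = """
SELECT b.fix_id, b.market, hi.odds - lo.odds AS delta
FROM (SELECT fix_id, market,
             MIN(updated) AS first_updated, MAX(updated) AS last_updated
      FROM average_odds
      WHERE odds_type = 'avg'
      GROUP BY fix_id, market) AS b
JOIN average_odds AS hi
  ON hi.odds_type = 'avg' AND hi.fix_id = b.fix_id
 AND hi.market = b.market AND hi.updated = b.last_updated
JOIN average_odds AS lo
  ON lo.odds_type = 'avg' AND lo.fix_id = b.fix_id
 AND lo.market = b.market AND lo.updated = b.first_updated
ORDER BY delta
"""

rows = conn.execute(DELTA).fetchall()
for fix_id, market, delta in rows:
    print(fix_id, market, round(delta, 2))

# SQLite's own plan, for illustration only; MySQL's EXPLAIN is what matters.
for step in conn.execute("EXPLAIN QUERY PLAN " + DELTA):
    print(step)
```

If the odds_type condition is dropped from the `hi` join, the "Home win" market comes back twice (once paired with the 1.55 row), which is one way duplicated or inflated results can appear.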