SQL query for finding unique rows based on three parameters - kind of "get first row in sorted grouped set"
Posted: 2021-07-17 01:36:31
Question:
I'm trying to see if there is a way to use SQL to find unique grouped rows based on three parameters. It's a bit like getting the first row for each group-by key in a specially sorted set.
Note: I'm stuck on MySQL 5.7.
Here is my test table and data:
CREATE TABLE observations (
    id int(10) AUTO_INCREMENT,
    area_code varchar(5),
    observation_date timestamp,
    reading int(10),
    source varchar(10),
    deleted_at timestamp NULL DEFAULT NULL,
    PRIMARY KEY (id)
);
INSERT INTO observations (area_code, observation_date, reading, source, deleted_at)
VALUES
    ('test1', '2021-01-01', 7, 'auto', null),
    ('test1', '2021-01-02', 6, 'auto', null),
    ('test1', '2021-01-03', 5, 'auto', null),
    ('test2', '2021-01-01', 7, 'auto', null),
    ('test2', '2021-01-02', 6, 'manual', null),
    ('test2', '2021-01-03', 5, 'auto', null),
    ('test3', '2021-01-01', 7, 'auto', null),
    ('test3', '2021-01-02', 6, 'manual', '2021-01-02'),
    ('test3', '2021-01-03', 5, 'auto', null);
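source is either 'auto' or 'manual'.
There are multiple areas. For each area I want the latest reading based on observation_date, but only if source is 'auto'. If source is 'manual', that reading takes priority and should always be returned as the reading for that area. However, if deleted_at is set (which only happens for 'manual' rows), the manual row should be ignored and observation_date becomes the main criterion again.
So the three parameters are observation_date, source and deleted_at. Nothing is ever removed, so that the full history is kept.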
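To make the intended rules concrete, here is a small plain-Python sketch (not part of the original question) that applies them to the sample rows without any SQL. It assumes that a deleted manual row is simply skipped and that, if an area ever had more than one live manual row, the newest one would win; the question does not cover that case.

```python
# Sample rows from the INSERT above:
# (id, area_code, observation_date, reading, source, deleted_at)
ROWS = [
    (1, "test1", "2021-01-01", 7, "auto", None),
    (2, "test1", "2021-01-02", 6, "auto", None),
    (3, "test1", "2021-01-03", 5, "auto", None),
    (4, "test2", "2021-01-01", 7, "auto", None),
    (5, "test2", "2021-01-02", 6, "manual", None),
    (6, "test2", "2021-01-03", 5, "auto", None),
    (7, "test3", "2021-01-01", 7, "auto", None),
    (8, "test3", "2021-01-02", 6, "manual", "2021-01-02"),
    (9, "test3", "2021-01-03", 5, "auto", None),
]


def pick_per_area(rows):
    """Return {area_code: winning row} according to the rules in the question."""
    best = {}
    for row in rows:
        _id, area, obs_date, _reading, source, deleted_at = row
        if source == "manual" and deleted_at is not None:
            continue  # a deleted manual reading is ignored entirely
        # A live manual reading (True) outranks any auto reading (False);
        # ties are broken by the newest observation_date
        # (ISO-8601 date strings sort chronologically).
        key = (source == "manual", obs_date)
        if area not in best or key > best[area][0]:
            best[area] = (key, row)
    return {area: row for area, (_key, row) in best.items()}


for area, row in sorted(pick_per_area(ROWS).items()):
    print(area, row[0], row[4])
```

Running it prints one line per area, picking id 3 (auto) for test1, id 5 (manual) for test2 and id 9 (auto) for test3.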
Here is my current query, with the actual output followed by the expected output:
Current query attempt:
SELECT obs1.*
FROM observations AS obs1
LEFT JOIN observations AS obs2 ON
    obs1.area_code = obs2.area_code AND
    obs1.id != obs2.id AND
    NOT (
        (obs1.source = "manual"
            AND obs1.deleted_at IS NULL
        )
        OR
        (obs1.observation_date > obs2.observation_date AND obs2.source = "auto" )
    )
WHERE obs2.id IS NULL
Actual output:
id area_code observation_date reading source deleted_at
3 test1 2021-01-03 00:00:00 5 auto NULL
5 test2 2021-01-02 00:00:00 6 manual NULL
Actual output (with AND obs1.deleted_at IS NULL removed):
id area_code observation_date reading source deleted_at
3 test1 2021-01-03 00:00:00 5 auto NULL
5 test2 2021-01-02 00:00:00 6 manual NULL
8 test3 2021-01-02 00:00:00 6 manual 2021-01-02 00:00:00
Expected output:
id area_code observation_date reading source deleted_at
3 test1 2021-01-03 00:00:00 5 auto NULL
5 test2 2021-01-02 00:00:00 6 manual NULL
8 test3 2021-01-03 00:00:00 5 auto NULL
I've tried many different queries, but none of them gives the expected result.
Is this possible at all, or am I going about it the wrong way?
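One way to reproduce both actual outputs is to run the query against the sample data in SQLite from Python. This is only a sketch under stated assumptions: SQLite stands in for MySQL 5.7, dates are stored as ISO text (which compares correctly as strings), the string literals are switched to single quotes, and an ORDER BY is added so the output is deterministic.

```python
import sqlite3

# Sample data from the question's INSERT:
# (id, area_code, observation_date, reading, source, deleted_at)
ROWS = [
    (1, "test1", "2021-01-01", 7, "auto", None),
    (2, "test1", "2021-01-02", 6, "auto", None),
    (3, "test1", "2021-01-03", 5, "auto", None),
    (4, "test2", "2021-01-01", 7, "auto", None),
    (5, "test2", "2021-01-02", 6, "manual", None),
    (6, "test2", "2021-01-03", 5, "auto", None),
    (7, "test3", "2021-01-01", 7, "auto", None),
    (8, "test3", "2021-01-02", 6, "manual", "2021-01-02"),
    (9, "test3", "2021-01-03", 5, "auto", None),
]

# The question's self-join. {extra} is either "" or the deleted_at filter.
# Literals use single quotes because SQLite may read "manual" as a column name.
QUERY_TEMPLATE = """
SELECT obs1.id
FROM observations AS obs1
LEFT JOIN observations AS obs2 ON
    obs1.area_code = obs2.area_code AND
    obs1.id != obs2.id AND
    NOT (
        (obs1.source = 'manual' {extra})
        OR
        (obs1.observation_date > obs2.observation_date AND obs2.source = 'auto')
    )
WHERE obs2.id IS NULL
ORDER BY obs1.id
"""


def make_db():
    """Load ROWS into an in-memory SQLite table (a stand-in for the MySQL 5.7 table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (id INTEGER PRIMARY KEY, area_code TEXT,"
        " observation_date TEXT, reading INTEGER, source TEXT, deleted_at TEXT)"
    )
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def winning_ids(extra):
    """Run the question's query on a fresh copy of the data and return the surviving ids."""
    conn = make_db()
    try:
        return [row[0] for row in conn.execute(QUERY_TEMPLATE.format(extra=extra))]
    finally:
        conn.close()


with_filter = winning_ids("AND obs1.deleted_at IS NULL")
without_filter = winning_ids("")
print(with_filter)     # [3, 5]
print(without_filter)  # [3, 5, 8]
```

The first call returns ids 3 and 5 and the second returns 3, 5 and 8, matching the two actual outputs above. In both variants the newest auto row for test3 (id 9) is eliminated because the older manual row id 8 still matches it as obs2.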
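Answer 1:
First, the expected result should contain id 9, not id 8 as you wrote, because id 8 is the manual row that has been deleted (the test3 row you listed, dated 2021-01-03 with reading 5, is id 9). So the expected result is: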
id area_code observation_date reading source deleted_at
3 test1 2021-01-03 00:00:00 5 auto NULL
5 test2 2021-01-02 00:00:00 6 manual NULL
9 test3 2021-01-03 00:00:00 5 auto NULL
If you run it without the WHERE condition (the WHERE 1 OR ... below effectively disables it) and also select the obs2.* columns:
SELECT obs1.*, obs2.*
FROM observations AS obs1
LEFT JOIN observations AS obs2 ON
    obs1.area_code = obs2.area_code AND
    obs1.id != obs2.id AND
    NOT (
        (obs1.source = "manual"
            AND obs1.deleted_at IS NULL
        )
        OR
        (obs1.observation_date > obs2.observation_date AND obs2.source = "auto" )
    )
WHERE 1 OR obs2.id IS NULL
you'll see that the result contains this pair (obs1 columns first, then obs2 columns):
9 test3 2021-01-03T00:00:00Z 5 auto (null) 8 test3 2021-01-02T00:00:00Z 6 manual 2021-01-02T00:00:00Z
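So the problem is that you don't account for obs2.source = 'manual': your join only excuses obs2 rows whose source is 'auto', so the deleted manual row (id 8) still matches as obs2 and eliminates the newer auto row (id 9).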
SELECT obs1.*
FROM observations AS obs1
LEFT JOIN observations AS obs2 ON
    obs1.area_code = obs2.area_code AND
    obs1.id != obs2.id AND
    NOT (
        (obs1.source = "manual" AND obs1.deleted_at IS NULL) OR
        (obs2.source = 'manual' AND obs2.deleted_at IS NOT NULL) OR
        (obs1.observation_date > obs2.observation_date AND obs2.source = "auto")
    )
WHERE obs2.id IS NULL
See also http://sqlfiddle.com/#!9/dc675e/13/0
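To check this fix locally, a similar sketch runs the corrected join in SQLite from Python. The same assumptions apply: SQLite stands in for MySQL 5.7, dates are ISO text, all literals are single-quoted, and an ORDER BY is added for a deterministic result.

```python
import sqlite3

# Sample data from the question's INSERT:
# (id, area_code, observation_date, reading, source, deleted_at)
ROWS = [
    (1, "test1", "2021-01-01", 7, "auto", None),
    (2, "test1", "2021-01-02", 6, "auto", None),
    (3, "test1", "2021-01-03", 5, "auto", None),
    (4, "test2", "2021-01-01", 7, "auto", None),
    (5, "test2", "2021-01-02", 6, "manual", None),
    (6, "test2", "2021-01-03", 5, "auto", None),
    (7, "test3", "2021-01-01", 7, "auto", None),
    (8, "test3", "2021-01-02", 6, "manual", "2021-01-02"),
    (9, "test3", "2021-01-03", 5, "auto", None),
]

# The corrected self-join: a deleted manual row in obs2 no longer counts as a competitor.
QUERY = """
SELECT obs1.id
FROM observations AS obs1
LEFT JOIN observations AS obs2 ON
    obs1.area_code = obs2.area_code AND
    obs1.id != obs2.id AND
    NOT (
        (obs1.source = 'manual' AND obs1.deleted_at IS NULL) OR
        (obs2.source = 'manual' AND obs2.deleted_at IS NOT NULL) OR
        (obs1.observation_date > obs2.observation_date AND obs2.source = 'auto')
    )
WHERE obs2.id IS NULL
ORDER BY obs1.id
"""


def make_db():
    """Load ROWS into an in-memory SQLite table (a stand-in for the MySQL 5.7 table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (id INTEGER PRIMARY KEY, area_code TEXT,"
        " observation_date TEXT, reading INTEGER, source TEXT, deleted_at TEXT)"
    )
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def winning_ids():
    """Return the ids that survive the corrected join."""
    conn = make_db()
    try:
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


print(winning_ids())  # [3, 5, 9]
```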
Answer 2:
Anything is possible.
Let's number the rows according to the logic you described:
SELECT *,
    ROW_NUMBER() OVER(PARTITION BY area_code ORDER BY
        CASE
            WHEN source = 'manual' AND deleted_at IS NULL THEN 0 -- priority
            WHEN source = 'manual' AND deleted_at IS NOT NULL THEN 2 -- not priority
            ELSE 1 -- auto
        END,
        observation_date DESC
    ) AS rown
FROM
    observations
Then keep only the rows where rown = 1:
WITH cte AS (
    SELECT *,
        ROW_NUMBER() OVER(PARTITION BY area_code ORDER BY
            CASE
                WHEN source = 'manual' AND deleted_at IS NULL THEN 0 -- priority
                WHEN source = 'manual' AND deleted_at IS NOT NULL THEN 2 -- not priority
                ELSE 1 -- auto
            END,
            observation_date DESC
        ) AS rown
    FROM
        observations
)
SELECT * FROM cte WHERE rown = 1
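ROW_NUMBER splits the result set into groups, one per unique combination of the columns listed in PARTITION BY, and then numbers the rows of each group 1, 2, 3, ... in the order set by its ORDER BY clause.
The logic above sorts live manual observations to the front (0), deleted manual observations to the back (2) and auto observations in between (1); when several rows share the same rank, the observation date in descending order (newest first) breaks the tie.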
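The window-function version can be tried the same way. This sketch assumes an SQLite build of 3.25 or later (the first release with window functions) as a stand-in database; on the MySQL side the query needs 8.0 or later. An ORDER BY on the outer query is added for a deterministic result.

```python
import sqlite3

# Sample data from the question's INSERT:
# (id, area_code, observation_date, reading, source, deleted_at)
ROWS = [
    (1, "test1", "2021-01-01", 7, "auto", None),
    (2, "test1", "2021-01-02", 6, "auto", None),
    (3, "test1", "2021-01-03", 5, "auto", None),
    (4, "test2", "2021-01-01", 7, "auto", None),
    (5, "test2", "2021-01-02", 6, "manual", None),
    (6, "test2", "2021-01-03", 5, "auto", None),
    (7, "test3", "2021-01-01", 7, "auto", None),
    (8, "test3", "2021-01-02", 6, "manual", "2021-01-02"),
    (9, "test3", "2021-01-03", 5, "auto", None),
]

# Rank 0 = live manual, 1 = auto, 2 = deleted manual; newest date breaks ties.
QUERY = """
WITH cte AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY area_code ORDER BY
            CASE
                WHEN source = 'manual' AND deleted_at IS NULL THEN 0 -- priority
                WHEN source = 'manual' AND deleted_at IS NOT NULL THEN 2 -- not priority
                ELSE 1 -- auto
            END,
            observation_date DESC
        ) AS rown
    FROM observations
)
SELECT id FROM cte WHERE rown = 1 ORDER BY id
"""


def make_db():
    """Load ROWS into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (id INTEGER PRIMARY KEY, area_code TEXT,"
        " observation_date TEXT, reading INTEGER, source TEXT, deleted_at TEXT)"
    )
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def winning_ids():
    """Return the id with rown = 1 for every area."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or later")
    conn = make_db()
    try:
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


print(winning_ids())  # [3, 5, 9]
```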
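Comments:
Thanks, but sorry, I forgot to mention that I'm stuck on MySQL 5.7, so OVER and PARTITION BY (window functions, which arrived in MySQL 8.0) are not an option.
Answer 3:
This is the kind of thing you would do with a correlated subquery in older versions of MySQL: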
select o.*
from observations o
where o.id = (select o2.id
from observations o2
where o2.area_code = o.area_code and
o2.deleted_at is null
order by (o2.source = 'manual') desc,
o2.observation_date desc
limit 1
);
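A sketch of this correlated-subquery version in SQLite, again only as a stand-in for MySQL 5.7. It relies on deleted_at only ever being set on manual rows, as the question states, so filtering on deleted_at IS NULL removes exactly the deleted manual readings. The extra row with id 10 is hypothetical test data, added to show that a newer auto reading does not displace a live manual one.

```python
import sqlite3

# Sample data from the question's INSERT:
# (id, area_code, observation_date, reading, source, deleted_at)
ROWS = [
    (1, "test1", "2021-01-01", 7, "auto", None),
    (2, "test1", "2021-01-02", 6, "auto", None),
    (3, "test1", "2021-01-03", 5, "auto", None),
    (4, "test2", "2021-01-01", 7, "auto", None),
    (5, "test2", "2021-01-02", 6, "manual", None),
    (6, "test2", "2021-01-03", 5, "auto", None),
    (7, "test3", "2021-01-01", 7, "auto", None),
    (8, "test3", "2021-01-02", 6, "manual", "2021-01-02"),
    (9, "test3", "2021-01-03", 5, "auto", None),
]

# For each row, keep it only if it is the top row of its own area when
# live rows are sorted manual-first, then newest-first.
QUERY = """
SELECT o.id
FROM observations o
WHERE o.id = (SELECT o2.id
              FROM observations o2
              WHERE o2.area_code = o.area_code AND
                    o2.deleted_at IS NULL
              ORDER BY (o2.source = 'manual') DESC,
                       o2.observation_date DESC
              LIMIT 1)
ORDER BY o.id
"""


def make_db():
    """Load ROWS into an in-memory SQLite table (a stand-in for the MySQL 5.7 table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (id INTEGER PRIMARY KEY, area_code TEXT,"
        " observation_date TEXT, reading INTEGER, source TEXT, deleted_at TEXT)"
    )
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def winning_ids(conn):
    """Return the winning id for every area in the given database."""
    return [row[0] for row in conn.execute(QUERY)]


conn = make_db()
print(winning_ids(conn))  # [3, 5, 9]

# Hypothetical extra row: a newer auto reading for test2.
conn.execute("INSERT INTO observations VALUES (10, 'test2', '2021-01-04', 4, 'auto', NULL)")
print(winning_ids(conn))  # still [3, 5, 9]: the live manual row 5 keeps priority
```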