SQL Inner Join based on MAX of timestamp

Posted 2023-02-24

[Title]: SQL Inner Join based on MAX of timestamp [Posted]: 2016-03-18 02:18:38 [Question]:

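Edited once.

Second edit: in the remaining 9 tables (every table except reports), the value column header is always called "what".

I have about 10 tables with the following structure: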

reports (165k rows)
+-----------+-----------+
| identifier| category  | 
+-----------+-----------+
| 1         | fixed     |
| 2         | wontfix   |
| 3         | fixed     |
| 4         | invalid   | 
| 5         | later     | 
| 6         | wontfix   | 
| 7         | duplicate | 
| 8         | later     | 
| 9         | wontfix   | 
+-----------+-----------+

 status (300k rows, all identifiers from reports come up at least once)
+-----------+-----------+----------+
| identifier| time      | what     |
+-----------+-----------+----------+
| 1         | 12        | RESOLVED |
| 1         | 9         | NEW      |
| 2         | 7         | ASSIGNED |
| 3         | 10        | RESOLVED |
| 5         | 4         | REOPEN   |
| 7         | 9         | ASSIGNED |
| 4         | 9         | ASSIGNED |
| 7         | 11        | RESOLVED |
| 8         | 3         | NEW      |
| 4         | 3         | NEW      |
| 7         | 6         | NEW      |
+-----------+-----------+----------+

 priority (300k rows, all identifiers from reports come up at least once)
+-----------+-----------+----------+
| identifier| time      | what     |
+-----------+-----------+----------+
| 3         | 12        | LOW      |
| 1         | 9         | LOW      |
| 9         | 2         | HIGH     |
| 8         | 7         | HIGH     |
| 3         | 10        | HIGH     |
| 5         | 4         | MEDIUM   |
| 4         | 9         | MEDIUM   |
| 4         | 3         | LOW      |
| 7         | 9         | LOW      |
| 7         | 11        | HIGH     |
| 8         | 3         | LOW      |
| 6         | 12        | MEDIUM   |
| 7         | 6         | LOW      |
| 6         | 9         | HIGH     |
| 2         | 6         | HIGH     |
| 2         | 1         | LOW      |
+-----------+-----------+----------+

What I need is:

 reportsfinal (165k rows)
+-----------+-----------+--------------+------------+
| identifier| category  | what11       |  what22    |
+-----------+-----------+--------------+------------+
| 1         | fixed     | RESOLVED     | LOW        |
| 2         | wontfix   | ASSIGNED     | HIGH       |
| 3         | fixed     | RESOLVED     | LOW        |
| 4         | invalid   | ASSIGNED     | MEDIUM     |
| 5         | later     | REOPEN       | MEDIUM     |
| 6         | wontfix   |              | MEDIUM     |
| 7         | duplicate | RESOLVED     | HIGH       |
| 8         | later     | NEW          | HIGH       |
| 9         | wontfix   |              | HIGH       |
+-----------+-----------+--------------+------------+

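In other words, reports (after the query: reportsfinal) serves as the base table, and I have to add one or two columns from each of the other 9 tables. identifier is the key, but in some tables an identifier appears multiple times. In those cases I only want to use the entry with the highest time. I have tried several queries, but none of them worked. If possible, I would like to run a single query that uses this approach to pull the different columns from the other 9 tables.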

What I tried, based on the answers below:

select  T.identifier,
        T.category,
        t.what AS what11,
        t.what AS what22 from (
     select R.identifier,
     R.category,
     COALESCE(S.what,'NA')what,
     COALESCE(P.what,'NA')what,
     ROW_NUMBER()OVER(partition by R.identifier,R.category ORDER by (select null))RN
     from reports R 
     LEFT JOIN bugstatus S
     ON S.identifier = R.identifier
     LEFT JOIN priority P
     ON P.identifier = s.identifier

     GROUP BY R.identifier,R.category,S.what,P.what)T
     Where T.RN = 1
     ORDER BY T.identifier;

This gives the error:

Error: near "(": syntax error.

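[Comments]:

[Answer 1]:

For each of the joined tables, just use a subquery-based predicate to pick out the specific timestamp...

The single-letter tokens r, s, and p are the aliases defined for the tables reports, status, and priority, respectively.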

Select r.Identifier, r.category,
   coalesce(s.what, 'NA') status,
   coalesce(p.what, 'NA') priority
From reports r
  left join status s
     on s.identifier = r.identifier
        and s.time =
           (Select max(time) from status 
            where identifier = r.identifier)
  left join priority p
     on p.identifier = r.identifier
        and p.time =
           (Select max(time) from priority 
            where identifier = r.identifier);
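
To check the query above against the sample data from the question, here is a minimal sketch that loads those rows into an in-memory SQLite database through Python's sqlite3 module. The inner aliases s2 and p2 and the Python names (REPORTS, STATUS, PRIORITY, LATEST_QUERY) are assumptions added for illustration; the column types are guessed from the sample values.

```python
import sqlite3

# Sample rows copied from the question's tables.
REPORTS = [(1, "fixed"), (2, "wontfix"), (3, "fixed"), (4, "invalid"),
           (5, "later"), (6, "wontfix"), (7, "duplicate"), (8, "later"),
           (9, "wontfix")]
STATUS = [(1, 12, "RESOLVED"), (1, 9, "NEW"), (2, 7, "ASSIGNED"),
          (3, 10, "RESOLVED"), (5, 4, "REOPEN"), (7, 9, "ASSIGNED"),
          (4, 9, "ASSIGNED"), (7, 11, "RESOLVED"), (8, 3, "NEW"),
          (4, 3, "NEW"), (7, 6, "NEW")]
PRIORITY = [(3, 12, "LOW"), (1, 9, "LOW"), (9, 2, "HIGH"), (8, 7, "HIGH"),
            (3, 10, "HIGH"), (5, 4, "MEDIUM"), (4, 9, "MEDIUM"), (4, 3, "LOW"),
            (7, 9, "LOW"), (7, 11, "HIGH"), (8, 3, "LOW"), (6, 12, "MEDIUM"),
            (7, 6, "LOW"), (6, 9, "HIGH"), (2, 6, "HIGH"), (2, 1, "LOW")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports  (identifier INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE status   (identifier INTEGER, time INTEGER, what TEXT);
    CREATE TABLE priority (identifier INTEGER, time INTEGER, what TEXT);
""")
conn.executemany("INSERT INTO reports VALUES (?, ?)", REPORTS)
conn.executemany("INSERT INTO status VALUES (?, ?, ?)", STATUS)
conn.executemany("INSERT INTO priority VALUES (?, ?, ?)", PRIORITY)

# Same shape as the answer's query; the inner tables get their own
# aliases (s2, p2) so every column reference is fully qualified.
LATEST_QUERY = """
SELECT r.identifier, r.category,
       COALESCE(s.what, 'NA') AS status,
       COALESCE(p.what, 'NA') AS priority
FROM reports r
LEFT JOIN status s
       ON s.identifier = r.identifier
      AND s.time = (SELECT MAX(s2.time) FROM status s2
                    WHERE s2.identifier = r.identifier)
LEFT JOIN priority p
       ON p.identifier = r.identifier
      AND p.time = (SELECT MAX(p2.time) FROM priority p2
                    WHERE p2.identifier = r.identifier)
ORDER BY r.identifier
"""

rows = conn.execute(LATEST_QUERY).fetchall()
for row in rows:
    print(row)
```

Reports 6 and 9 have no status rows in the sample, so they come back with 'NA' in the status column. If two rows of one table share the same maximum time for an identifier, this form returns that report more than once.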

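Question: why rename the columns from status and priority to what? You might as well call it something, or data, or information. At least the original names (status and prio) conveyed some meaning. The word what is meaningless.

Note: I rolled back the edit that introduced the what11 and what12 aliases, because those names are meaningless.

[Comments]: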

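I get "Error: ambiguous column name: time".

I see your point about the header "what". I did not define the headers that way; that is how they came in the csv files. But you are right, I could have given them more meaningful names when creating the SQLite database.

The query has now been running for 4 hours and still has not finished. I assume there must be a query that gets the data faster? @Charles Bretana

Are there indexes on the foreign keys?

Sorry, I can't answer that. The tables are as described in my question. In the reports table the primary key is "identifier". In the other tables the unique identifier would be the combination of "identifier" and "time". @PilotBob

If you can reach whoever is the dbo, ask them whether indexes can be added on status.time and priority.time. Also, if it is SQL Server, or any database that uses clustered indexes, I would create the primary key clustered index on the combination of those two fields (identifier, time). (Is this SQL Server, or Oracle, or MySQL??)

[Answer 2]: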

Basically you need correlated subqueries in the select list.

Shooting from the hip, something like:

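-- "a" is the alias for the reports table
Select a.Identifier
,a.Category
-- newest status row per report (the column is called what after the question's edit)
,(select process
    from status where status.identifier = a.Identifier order by time desc limit 1) Process
-- newest priority row per report
,(select prio
    from priority where priority.identifier = a.Identifier order by time desc limit 1) prio
From Reports a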
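
The correlated-subquery form can be tried in SQLite as sketched below, using the column name what from the edited question and a small subset of the sample rows. The composite indexes on (identifier, time) follow the indexing suggestion made in the comments on the first answer; the index names are hypothetical, and how much they help on 300k rows is an assumption that should be measured on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports  (identifier INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE status   (identifier INTEGER, time INTEGER, what TEXT);
    CREATE TABLE priority (identifier INTEGER, time INTEGER, what TEXT);

    -- Hypothetical index names; (identifier, time) lets each subquery
    -- look up one identifier and read its rows in time order.
    CREATE INDEX idx_status_id_time   ON status   (identifier, time);
    CREATE INDEX idx_priority_id_time ON priority (identifier, time);

    -- Subset of the question's sample rows.
    INSERT INTO reports  VALUES (1, 'fixed'), (4, 'invalid'), (6, 'wontfix');
    INSERT INTO status   VALUES (1, 12, 'RESOLVED'), (1, 9, 'NEW'),
                                (4, 9, 'ASSIGNED'), (4, 3, 'NEW');
    INSERT INTO priority VALUES (1, 9, 'LOW'), (4, 9, 'MEDIUM'), (4, 3, 'LOW'),
                                (6, 12, 'MEDIUM'), (6, 9, 'HIGH');
""")

# One correlated subquery per extra column; each returns the value
# from the row with the highest time for the current report.
query = """
SELECT r.identifier, r.category,
       (SELECT s.what FROM status s
         WHERE s.identifier = r.identifier
         ORDER BY s.time DESC LIMIT 1) AS status,
       (SELECT p.what FROM priority p
         WHERE p.identifier = r.identifier
         ORDER BY p.time DESC LIMIT 1) AS priority
FROM reports r
ORDER BY r.identifier
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# Print the planner's description to see whether the indexes are used.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step[-1])
```

A report without any status row (identifier 6 here) gets NULL in that column, which Python shows as None; wrapping the subquery in COALESCE(..., 'NA') would give the 'NA' placeholder used in the other answers.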

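[Comments]:

@peter: What does "a" stand for? Is a = reports? Also, TOP 1 does not work in SQLite; I read that LIMIT 1 does. Written like this I get no error, but the query is still running (I have 300k rows):
Select reports.identifier, reports.current_resolution, reports.current_status, (select what from bugstatus where bugstatus.identifier = reports.identifier order by timestamp desc LIMIT 1) what, (select what from priority where priority.identifier = reports.identifier order by timestamp desc LIMIT 1) what from reports;

Yes, I did say it was from the hip. I use T-SQL, and I take it you are using SQLite, so limit 1 seems to be the equivalent. Yes, I meant to alias Reports as a, but didn't. Are you saying it works or it doesn't?

Yes, I am using SQLite. So do I need "From reports a" at the end of the query? Is the rest of my proposal correct? Thanks a lot. @PilotBob

Yes, that should make reports... or you can change it from a to reports in the rest of the query. I think your edit looks fine. I edited it to add the alias and use the sqlite limit 1 syntax.

[Answer 3]: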

Using Row_number, based on your sample data

select  T.identifier,
        T.category,
        what AS what11,
        what AS what22 from (
     select R.identifier,
     r.category,
     COALESCE(S.what,'NA')what,
     COALESCE(P.what,'NA')what,
     ROW_NUMBER()OVER(partition by R.identifier,r.category ORDER by (select null))RN
     from reports R left join status S
     ON S.identifier = R.identifier
     LEFT JOIN Priority P
     ON P.identifier = s.identifier

     GROUP BY R.identifier,r.category,S.what,P.what)T
     Where T.RN = 1
     ORDER BY T.identifier
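
As written, the query above numbers rows in an arbitrary order, so RN = 1 is not tied to the highest time. A variant that orders each table's rows by time before joining might look like the following sketch. It is not the answerer's code: the CTE names latest_status and latest_priority are hypothetical, only a subset of the sample rows is loaded, and it assumes an SQLite build with window functions (version 3.25 or newer), which may also be why older builds report a syntax error near "(".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports  (identifier INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE status   (identifier INTEGER, time INTEGER, what TEXT);
    CREATE TABLE priority (identifier INTEGER, time INTEGER, what TEXT);

    -- Subset of the question's sample rows.
    INSERT INTO reports  VALUES (1, 'fixed'), (4, 'invalid'), (6, 'wontfix');
    INSERT INTO status   VALUES (1, 12, 'RESOLVED'), (1, 9, 'NEW'),
                                (4, 9, 'ASSIGNED'), (4, 3, 'NEW');
    INSERT INTO priority VALUES (1, 9, 'LOW'), (4, 9, 'MEDIUM'), (4, 3, 'LOW'),
                                (6, 12, 'MEDIUM'), (6, 9, 'HIGH');
""")

# Number the rows of each history table per identifier, newest first,
# then join only the rows numbered 1 back to reports.
query = """
WITH latest_status AS (
    SELECT identifier, what,
           ROW_NUMBER() OVER (PARTITION BY identifier
                              ORDER BY time DESC) AS rn
    FROM status
),
latest_priority AS (
    SELECT identifier, what,
           ROW_NUMBER() OVER (PARTITION BY identifier
                              ORDER BY time DESC) AS rn
    FROM priority
)
SELECT r.identifier, r.category,
       COALESCE(s.what, 'NA') AS what11,
       COALESCE(p.what, 'NA') AS what22
FROM reports r
LEFT JOIN latest_status   s ON s.identifier = r.identifier AND s.rn = 1
LEFT JOIN latest_priority p ON p.identifier = r.identifier AND p.rn = 1
ORDER BY r.identifier
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because ROW_NUMBER assigns exactly one row the number 1 per identifier, this form returns each report once even when two rows share the same time. Both joins go through reports, so a report without a status row still keeps its priority.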

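[Comments]:

What does T stand for, or where is it defined? In "ORDER by (select null))RN", what does RN stand for, or where is it defined? Thanks a lot!

I have used a derived table, and T is nothing but its alias. RN is the Row_number, and select null is there for ordering purposes. @JohnDavison

1) You use upper-case and lower-case letters interchangeably for aliases and table names. Does that matter? 2) There is no explicit "NA" in the tables, only blanks. Does the code have to change? 3) I changed the headers of the follow-up tables status and priority to "what". Is it a problem if all headers are called "what"? I get the error message: "Error: near "(": syntax error". @mohan111

NA is nothing but NULL; you can change it to null. As for the syntax error, give the column names correctly.

Is the edited code correct for the new headers in the initial question? Does it matter whether the alias is r or R? @mohan111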
