How to get specific rows for each id?

Posted: 2021-07-21 13:25:18

Question:

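To understand a business process that goes through several statuses, I want to retrieve rows based on the created_at column according to the following rules:

1. The first row with status 'created'

2. The last row of 'missing_info' after 'created' (row_no 4)

3. The first row of 'pending' (row_no 5)

4. The last row of 'missing_info' after 'pending' (row_no 7)

5. The first row of 'pending' after 'missing_info' (row_no 8)

6. The last row of 'successful' (row_no 10)

Below I highlighted the rows I want to retrieve.

Here is the sample data on DB-FIDDLE.

The general flow is: created > missing_info > pending > successful. But it can also be just: created > successful.

I know I can use QUALIFY with a window function and get 'created' and 'successful' as shown below. But I don't know how to get the temporary statuses. How can I get the desired output?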

WITH created AS (
    SELECT *
    FROM t1
    WHERE status = 'created'
    -- keep only the earliest 'created' row per id
    QUALIFY ROW_NUMBER() OVER (PARTITION BY status, id ORDER BY created_at) = 1
)
SELECT *
FROM created;

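Note that created and successful are the start and end statuses, so each appears in only one row of the output. The others, missing_info and pending, are temporary statuses, so the desired output can contain several rows of each.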


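Edit:

To understand a business process that goes through several statuses, I want to retrieve some rows based on the created_at column according to the following two rules:

The last row with status 'missing_info' before 'pending' (row 2)

'pending' (row 3)

The last row with status 'missing_info' before 'pending' (row 5)

'pending' (row 6)

Sample data: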

WITH t1 AS (
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 11:10:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 11:20:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'pending' AS status, '2021-07-15 11:30:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 12:10:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 12:20:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'pending' AS status, '2021-07-15 12:30:00'::timestamp AS created_at
    )
SELECT *
FROM t1;

Desired output: rows 2, 3, 5 and 6.

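Comments:

I don't understand your rules and results. The 2nd and 4th rules should return the same rows. What do the 2nd and 4th mean?

This is the general flow: created > missing_info > pending > successful. But it can also be just: created > successful. As mentioned, created and successful are the start and end statuses. The other two are temporary statuses.

@kimi Regardless of the rules, it looks like you could use MATCH_RECOGNIZE, which is designed specifically for finding patterns in a set of rows.

@LukaszSzozda Thanks for your comment! Do you know how to define the pattern to find the intermediate rows with MATCH_RECOGNIZE? Please let me know if my question is unclear. I tried to make it simpler and clearer.

@kimi Sure, I provided a simple demo. Feel free to modify the pattern to fit your exact needs and to extend the MEASURES section :)

Answer 1: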

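Snowflake implements MATCH_RECOGNIZE, which is the simplest tool for finding complex patterns in pure SQL:

Recognizes matches of a pattern in a set of rows. MATCH_RECOGNIZE accepts a set of rows (from a table, view, subquery, or other source) as input, and returns all matches for a given row pattern within this set. The pattern is defined similarly to a regular expression.

Data preparation: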

CREATE OR REPLACE TABLE t
AS
WITH t1 AS (
SELECT 'A' AS id, 'created' AS status, '2021-07-15 10:30:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'created' AS status, '2021-07-15 10:38:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 11:10:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 11:12:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'pending' AS status, '2021-07-15 12:05:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 13:36:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'missing_info' AS status, '2021-07-15 14:36:00'::timestamp AS created_at UNION ALL
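-- Note: this timestamp repeats the 12:05:00 of the first 'pending' row, so ORDER BY created_at sorts this row before the 13:36:00 row.
SELECT 'A' AS id, 'pending' AS status, '2021-07-15 12:05:00'::timestamp AS created_at UNION ALL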
SELECT 'A' AS id, 'successful' AS status, '2021-07-15 16:05:00'::timestamp AS created_at UNION ALL
SELECT 'A' AS id, 'successful' AS status, '2021-07-15 17:00:00'::timestamp AS created_at UNION ALL
SELECT 'B' AS id, 'created' AS status, '2021-07-16 10:30:00'::timestamp AS created_at UNION ALL
SELECT 'B' AS id, 'created' AS status, '2021-07-16 11:30:00'::timestamp AS created_at UNION ALL
SELECT 'B' AS id, 'successful' AS status, '2021-07-16 12:30:00'::timestamp AS created_at
    )     
SELECT * FROM t1;

Query, scenario 1:

SELECT *
FROM t
MATCH_RECOGNIZE (
  PARTITION BY ID
  ORDER BY CREATED_AT
  -- MEASURES MATCH_NUMBER() AS m, --LAST/FIRST/CLASSIFIER/...
  ALL ROWS PER MATCH
  PATTERN (c+m+)
  DEFINE
     c AS status='created'
    ,m AS status='missing_info'
    ,p AS status='pending'
    ,s AS status='successful'
) mr
ORDER BY ID, CREATED_AT;
-- returns rows 1-4

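The key point here is the pattern, which is written as a Perl-style regular expression. Here we are looking for one or more 'created' rows followed by one or more 'missing_info' rows.

ALL ROWS PER MATCH returns every row of each match; it can be changed to ONE ROW PER MATCH if a single row per match is enough.

MEASURES (see "Specifying Additional Output Columns" in the documentation) can supply extra information such as MATCH_NUMBER, MATCH_SEQUENCE_NUMBER or CLASSIFIER, depending on what is needed.

With "|", several patterns (alternatives) can be supplied in a single query: (c+m+|pm+|...)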
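To make the regular-expression analogy concrete, here is a minimal Python sketch that emulates the row pattern outside Snowflake. It assumes the statuses of id 'A' arrive in the intended row order (rows 1 to 10), maps each status to the one-letter symbol used in DEFINE, and runs Python's `re` module over the resulting string. The helper `match_rows` is a made-up name for illustration; this only mimics the matching idea and is not how Snowflake evaluates MATCH_RECOGNIZE.

```python
import re

# Statuses of id 'A' in the intended row order (rows 1..10 of the sample data).
statuses = [
    "created", "created", "missing_info", "missing_info", "pending",
    "missing_info", "missing_info", "pending", "successful", "successful",
]

# One letter per status, mirroring the DEFINE clause (c, m, p, s).
symbol = {"created": "c", "missing_info": "m", "pending": "p", "successful": "s"}
encoded = "".join(symbol[s] for s in statuses)  # "ccmmpmmpss"


def match_rows(pattern):
    """Return the 1-based row numbers covered by each non-overlapping match."""
    return [
        list(range(m.start() + 1, m.end() + 1))
        for m in re.finditer(pattern, encoded)
    ]


print(match_rows(r"c+m+"))      # [[1, 2, 3, 4]]
print(match_rows(r"c+m+|pm+"))  # [[1, 2, 3, 4], [5, 6, 7]]
```

The second call shows the effect of an alternative: the extra branch pm+ picks up the first 'pending' row together with the 'missing_info' rows that follow it.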


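Edit:

"Thanks for your answer! It returns the first 4 rows. I basically need rows 1 and 4."

Once the groups are identified, the first and last rows can be filtered, for example with QUALIFY. The key is to use the MEASURES I mentioned earlier: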

SELECT *
FROM t
MATCH_RECOGNIZE (
  PARTITION BY ID
  ORDER BY CREATED_AT
  -- mn numbers each match; msn numbers the rows inside a match
  MEASURES MATCH_NUMBER() AS mn,
           MATCH_SEQUENCE_NUMBER() AS msn
  ALL ROWS PER MATCH
  PATTERN (c+m+)
  DEFINE
     c AS status='created'
    ,m AS status='missing_info'
    ,p AS status='pending'
    ,s AS status='successful'
) mr
QUALIFY (ROW_NUMBER() OVER(PARTITION BY mn, ID ORDER BY msn) = 1)
     OR (ROW_NUMBER() OVER(PARTITION BY mn, ID ORDER BY msn DESC) = 1)
ORDER BY ID, CREATED_AT;
-- returns the first and last row of each group defined by ID and MATCH_NUMBER
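The filtering step can be illustrated with a small Python sketch. It is an emulation under assumptions, not Snowflake code: the statuses of id 'A' are taken in the intended row order, each match of c+m+ gets a match number (`mn`), each row inside a match gets a sequence number (`msn`), and only rows with the lowest or highest `msn` of their match are kept, which is what the two ROW_NUMBER() conditions express.

```python
import re

# Statuses of id 'A' in the intended row order (rows 1..10 of the sample data).
statuses = [
    "created", "created", "missing_info", "missing_info", "pending",
    "missing_info", "missing_info", "pending", "successful", "successful",
]
symbol = {"created": "c", "missing_info": "m", "pending": "p", "successful": "s"}
encoded = "".join(symbol[s] for s in statuses)

# Emulate MEASURES: attach (mn, msn) to every row that belongs to a match.
measured = []  # tuples of (row_no, status, mn, msn)
for mn, m in enumerate(re.finditer(r"c+m+", encoded), start=1):
    for msn, idx in enumerate(range(m.start(), m.end()), start=1):
        measured.append((idx + 1, statuses[idx], mn, msn))

# Highest msn per match, needed to recognise the last row of each match.
last_msn = {}
for _, _, mn, msn in measured:
    last_msn[mn] = max(last_msn.get(mn, 0), msn)

# Emulate QUALIFY: keep the first (msn == 1) and last row of every match.
kept = [
    (row_no, status)
    for row_no, status, mn, msn in measured
    if msn == 1 or msn == last_msn[mn]
]
print(kept)  # [(1, 'created'), (4, 'missing_info')]
```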

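Comments:

Thanks for your answer! It returns the first 4 rows. I basically need rows 1 and 4. I modified the query to retrieve the right rows, but still without success. I simplified the question. Would you mind taking another look? I just want to get rows 2, 3, 5 and 6.

@kimi That is why I mentioned MEASURES. Once the groups are identified, QUALIFY can be used to find the first and last rows. I also reverted the question to its original form and added an update section. Please do not make in-place updates, as they may invalidate answers that have already been provided.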
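For the simplified request (rows 2, 3, 5 and 6 of the sample data in the question's edit), one possible alternative to pattern matching is to compare each row with its neighbours using the LEAD and LAG window functions. The sketch below is an assumption-laden illustration rather than the answerer's method: it loads the six sample rows into an in-memory SQLite database through Python's `sqlite3` module purely so that it runs self-contained (window functions need SQLite 3.25 or later), and it stores the timestamps as ISO text. In Snowflake the same predicate could likely be placed in a QUALIFY clause instead of an outer WHERE.

```python
import sqlite3

# Simplified sample data from the question's edit (table name t1 as in the question).
rows = [
    ("A", "missing_info", "2021-07-15 11:10:00"),
    ("A", "missing_info", "2021-07-15 11:20:00"),
    ("A", "pending",      "2021-07-15 11:30:00"),
    ("A", "missing_info", "2021-07-15 12:10:00"),
    ("A", "missing_info", "2021-07-15 12:20:00"),
    ("A", "pending",      "2021-07-15 12:30:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id TEXT, status TEXT, created_at TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)

# Keep a 'missing_info' row only if the next row is 'pending',
# and a 'pending' row only if the previous row is 'missing_info'.
query = """
SELECT id, status, created_at
FROM (
    SELECT t1.*,
           LEAD(status) OVER (PARTITION BY id ORDER BY created_at) AS next_status,
           LAG(status)  OVER (PARTITION BY id ORDER BY created_at) AS prev_status
    FROM t1
) AS w
WHERE (status = 'missing_info' AND next_status = 'pending')
   OR (status = 'pending' AND prev_status = 'missing_info')
ORDER BY id, created_at
"""
result = conn.execute(query).fetchall()
for r in result:
    print(r)
```

With the six sample rows this keeps the 11:20 and 12:20 'missing_info' rows and both 'pending' rows, that is rows 2, 3, 5 and 6.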
