Recursive query in RedShift

Posted: 2018-06-08 22:57:43

Question:

I am new to both SQL and RedShift. I have two tables.

account_usage:

account_id | usage_month |  usage_cost | usage_plan      | usage_type
   1      | 06-01-2018  |   100$      | 2018 - Custom   | dining
   1      | 06-01-2018  |   40$       | 2018 - Standard | office_supply
   2      | 06-01-2018  |   20$       | 2018 - Standard | dining
   2      | 06-01-2018  |   30$       | 2018 - Custom   | office_supply
   3      | 06-01-2018  |   25$       | 2018 - Custom   | dining
   3      | 06-01-2018  |   22$       | 2018 - Standard | office_supply

account_structure:

account_id | account_parent_id | account_name
   1       |  3                | account_1
   2       |  3                | account_2
   3       |  0                | account_3

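From these two tables I want to build an aggregate table. In this table, the total usage for each account ID is the account's own usage plus the sum of the usage of all its child accounts. total_usage_by_type is a JSON string that accumulates the usage per usage_type.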

account_usage_aggregations:
account_id | usage_month |  usage_plan  | total_usage | total_usage_by_type
1          | 06-01-2018  |  2018-Custom   | 100      |"dining":100
1          | 06-01-2018  |  2018-Standard | 40       |"office_supply":40
2          | 06-01-2018  |  2018-Custom   | 30       |"office_supply":30
2          | 06-01-2018  |  2018-Standard | 20       |"dining":20
3          | 06-01-2018  |  2018-Standard | 82       |"office_supply":62 , "dining": 20
3          | 06-01-2018  |  2018-Custom   | 155      |"office_supply":30, "dining": 125
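
The roll-up rule described above (each usage row counts for its own account and for every ancestor) can be sketched outside the database to check the expected totals. This is only an illustration in Python, not Redshift code: the names usage, parent_of, ancestors_and_self and roll_up are made up for the sketch, usage_cost is assumed to be a plain number, and the hierarchy is assumed to contain no cycles.

```python
from collections import defaultdict

# Sample rows copied from account_usage: (account_id, usage_plan, usage_type, usage_cost).
usage = [
    (1, "2018 - Custom", "dining", 100),
    (1, "2018 - Standard", "office_supply", 40),
    (2, "2018 - Standard", "dining", 20),
    (2, "2018 - Custom", "office_supply", 30),
    (3, "2018 - Custom", "dining", 25),
    (3, "2018 - Standard", "office_supply", 22),
]

# account_id -> account_parent_id, copied from account_structure (0 = no parent).
parent_of = {1: 3, 2: 3, 3: 0}


def ancestors_and_self(account_id):
    """Yield the account itself, then each ancestor up to the top of the tree."""
    while account_id in parent_of:
        yield account_id
        account_id = parent_of[account_id]


def roll_up(usage_rows):
    """Sum cost per (account, plan) and usage_type, crediting every ancestor too."""
    totals = defaultdict(lambda: defaultdict(int))
    for account_id, plan, usage_type, cost in usage_rows:
        for target in ancestors_and_self(account_id):
            totals[(target, plan)][usage_type] += cost
    return {key: dict(by_type) for key, by_type in totals.items()}


aggregated = roll_up(usage)
for (account_id, plan), by_type in sorted(aggregated.items()):
    print(account_id, plan, sum(by_type.values()), by_type)
```

For account 3 this gives totals of 155 (Custom) and 82 (Standard), matching the total_usage column above.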

I want to solve this with a recursive query, and started with this:

  with C as (
     select account_id,
        usage_type,
        usage_cost,
        account_id as RootID
    from account_usage
    union all
    select account_id,
        usage_type,
        usage_cost,
        C.RootID
    from account_usage
    join C 
    on account_structure.account_parent_id = C.account_id
 ) select * from C;

But I get the following error.

 [Amazon](500310) Invalid operation: relation "c" does not exist;

1 statement failed.

Is there a way to run recursive queries in RedShift?

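Comments:

As far as I know (remember), RedShift is based on PostgreSQL 8.0, which has no recursive queries (IMHO because it has been outdated since 2005). Perhaps Amazon added some extensions to support them.

Redshift is Postgres with many enhancements over version 8, but "recursive common table expressions" are not possible in Redshift. See docs.aws.amazon.com/redshift/latest/dg/…

Possible duplicate of "Invalid operation: WITH RECURSIVE is not supported".

Answer 1: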

As of April 29, 2021, Redshift supports recursive CTEs through the WITH RECURSIVE syntax:

https://aws.amazon.com/about-aws/whats-new/2021/04/amazon-redshift-announces-support-for-heirarchical-data-queries-with-recursive-cte/
https://docs.aws.amazon.com/redshift/latest/dg/r_WITH_clause.html#r_WITH_clause-recursive-cte
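WITH RECURSIVE c (root_id, account_id) AS (
    -- Anchor: every account is the root of its own subtree.
    SELECT account_id, account_id
    FROM account_structure
    UNION ALL
    -- Recursive step: add the children of each account already reached.
    SELECT c.root_id, s.account_id
    FROM c
    JOIN account_structure s ON s.account_parent_id = c.account_id
)
-- Attribute each usage row to the account itself and to every ancestor.
SELECT c.root_id, u.account_id, u.usage_type, u.usage_cost
FROM c
JOIN account_usage u ON u.account_id = c.account_id;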
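
To try the idea end to end without a cluster, here is a minimal sketch that runs a recursive CTE over the sample data and folds the result into the shape of account_usage_aggregations. It makes several assumptions: SQLite (through Python's sqlite3 module) stands in for Redshift, usage_cost is stored as an integer rather than as text like "100$", and the JSON string is assembled on the client instead of in SQL. The names tree, ROLLUP_SQL and aggregations are made up for the example. The recursion walks account_structure first and joins account_usage afterwards, so that accounts with several usage rows are not multiplied during the recursion.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for Redshift (assumption)
conn.executescript(
    """
    CREATE TABLE account_usage (
        account_id INTEGER, usage_month TEXT, usage_cost INTEGER,
        usage_plan TEXT, usage_type TEXT
    );
    CREATE TABLE account_structure (
        account_id INTEGER, account_parent_id INTEGER, account_name TEXT
    );
    INSERT INTO account_usage VALUES
        (1, '06-01-2018', 100, '2018 - Custom',   'dining'),
        (1, '06-01-2018',  40, '2018 - Standard', 'office_supply'),
        (2, '06-01-2018',  20, '2018 - Standard', 'dining'),
        (2, '06-01-2018',  30, '2018 - Custom',   'office_supply'),
        (3, '06-01-2018',  25, '2018 - Custom',   'dining'),
        (3, '06-01-2018',  22, '2018 - Standard', 'office_supply');
    INSERT INTO account_structure VALUES
        (1, 3, 'account_1'), (2, 3, 'account_2'), (3, 0, 'account_3');
    """
)

ROLLUP_SQL = """
WITH RECURSIVE tree (root_id, account_id) AS (
    -- Anchor: each account is the root of its own subtree.
    SELECT account_id, account_id FROM account_structure
    UNION ALL
    -- Recursive step: descend one level to the children.
    SELECT tree.root_id, s.account_id
    FROM tree
    JOIN account_structure s ON s.account_parent_id = tree.account_id
)
SELECT tree.root_id, u.usage_month, u.usage_plan, u.usage_type,
       SUM(u.usage_cost) AS type_total
FROM tree
JOIN account_usage u ON u.account_id = tree.account_id
GROUP BY tree.root_id, u.usage_month, u.usage_plan, u.usage_type
ORDER BY tree.root_id, u.usage_plan, u.usage_type
"""

# Fold the per-type rows into one entry per (account, month, plan).
aggregations = {}
for root_id, month, plan, usage_type, type_total in conn.execute(ROLLUP_SQL):
    by_type = aggregations.setdefault((root_id, month, plan), {})
    by_type[usage_type] = type_total

for (root_id, month, plan), by_type in aggregations.items():
    print(root_id, month, plan, sum(by_type.values()), json.dumps(by_type))
```

On Redshift the same WITH RECURSIVE query should be usable as the starting point, but the aggregation into a JSON string would need to be done with Redshift's own functions, which this sketch does not cover.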
