Redshift Correlated Subquery Pattern Not Supported

Posted: 2017-11-20 18:38:04

Question:

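Let me start by saying I know this is not a particularly efficient or elegant piece of code. I am querying a temp table called INSIDE, defined as follows: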

CREATE TEMP TABLE INSIDE (CONNECT_DATE DATE, DAILY_CONNECTIONS INT);

I then try to run the following query against INSIDE to test a model I have been working on.

SELECT *
, q5.DAN_PREDICTION - q5.LINEAR_PREDICTION AS PREDICTION_COMPARISON
, q5.DAN_PREDICTION - q5.ACTUAL_MONTH_END_AMOUNT AS DAN_VARIANCE
, q5.LINEAR_PREDICTION - q5.ACTUAL_MONTH_END_AMOUNT AS LINEAR_VARIANCE
FROM (SELECT *
  , q4.mtd + q4.last_yr_remainder + q4.run_rate * q4.days_remaining AS DAN_PREDICTION
  , q4.mtd + q4.curr_yr_7_day * days_remaining AS LINEAR_PREDICTION
    FROM(
SELECT 
       *
      , q3.curr_yr_7_day - q3.last_yr_7_day AS RUN_RATE
      FROM(
        SELECT 
          CONNECT_DATE
        , DAILY_CONNECTIONS 
        , (cur_yr_1_prev + cur_yr_2_prev + cur_yr_3_prev + cur_yr_4_prev + cur_yr_5_prev + cur_yr_6_prev + cur_yr_7_prev)/7 AS CURR_YR_7_DAY
        , (last_yr_1_prev + last_yr_2_prev + last_yr_3_prev + last_yr_4_prev + last_yr_5_prev + last_yr_6_prev + last_yr_7_prev)/7 AS LAST_YR_7_DAY
        , (SELECT ISNULL(SUM(ins.DAILY_CONNECTIONS), 0) 
            FROM INSIDE ins 
            WHERE DATEPART(MONTH, ins.CONNECT_DATE) = DATEPART(MONTH, q2.CONNECT_DATE) 
            AND DATEPART(YEAR, ins.CONNECT_DATE) = DATEPART(YEAR, q2.CONNECT_DATE)
            AND ins.CONNECT_DATE <= q2.CONNECT_DATE) AS MTD
        , (SELECT ISNULL(SUM(ins.DAILY_CONNECTIONS), 0) 
            FROM INSIDE ins 
            WHERE DATEPART(MONTH, ins.CONNECT_DATE) = DATEPART(MONTH, q2.CONNECT_DATE) 
            AND DATEPART(YEAR, ins.CONNECT_DATE) = DATEPART(YEAR, q2.CONNECT_DATE)-1
            AND ins.CONNECT_DATE > DATEADD(YEAR, -1, q2.CONNECT_DATE)) AS LAST_YR_REMAINDER
        , (SELECT TOP 1 DATEPART(DAY, last_day(CONNECT_DATE)) 
            FROM INSIDE 
            WHERE CONNECT_DATE = q2.CONNECT_DATE)-DATEPART(DAY, q2.CONNECT_DATE) DAYS_REMAINING
        , (SELECT ISNULL(SUM(ins.DAILY_CONNECTIONS), 0) 
            FROM INSIDE ins 
            WHERE DATEPART(MONTH, ins.CONNECT_DATE) = DATEPART(MONTH, q2.CONNECT_DATE)
            AND DATEPART(YEAR, ins.CONNECT_DATE) = DATEPART(YEAR, q2.CONNECT_DATE)) AS ACTUAL_MONTH_END_AMOUNT
        FROM
          (SELECT 
            q1.CONNECT_DATE CONNECT_DATE
          , q1.DAILY_CONNECTIONS DAILY_CONNECTIONS
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(DAY,-1,q1.connect_date)), 0) CUR_YR_1_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(DAY,-2,q1.connect_date)), 0) CUR_YR_2_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(DAY,-3,q1.connect_date)), 0) CUR_YR_3_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(DAY,-4,q1.connect_date)), 0) CUR_YR_4_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(DAY,-5,q1.connect_date)), 0) CUR_YR_5_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(DAY,-6,q1.connect_date)), 0) CUR_YR_6_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(DAY,-7,q1.connect_date)), 0) CUR_YR_7_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(YEAR, -1,(DATEADD(DAY,-1,q1.connect_date)))), 0) LAST_YR_1_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(YEAR, -1,(DATEADD(DAY,-2,q1.connect_date)))), 0) LAST_YR_2_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(YEAR, -1,(DATEADD(DAY,-3,q1.connect_date)))), 0) LAST_YR_3_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(YEAR, -1,(DATEADD(DAY,-4,q1.connect_date)))), 0) LAST_YR_4_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(YEAR, -1,(DATEADD(DAY,-5,q1.connect_date)))), 0) LAST_YR_5_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(YEAR, -1,(DATEADD(DAY,-6,q1.connect_date)))), 0) LAST_YR_6_PREV
          , ISNULL((SELECT DAILY_CONNECTIONS FROM INSIDE WHERE CONNECT_DATE = DATEADD(YEAR, -1,(DATEADD(DAY,-7,q1.connect_date)))), 0) LAST_YR_7_PREV
          FROM INSIDE q1 ORDER BY q1.CONNECT_DATE
        ) q2 ORDER BY q2.connect_date
      ) q3  
    ) q4 
  ) q5

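Running the innermost q1 query on its own seems to work fine; the problems start when I run the subqueries in q2. Running more than one of them at a time (MTD, LAST_YR_REMAINDER, etc.) produces the following error: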

Amazon Invalid operation: This type of correlated subquery pattern is not supported due to internal error;

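I have been going through the documentation on the correlated subquery types that Redshift does not support, but I can't see which rule these violate. Any help would be greatly appreciated.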

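Comments:

I guess you have already seen docs.aws.amazon.com/redshift/latest/dg/… and are most likely hitting one of the unsupported patterns. I think this can be rewritten in a different, and perhaps better, way. Could you please update your question to include some sample data, the "logic" of what you are doing, and the expected output?

Sample row (connect_date, daily_connections): 2016-05-20, 867. I calculate a year-over-year run rate by taking the difference between the average connections over the 7 days before a given date and the same 7-day average one year earlier. I then add the month-to-date connections for the date in question to the remaining connections for the same month last year, plus the run rate multiplied by the number of days left in the month after the connect date. The final step (q5) just compares the results against a few things.

I'll take a look later. Please update your question with the text from your comment. Reason: it is important to keep the question as complete as possible so that others can follow it without having to read through the comments.

Answer 1: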

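You have too many inline subqueries. Try using common table expressions (CTEs) to break the logic up in a way that Redshift can run efficiently.

Most of your inline subqueries can be rewritten as aggregates over a Cartesian product.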
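To see the idea on a small scale, here is a minimal sketch that runs the same lookup both ways and compares the results. It uses Python's built-in sqlite3 module purely as a stand-in so it can be run anywhere; the three sample rows are made up, and SQLite's date(x, '-1 day') takes the place of Redshift's DATEADD(DAY, -1, x). It only illustrates that the two query shapes return the same values, not how Redshift plans them.

```python
import sqlite3

# In-memory stand-in for the INSIDE temp table (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inside (connect_date TEXT, daily_connections INTEGER)")
conn.executemany(
    "INSERT INTO inside VALUES (?, ?)",
    [("2016-05-18", 10), ("2016-05-19", 20), ("2016-05-20", 30)],
)

# Question's style: one correlated scalar subquery evaluated per output row.
correlated = conn.execute("""
    SELECT q1.connect_date,
           COALESCE((SELECT i.daily_connections
                       FROM inside i
                      WHERE i.connect_date = date(q1.connect_date, '-1 day')), 0) AS prev_1
      FROM inside q1
     ORDER BY q1.connect_date
""").fetchall()

# Rewritten style: cross join every row with every row, keep only the
# matching value via CASE, then collapse back to one row per date.
joined = conn.execute("""
    SELECT i1.connect_date,
           COALESCE(MAX(CASE WHEN i2.connect_date = date(i1.connect_date, '-1 day')
                             THEN i2.daily_connections END), 0) AS prev_1
      FROM inside i1
     CROSS JOIN inside i2
     GROUP BY i1.connect_date
     ORDER BY i1.connect_date
""").fetchall()

print(correlated)
print(joined)
```

Both queries return the previous day's count for each date, with 0 where no previous row exists. The cross join costs N x N rows before the GROUP BY, which is why restricting the join range matters on larger tables.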

WITH cte1 AS (
    SELECT i1.CONNECT_DATE       CONNECT_DATE
          ,i1.DAILY_CONNECTIONS  DAILY_CONNECTIONS
           -- Sub-selects converted to an aggregate over a sparse matrix
           -- DAILY_CONNECTIONS is qualified with i2 (it exists in both sides of the join);
           -- COALESCE mirrors the ISNULL(..., 0) defaults used in the question
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(DAY,  -1, i1.connect_date)                    THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) CUR_YR_1_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(DAY,  -2, i1.connect_date)                    THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) CUR_YR_2_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(DAY,  -3, i1.connect_date)                    THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) CUR_YR_3_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(DAY,  -4, i1.connect_date)                    THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) CUR_YR_4_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(DAY,  -5, i1.connect_date)                    THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) CUR_YR_5_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(DAY,  -6, i1.connect_date)                    THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) CUR_YR_6_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(DAY,  -7, i1.connect_date)                    THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) CUR_YR_7_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(YEAR, -1, (DATEADD(DAY,-1, i1.connect_date))) THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) LAST_YR_1_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(YEAR, -1, (DATEADD(DAY,-2, i1.connect_date))) THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) LAST_YR_2_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(YEAR, -1, (DATEADD(DAY,-3, i1.connect_date))) THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) LAST_YR_3_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(YEAR, -1, (DATEADD(DAY,-4, i1.connect_date))) THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) LAST_YR_4_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(YEAR, -1, (DATEADD(DAY,-5, i1.connect_date))) THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) LAST_YR_5_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(YEAR, -1, (DATEADD(DAY,-6, i1.connect_date))) THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) LAST_YR_6_PREV
          ,COALESCE(MAX(CASE WHEN i2.CONNECT_DATE = DATEADD(YEAR, -1, (DATEADD(DAY,-7, i1.connect_date))) THEN i2.DAILY_CONNECTIONS ELSE NULL END), 0) LAST_YR_7_PREV
          ,SUM(CASE WHEN DATEPART(MONTH, i2.CONNECT_DATE) = DATEPART(MONTH, i1.CONNECT_DATE) 
                     AND DATEPART(YEAR, i2.CONNECT_DATE) = DATEPART(YEAR, i1.CONNECT_DATE) 
                     AND i2.CONNECT_DATE <= i1.CONNECT_DATE
                    THEN i2.DAILY_CONNECTIONS 
               ELSE NULL END)   AS MTD
          -- COALESCE keeps the question's ISNULL(SUM(...), 0) behaviour when last year has no rows
          ,COALESCE(SUM(CASE WHEN DATEPART(MONTH, i2.CONNECT_DATE) = DATEPART(MONTH, i1.CONNECT_DATE) 
                              AND DATEPART(YEAR, i2.CONNECT_DATE) = DATEPART(YEAR, i1.CONNECT_DATE)-1
                              AND i2.CONNECT_DATE > DATEADD(YEAR, -1, i1.CONNECT_DATE)
                             THEN i2.DAILY_CONNECTIONS 
                        ELSE NULL END), 0)   AS LAST_YR_REMAINDER
          -- Days left in the month: last day of the month minus the current day of month.
          -- Depends only on i1, so MAX just satisfies the GROUP BY.
          ,MAX(DATEPART(DAY, LAST_DAY(i1.CONNECT_DATE)) - DATEPART(DAY, i1.CONNECT_DATE)) AS DAYS_REMAINING
          ,SUM(CASE WHEN DATEPART(MONTH, i2.CONNECT_DATE) = DATEPART(MONTH, i1.CONNECT_DATE)
                     AND DATEPART(YEAR, i2.CONNECT_DATE) = DATEPART(YEAR, i1.CONNECT_DATE)
                    THEN i2.DAILY_CONNECTIONS
               ELSE NULL END)   AS ACTUAL_MONTH_END_AMOUNT
    FROM       INSIDE i1
    -- Create an intentional cartesian product
    CROSS JOIN INSIDE i2
    /*  Consider limiting the cartesian to a specific overlap range. E.g.
    WHERE i2.CONNECT_DATE >= DATEADD(YEAR, -1, (DATEADD(DAY,-7, i1.connect_date)))
    */
    -- Use group by to collapse the cartesian back to the original size
    GROUP BY 1, 2
    ORDER BY 1
), cte2 AS (
    SELECT CONNECT_DATE
         , DAILY_CONNECTIONS 
         , (cur_yr_1_prev + cur_yr_2_prev + cur_yr_3_prev + cur_yr_4_prev + cur_yr_5_prev + cur_yr_6_prev + cur_yr_7_prev)/7 AS CURR_YR_7_DAY
         , (last_yr_1_prev + last_yr_2_prev + last_yr_3_prev + last_yr_4_prev + last_yr_5_prev + last_yr_6_prev + last_yr_7_prev)/7 AS LAST_YR_7_DAY
         , MTD, LAST_YR_REMAINDER, DAYS_REMAINING, ACTUAL_MONTH_END_AMOUNT
    FROM cte1
    ORDER BY connect_date
), cte3 AS (
    SELECT *, curr_yr_7_day - last_yr_7_day AS RUN_RATE
    FROM cte2  
), cte4 AS (
    SELECT *
          , mtd + last_yr_remainder + run_rate * days_remaining AS DAN_PREDICTION
          , mtd + curr_yr_7_day * days_remaining AS LINEAR_PREDICTION
    FROM cte3
)
SELECT *
     , DAN_PREDICTION - LINEAR_PREDICTION AS PREDICTION_COMPARISON
     , DAN_PREDICTION - ACTUAL_MONTH_END_AMOUNT AS DAN_VARIANCE
     , LINEAR_PREDICTION - ACTUAL_MONTH_END_AMOUNT AS LINEAR_VARIANCE
FROM cte4
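As a side note, the two month-level sums (MTD and ACTUAL_MONTH_END_AMOUNT) could probably also be written with window functions, which avoids the cross join for those columns. This is a different technique from the cross-join rewrite above, and the sketch below is only an illustration under assumptions: it runs on SQLite (3.25 or newer, for window function support) via Python's sqlite3 module, the sample rows are made up, and strftime('%Y-%m', ...) stands in for whatever month key you would use on Redshift (for example DATE_TRUNC('month', CONNECT_DATE)).

```python
import sqlite3

# In-memory stand-in for the INSIDE temp table (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inside (connect_date TEXT, daily_connections INTEGER)")
conn.executemany(
    "INSERT INTO inside VALUES (?, ?)",
    [
        ("2016-04-29", 5), ("2016-04-30", 7),
        ("2016-05-01", 10), ("2016-05-02", 20), ("2016-05-03", 30),
    ],
)

rows = conn.execute("""
    SELECT connect_date,
           -- Running total within the calendar month, up to and including this row
           SUM(daily_connections) OVER (
               PARTITION BY strftime('%Y-%m', connect_date)
               ORDER BY connect_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS mtd,
           -- Total for the whole calendar month
           SUM(daily_connections) OVER (
               PARTITION BY strftime('%Y-%m', connect_date)) AS actual_month_end_amount
      FROM inside
     ORDER BY connect_date
""").fetchall()

for row in rows:
    print(row)
```

The explicit ROWS frame is spelled out on purpose, since some engines require a frame clause once ORDER BY appears in an aggregate window. The previous-day and previous-year lookups are harder to move to LAG-style functions if the table can have missing dates, so the cross-join form may still be the safer choice for those.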

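Comments:

I have a similar problem here where I'm using a CTE, but it still gives me a correlated subquery error. I was hoping you could help me out.

Please ask a new question and include an example. Then ping me here and I'll take a look. :D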
