Subquery has too many columns error while getting data for past X weeks?

Posted: 2020-11-24 21:18:50

Question:

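I have the following query, which gives me data for the previous week, as shown below. It returns the columns type, amount and total for the previous week, using the week_number column in the inner subquery.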

select type,
case
WHEN (type = 'PROC1' AND code = 'UIT') THEN 450
WHEN (type = 'PROC1' AND code = 'KJH') THEN 900
WHEN (type = 'PROC2' AND code = 'LOP') THEN 8840
WHEN (type = 'PROC2' AND code = 'AWE') THEN 1490
WHEN (type = 'PROC3' AND code = 'MNH') THEN 1600
WHEN (type = 'PROC3' AND code = 'LKP') THEN 1900
END as amount,
total
from xyz.orders pa
join
(select clientid as clientid, max(version) as version
from xyz.orders where consumerid IN (select distinct entity_id from abc.items
where week_number = extract(week from current_date) - 1
and item_type like '%Ionize - Data%' )
and createdfor ='BLOCK'
and holder='RELAY_FUTURES'
group by clientid) pb on
pa.clientid = pb.clientid and pa.version = pb.version;

Here is the output for last week that the query above currently returns:

type    amount      total
---------------------------
PROC1    450         1768
PROC1    900         123
PROC1    450         456
PROC2    8840        99897
PROC2    1490        2223
PROC2    8840        9876
PROC3    1900        23456
PROC3    1600        12498
PROC3    1600        28756

In the query above, the inner subquery shown below returns the previous week's entity IDs, and its output is then used by the outer query.

select distinct entity_id from abc.items
where week_number = extract(week from current_date) - 1
and item_type like '%Ionize - Data%'
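Structurally, the query above is a "latest row per group" pattern: the derived table pb takes max(version) per clientid among orders whose consumerid is in last week's set of entity IDs, and the join back to xyz.orders pa returns the full row for that version. A minimal sketch of the same pattern, using Python's built-in sqlite3 module as a stand-in for Redshift, with hypothetical rows and a literal IN list in place of the abc.items subquery:

```python
import sqlite3

# SQLite stands in for Redshift; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (clientid INTEGER, version INTEGER,
                         consumerid INTEGER, total INTEGER);
    INSERT INTO orders VALUES
        (1, 1, 10, 100), (1, 2, 10, 150),  -- client 1: version 2 is the latest
        (2, 1, 20, 200),                   -- client 2: a single version
        (3, 5, 99, 999);                   -- consumer 99 is not in the week's set
""")

# pb: latest version per client among the consumers of interest
# (the literal list (10, 20) replaces the abc.items subquery).
# The join back to orders pa then returns the full row of that version.
latest = conn.execute("""
    SELECT pa.clientid, pa.version, pa.total
    FROM orders pa
    JOIN (SELECT clientid AS clientid, MAX(version) AS version
          FROM orders
          WHERE consumerid IN (10, 20)
          GROUP BY clientid) pb
      ON pa.clientid = pb.clientid AND pa.version = pb.version
    ORDER BY pa.clientid
""").fetchall()
print(latest)  # [(1, 2, 150), (2, 1, 200)]
```

Client 3 drops out because its only consumer is not in the IN list, and client 1 contributes only its highest version.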

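Now I'm trying to figure out how to get data for the past 6 weeks (excluding the current week), grouped by week. I think the inner query above needs to change so that it returns the past 6 weeks of data, and the outer query then needs to group by week. Basically, I want the amount and total for each type over the past 6 weeks, as shown below. I also need to add a week_number column to the final output somehow.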

Expected output

week_number     type    amount      total
--------------------------------------------
  46            PROC1    450         1768
  46            PROC1    900         123
  46            PROC1    450         456
  46            PROC2    8840        99897
  46            PROC2    1490        2223
  46            PROC2    8840        9876
  46            PROC3    1900        23456
  46            PROC3    1600        12498
  46            PROC3    1600        28756
  45            PROC1    450         1768
  45            PROC1    900         123
  45            PROC1    450         456
  45            PROC2    8840        99897
  45            PROC2    1490        2223
  45            PROC2    8840        9876
  45            PROC3    1900        23456
  45            PROC3    1600        12498
  45            PROC3    1600        28756
  44            PROC1    450         1768
  44            PROC1    900         123
  44            PROC1    450         456
  44            PROC2    8840        99897
  44            PROC2    1490        2223
  44            PROC2    8840        9876
  44            PROC3    1900        23456
  44            PROC3    1600        12498
  44            PROC3    1600        28756
  43            PROC1    450         1768
  43            PROC1    900         123
  43            PROC1    450         456
  43            PROC2    8840        99897
  43            PROC2    1490        2223
  43            PROC2    8840        9876
  43            PROC3    1900        23456
  43            PROC3    1600        12498
  43            PROC3    1600        28756
  42            PROC1    450         1768
  42            PROC1    900         123
  42            PROC1    450         456
  42            PROC2    8840        99897
  42            PROC2    1490        2223
  42            PROC2    8840        9876
  42            PROC3    1900        23456
  42            PROC3    1600        12498
  42            PROC3    1600        28756
  41            PROC1    450         1768
  41            PROC1    900         123
  41            PROC1    450         456
  41            PROC2    8840        99897
  41            PROC2    1490        2223
  41            PROC2    8840        9876
  41            PROC3    1900        23456
  41            PROC3    1600        12498
  41            PROC3    1600        28756

So I tried the following query, with a modified inner subquery, but it fails with the error `invalid operation: subquery has too many columns`. Any idea what I'm doing wrong here?

select type,
case
WHEN (type = 'PROC1' AND code = 'UIT') THEN 450
WHEN (type = 'PROC1' AND code = 'KJH') THEN 900
WHEN (type = 'PROC2' AND code = 'LOP') THEN 8840
WHEN (type = 'PROC2' AND code = 'AWE') THEN 1490
WHEN (type = 'PROC3' AND code = 'MNH') THEN 1600
WHEN (type = 'PROC3' AND code = 'LKP') THEN 1900
END as amount,
total
from xyz.orders pa
join
(select clientid as clientid, max(version) as version
from xyz.orders where consumerid IN (select week_number, entity_id from abc.items
where week_number >= extract(week from current_date) - 6 
  and week_number <= extract(week from current_date) - 1
  and item_type like '%Ionize - Data%'
order by week_number desc )
and createdfor ='BLOCK'
and holder='RELAY_FUTURES'
group by clientid) pb on
pa.clientid = pb.clientid and pa.version = pb.version;

I changed my inner query to:

select week_number, entity_id from abc.items
where week_number >= extract(week from current_date) - 6 
  and week_number <= extract(week from current_date) - 1
  and item_type like '%Ionize - Data%'
order by week_number desc

I may be going about this completely the wrong way, so any help would be appreciated.

Comments on the question:

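Does this answer your question? How to get data for the past x weeks for each type?

Unfortunately not. I went a different route by changing the inner subquery, but I'm getting an error here.

Remove the order by in the subquery. You can't order in a subquery.

I tried removing the order by but still get the same error: invalid operation: subquery has too many columns @SaiAbhiramInapala

In your original query, what happens when you use week_number >= extract(week from current_date) - 6? Does it work?

Answer 1: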

Andy, the problem is this WHERE clause:

where consumerid IN (select week_number, entity_id from abc.items
where week_number >= extract(week from current_date) - 6 
  and week_number <= extract(week from current_date) - 1
  and item_type like '%Ionize - Data%'
order by week_number desc )

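The clause checks whether consumerid is a member of the list produced by the subquery. But this subquery returns 2 columns, and Redshift cannot compare 1 column with 2 columns. Simply removing week_number from the select list fixes this error: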

where consumerid IN (select entity_id from abc.items
where week_number >= extract(week from current_date) - 6 
  and week_number <= extract(week from current_date) - 1
  and item_type like '%Ionize - Data%'
order by week_number desc )
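The error is easy to reproduce outside Redshift. A minimal sketch using Python's built-in sqlite3 module as a stand-in (SQLite's error text differs from Redshift's, and the tables and rows are hypothetical): an IN subquery that returns two columns is rejected, while the one-column version runs.

```python
import sqlite3

# SQLite stands in for Redshift; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (entity_id INTEGER, week_number INTEGER, item_type TEXT);
    CREATE TABLE orders (clientid INTEGER, version INTEGER, consumerid INTEGER);
    INSERT INTO items VALUES (10, 45, 'Ionize - Data'), (20, 46, 'Ionize - Data');
    INSERT INTO orders VALUES (1, 1, 10), (2, 1, 20), (3, 1, 30);
""")


def run(sql):
    """Return the query's rows, or the error text if SQLite rejects the query."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return f"error: {exc}"


# One value on the left of IN, two columns on the right: rejected.
two_cols = run("""
    SELECT clientid FROM orders
    WHERE consumerid IN (SELECT week_number, entity_id FROM items)
""")

# One column on the right of IN: accepted.
one_col = run("""
    SELECT clientid FROM orders
    WHERE consumerid IN (SELECT entity_id FROM items)
    ORDER BY clientid
""")

print(two_cols)  # an "error: ..." string
print(one_col)   # [(1,), (2,)]
```

Dropping week_number from the subquery fixes the error but also loses the week, which is why the other answers move abc.items into a join to carry week_number through.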

Comments:

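Thanks Bill, but will this give me the output I need? As you can see above, I have an Expected Output; what else do I need to do to get that format? I need the data grouped by week. After making that change the query runs fine, but I'm confused about how to get the final output grouped by week.

This is also where the multi-week condition breaks: in January the current week is 2, so the result would be negative, right? Is there any way to use the week_number column to get the correct data for the past 6 weeks?

Andy, without some source data (sample input data and the expected output for that data), it is hard to understand how this query is supposed to work. One issue I see is that "extract(week from current_date) - 6" won't work as you expect when the week number rolls over at the new year. As I mentioned, I do consult on making Redshift work for people. The posted answer solves the problem in the title; if general query work is needed, I suggest bringing in the right talent.

Thanks Bill. Let me dig deeper into this and see if I can figure it out. Thanks for your help!

Answer 2: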

The code below should be sufficient, but it has not been tested against Redshift.

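week_number: replace IN with a join and select your week_number, as shown below. It isn't entirely clear which week_number you are after here. To an outsider, it looks like the max version is taken over a time-boxed window of prior weeks, so if a record on xyz.orders has week 45, the max version retrieved by the query below will come from weeks 39 - 44. So the query below should show the following: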

Week 45, 
Type, 
Amount for latest in Week 39 - Week 44, 
Total. 

Negative week numbers: see the commented code below.

With Maxfor5WeekPeriod as 
(
    select 
        MaxVersions.week_number,
        O.clientid as clientid, 
        max(O.version) as version
    from xyz.orders O
    join
    (
        select 
            entity_id,
            week_number
        from abc.items
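        where 
        --week_number must fall between the week 6 weeks ago and last week.
        --If that range crosses the new year (start week > end week), accept
        --either side of the boundary. With no year column, same-numbered weeks
        --from other years also match.
        (
            (date_part(w,dateadd(week,-6,current_date)) <= date_part(w,dateadd(week,-1,current_date))
             and week_number between date_part(w,dateadd(week,-6,current_date))
                                 and date_part(w,dateadd(week,-1,current_date)))
            or
            (date_part(w,dateadd(week,-6,current_date)) > date_part(w,dateadd(week,-1,current_date))
             and (week_number >= date_part(w,dateadd(week,-6,current_date))
                  or week_number <= date_part(w,dateadd(week,-1,current_date))))
        )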
        and item_type like '%Ionize - Data%'
    ) MaxVersions on 
    entity_id = Consumerid 
    and O.createdfor ='BLOCK'
    and O.holder='RELAY_FUTURES'
    group by O.clientid,MaxVersions.week_number
)
select 
    M.week_number, 
    pa.type,
    case
    WHEN (pa.type = 'PROC1' AND contractdomicilecode = 'UIT') THEN 450
    WHEN (pa.type = 'PROC1' AND contractdomicilecode = 'KJH') THEN 900
    WHEN (pa.type = 'PROC2' AND contractdomicilecode = 'LOP') THEN 8840
    WHEN (pa.type = 'PROC2' AND contractdomicilecode = 'AWE') THEN 1490
    WHEN (pa.type = 'PROC3' AND contractdomicilecode = 'MNH') THEN 1600
    WHEN (pa.type = 'PROC3' AND contractdomicilecode = 'LKP') THEN 1900
    END as amount,
    pa.total
from 
xyz.orders pa
join Maxfor5WeekPeriod M 
on pa.clientid = M.clientid and 
pa.version = M.version
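This answer's week filter targets the year-boundary problem raised in the comments: in January, extract(week from current_date) - 6 drops to zero or below. A minimal Python sketch, assuming the week_number values in abc.items are ISO 8601 week numbers (an assumption; the question doesn't say how they are computed), shows why stepping back by dates is safer than subtracting from the week number. It also shows that wrapping with a fixed 52 would be wrong in years that have an ISO week 53, such as 2020. The helper past_weeks is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def past_weeks(today: date, n: int = 6) -> List[Tuple[int, int]]:
    """Return (iso_year, iso_week) for the n weeks before today's week, oldest first.

    Stepping back whole weeks from the date lets the calendar handle the year
    boundary, instead of subtracting from the week number itself.
    """
    weeks = []
    for k in range(n, 0, -1):
        iso = (today - timedelta(weeks=k)).isocalendar()
        weeks.append((iso[0], iso[1]))
    return weeks


today = date(2021, 1, 13)  # a Wednesday in ISO week 2 of 2021
current_week = today.isocalendar()[1]

# Naive arithmetic, like extract(week from current_date) - k: goes below 1.
naive = [current_week - k for k in range(6, 0, -1)]
print(naive)  # [-4, -3, -2, -1, 0, 1]

# Date arithmetic: crosses into 2020, which has an ISO week 53.
print(past_weeks(today))
# [(2020, 49), (2020, 50), (2020, 51), (2020, 52), (2020, 53), (2021, 1)]
```

Because abc.items appears to store only week_number, a real filter would also need a year (not shown in the question) to tell week 1 of 2021 apart from week 1 of 2020.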

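General points: the Redshift documentation covers common table expressions (the WITH clause used above), which make this code much tidier. This answer benefited a lot from earlier clarifications by other contributors. Note, however, that I am not familiar with Redshift.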
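Replacing IN with a join, as this answer does, is not always row-for-row equivalent. IN behaves as a semi-join and returns each outer row at most once, while a JOIN returns one row per match, so duplicate (entity_id, week_number) rows in abc.items multiply the order rows. A minimal sketch with SQLite standing in for Redshift and hypothetical rows:

```python
import sqlite3

# SQLite stands in for Redshift; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (entity_id INTEGER, week_number INTEGER);
    CREATE TABLE orders (clientid INTEGER, version INTEGER, consumerid INTEGER);
    -- entity 10 appears twice in week 45 (e.g. two matching item rows)
    INSERT INTO items VALUES (10, 45), (10, 45), (20, 46);
    INSERT INTO orders VALUES (1, 1, 10), (2, 1, 20);
""")

# IN: each order row is returned at most once.
via_in = conn.execute("""
    SELECT o.clientid FROM orders o
    WHERE o.consumerid IN (SELECT entity_id FROM items)
    ORDER BY o.clientid
""").fetchall()

# Plain JOIN: one row per matching items row, so client 1 appears twice.
via_join = conn.execute("""
    SELECT o.clientid, i.week_number FROM orders o
    JOIN items i ON i.entity_id = o.consumerid
    ORDER BY o.clientid
""").fetchall()

# De-duplicating (entity_id, week_number) first gives one row per match
# while still carrying week_number through.
via_distinct_join = conn.execute("""
    SELECT o.clientid, i.week_number FROM orders o
    JOIN (SELECT DISTINCT entity_id, week_number FROM items) i
      ON i.entity_id = o.consumerid
    ORDER BY o.clientid
""").fetchall()

print(via_in)             # [(1,), (2,)]
print(via_join)           # [(1, 45), (1, 45), (2, 46)]
print(via_distinct_join)  # [(1, 45), (2, 46)]
```

In the CTE above, GROUP BY O.clientid, MaxVersions.week_number already collapses duplicates of this kind before max(version) is used, so this sketch only illustrates the general difference between the two forms.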

Comments:

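I tried your suggestion, but it gives me an error because the week_number column only exists in abc.items, not in xyz.orders, so it complains: Invalid operation: column o.week_number does not exist; Also, if you look at my first query at the top of the question, it only covers the single previous week, so its output is for last week only. Now I just need to modify that inner query so I get data for the past 6 weeks, grouped by week. In other words, I want a result where the week_number column is carried from the innermost query to the top, along with all the data for that week, just like the output of the first query in the question.

Replace O.week_number with MaxVersions.week_number, and remove "and MaxVersions.WeekNumber = O.week_number".

Got it. We also need to change O.week_number in the group by clause, right?

For some reason the data doesn't match: when I manually run my original query for one week, then run your suggestion and compare, the results differ. Any idea why? Do you think it has to do with using a join instead of the original IN clause?

Answer 3: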

Get all distinct entity_id, week_number combinations for the past 6 weeks (Table 1):

SELECT DISTINCT entity_id, week_number
       FROM abc.items
       WHERE week_number >= Extract(week FROM CURRENT_DATE) - 6
         AND week_number <= Extract(week FROM CURRENT_DATE) - 1
         AND item_type LIKE '%Ionize - Data%'

Get the max order version for each unique combination of clientid and consumerid:

SELECT clientid AS clientid, Max(version) AS version, consumerid
          FROM xyz.orders
          WHERE createdfor = 'BLOCK'
            AND holder = 'RELAY_FUTURES'
          GROUP BY consumerid, clientid

Join the two tables above to get clientid, version and week_number:

select xo.clientid    AS clientid,
       xo.version     AS version,
       ai.week_number AS week_number
from (SELECT DISTINCT entity_id, week_number
      FROM abc.items
      WHERE week_number >= Extract(week FROM CURRENT_DATE) - 6
        AND week_number <= Extract(week FROM CURRENT_DATE) - 1
        AND item_type LIKE '%Ionize - Data%') ai
LEFT JOIN
     (SELECT clientid     AS clientid,
             Max(version) AS version,
             consumerid
      FROM xyz.orders
      WHERE createdfor = 'BLOCK'
        AND holder = 'RELAY_FUTURES'
      GROUP BY consumerid, clientid) xo
  ON ai.entity_id = xo.consumerid

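The left join discards every consumerid that is not in Table 1. Entity IDs from Table 1 that have no matching orders still come through, with NULL clientid and version; the outer join on clientid and version then drops those rows, so an inner join here would give the same final result.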
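A small check of what the LEFT JOIN contributes, with SQLite standing in for Redshift and hypothetical rows. Entity 30 is in Table 1 but has no orders; consumer 20 has orders but is not in Table 1. The NULL row produced by the LEFT JOIN can never satisfy pa.clientid = pb.clientid, so LEFT and INNER give the same final rows here.

```python
import sqlite3

# SQLite stands in for Redshift; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (entity_id INTEGER, week_number INTEGER);
    CREATE TABLE orders (clientid INTEGER, version INTEGER, consumerid INTEGER);
    INSERT INTO items VALUES (10, 45), (30, 46);
    INSERT INTO orders VALUES (1, 1, 10), (1, 2, 10), (2, 1, 20);
""")

# Same shape as the pb derived table; {join} is LEFT or INNER.
PB_SQL = """
    SELECT xo.clientid AS clientid, xo.version AS version,
           ai.week_number AS week_number
    FROM (SELECT DISTINCT entity_id, week_number FROM items) ai
    {join} JOIN (SELECT clientid, MAX(version) AS version, consumerid
                 FROM orders
                 GROUP BY consumerid, clientid) xo
      ON ai.entity_id = xo.consumerid
"""


def run(join):
    """Return (pb rows, final rows) for the given join type."""
    pb_sql = PB_SQL.format(join=join)
    pb_rows = conn.execute(pb_sql + " ORDER BY ai.week_number").fetchall()
    final_rows = conn.execute(
        "SELECT pb.week_number, pa.clientid, pa.version FROM orders pa "
        "JOIN (" + pb_sql + ") pb "
        "ON pa.clientid = pb.clientid AND pa.version = pb.version"
    ).fetchall()
    return pb_rows, final_rows


left_pb, left_final = run("LEFT")
inner_pb, inner_final = run("INNER")
print(left_pb)     # [(1, 2, 45), (None, None, 46)]
print(inner_pb)    # [(1, 2, 45)]
print(left_final)  # [(45, 1, 2)]
```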

Use the query above as the inner query:

SELECT pb.week_number,
       pa.type,
       CASE
           WHEN (type = 'PROC1'
               AND contractdomicilecode = 'UIT') THEN 450
           WHEN (type = 'PROC1'
               AND contractdomicilecode = 'KJH') THEN 900
           WHEN (type = 'PROC2'
               AND contractdomicilecode = 'LOP') THEN 8840
           WHEN (type = 'PROC2'
               AND contractdomicilecode = 'AWE') THEN 1490
           WHEN (type = 'PROC3'
               AND contractdomicilecode = 'MNH') THEN 1600
           WHEN (type = 'PROC3'
               AND contractdomicilecode = 'LKP') THEN 1900
           END AS amount,
       pa.total
FROM xyz.orders pa
         JOIN
     (select xo.clientid    AS clientid,
             xo.version     AS version,
             ai.week_number AS week_number
      from (SELECT DISTINCT entity_id, week_number
            FROM abc.items
            WHERE week_number >= Extract(week FROM CURRENT_DATE) - 6
              AND week_number <= Extract(week FROM CURRENT_DATE) - 1
              AND item_type LIKE '%Ionize - Data%') ai
      LEFT JOIN
           (SELECT clientid     AS clientid,
                   Max(version) AS version,
                   consumerid
            FROM xyz.orders
            WHERE createdfor = 'BLOCK'
              AND holder = 'RELAY_FUTURES'
            GROUP BY consumerid, clientid) xo
        ON ai.entity_id = xo.consumerid) pb
     ON pa.clientid = pb.clientid
         AND pa.version = pb.version;
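An end-to-end sketch of this answer's final query, run with SQLite in place of Redshift on two hypothetical rows. To keep it deterministic, the week filter is hard-coded to weeks 45-46 instead of using CURRENT_DATE, only two CASE branches are kept, the column is named code as in the question, and the derived-table join is an INNER JOIN (which returns the same final rows). The added ORDER BY pb.week_number DESC, pa.type returns rows in the same order as the Expected output in the question.

```python
import sqlite3

# SQLite stands in for Redshift; rows and the week range 45-46 are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (entity_id INTEGER, week_number INTEGER, item_type TEXT);
    CREATE TABLE orders (clientid INTEGER, version INTEGER, consumerid INTEGER,
                         type TEXT, code TEXT, total INTEGER,
                         createdfor TEXT, holder TEXT);
    INSERT INTO items VALUES (10, 45, 'Ionize - Data'), (20, 46, 'Ionize - Data');
    INSERT INTO orders VALUES
        (1, 1, 10, 'PROC1', 'UIT', 1768, 'BLOCK', 'RELAY_FUTURES'),
        (2, 3, 20, 'PROC2', 'LOP', 9876, 'BLOCK', 'RELAY_FUTURES');
""")

rows = conn.execute("""
    SELECT pb.week_number,
           pa.type,
           CASE
               WHEN pa.type = 'PROC1' AND pa.code = 'UIT' THEN 450
               WHEN pa.type = 'PROC2' AND pa.code = 'LOP' THEN 8840
           END AS amount,
           pa.total
    FROM orders pa
    JOIN (SELECT xo.clientid AS clientid, xo.version AS version,
                 ai.week_number AS week_number
          FROM (SELECT DISTINCT entity_id, week_number
                FROM items
                WHERE week_number BETWEEN 45 AND 46
                  AND item_type LIKE '%Ionize - Data%') ai
          JOIN (SELECT clientid, MAX(version) AS version, consumerid
                FROM orders
                WHERE createdfor = 'BLOCK' AND holder = 'RELAY_FUTURES'
                GROUP BY consumerid, clientid) xo
            ON ai.entity_id = xo.consumerid) pb
      ON pa.clientid = pb.clientid AND pa.version = pb.version
    ORDER BY pb.week_number DESC, pa.type
""").fetchall()

for row in rows:
    print(row)
# (46, 'PROC2', 8840, 9876)
# (45, 'PROC1', 450, 1768)
```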
