Subquery has too many columns error while getting data for past X weeks?
Posted: 2020-11-24 21:18:50
Question: I have the following query, which gives me data for the previous week. It returns the columns type, amount and total for the previous week, which is selected by the week_number column in the inner subquery.
select type,
       case
           WHEN (type = 'PROC1' AND code = 'UIT') THEN 450
           WHEN (type = 'PROC1' AND code = 'KJH') THEN 900
           WHEN (type = 'PROC2' AND code = 'LOP') THEN 8840
           WHEN (type = 'PROC2' AND code = 'AWE') THEN 1490
           WHEN (type = 'PROC3' AND code = 'MNH') THEN 1600
           WHEN (type = 'PROC3' AND code = 'LKP') THEN 1900
       END as amount,
       total
from xyz.orders pa
join
    (select clientid as clientid, max(version) as version
     from xyz.orders
     where consumerid IN (select distinct entity_id from abc.items
                          where week_number = extract(week from current_date) - 1
                            and item_type like '%Ionize - Data%')
       and createdfor = 'BLOCK'
       and holder = 'RELAY_FUTURES'
     group by clientid) pb
  on pa.clientid = pb.clientid and pa.version = pb.version;
Here is the output for last week that the query above currently returns:
type amount total
---------------------------
PROC1 450 1768
PROC1 900 123
PROC1 450 456
PROC2 8840 99897
PROC2 1490 2223
PROC2 8840 9876
PROC3 1900 23456
PROC3 1600 12498
PROC3 1600 28756
In the query above, the inner subquery shown below returns the previous week's entity IDs, and its output is then used by the outer query:
select distinct entity_id from abc.items
where week_number = extract(week from current_date) - 1
and item_type like '%Ionize - Data%'
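To make the moving parts of the single-week query concrete, here is a minimal sketch that runs the same shape of query against a tiny in-memory database. Assumptions: SQLite (through Python's built-in sqlite3 module) stands in for Redshift, the xyz./abc. schema prefixes are dropped, the rows are made up, the CASE expression for amount is left out, and the week is passed as a parameter because SQLite has no extract(week from current_date).

```python
import sqlite3

# Hypothetical, simplified versions of xyz.orders and abc.items.
SCHEMA = """
CREATE TABLE items (entity_id INTEGER, week_number INTEGER, item_type TEXT);
CREATE TABLE orders (clientid INTEGER, version INTEGER, consumerid INTEGER,
                     type TEXT, total INTEGER, createdfor TEXT, holder TEXT);
"""

# Same shape as the question's query: IN keeps orders whose consumer appears in
# the week's items, MAX(version) picks each client's latest version, and the
# join back to orders fetches that version's row.
SINGLE_WEEK_SQL = """
SELECT pa.type, pa.total
FROM orders pa
JOIN (SELECT clientid, MAX(version) AS version
      FROM orders
      WHERE consumerid IN (SELECT DISTINCT entity_id FROM items
                           WHERE week_number = ?
                             AND item_type LIKE '%Ionize - Data%')
        AND createdfor = 'BLOCK'
        AND holder = 'RELAY_FUTURES'
      GROUP BY clientid) pb
  ON pa.clientid = pb.clientid AND pa.version = pb.version
ORDER BY pa.total
"""

def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)",
                     [(10, 46, "Ionize - Data"), (20, 45, "Ionize - Data")])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?, 'BLOCK', 'RELAY_FUTURES')",
        [(1, 1, 10, "PROC1", 100),   # older version for client 1
         (1, 2, 10, "PROC1", 200),   # latest version for client 1
         (2, 1, 20, "PROC2", 300)])  # client 2 is linked only to week 45
    return conn

def single_week(conn, week):
    return conn.execute(SINGLE_WEEK_SQL, (week,)).fetchall()

conn = make_demo_db()
print(single_week(conn, 46))  # [('PROC1', 200)]
```

Only the latest version of client 1's order comes back for week 46; the older version 1 row is filtered out by the join on max(version).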
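Now I am trying to find a way to get data for the past 6 weeks (excluding the current week), grouped by week. I think the inner query above needs to be modified so that it returns the past 6 weeks of data, and the outer query then groups it by week. Basically, I want amount and total for each type over the past 6 weeks, as shown below. I also need to add a week_number column to the final output somehow.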
Expected output:
week_number type amount total
--------------------------------------------
46 PROC1 450 1768
46 PROC1 900 123
46 PROC1 450 456
46 PROC2 8840 99897
46 PROC2 1490 2223
46 PROC2 8840 9876
46 PROC3 1900 23456
46 PROC3 1600 12498
46 PROC3 1600 28756
45 PROC1 450 1768
45 PROC1 900 123
45 PROC1 450 456
45 PROC2 8840 99897
45 PROC2 1490 2223
45 PROC2 8840 9876
45 PROC3 1900 23456
45 PROC3 1600 12498
45 PROC3 1600 28756
44 PROC1 450 1768
44 PROC1 900 123
44 PROC1 450 456
44 PROC2 8840 99897
44 PROC2 1490 2223
44 PROC2 8840 9876
44 PROC3 1900 23456
44 PROC3 1600 12498
44 PROC3 1600 28756
43 PROC1 450 1768
43 PROC1 900 123
43 PROC1 450 456
43 PROC2 8840 99897
43 PROC2 1490 2223
43 PROC2 8840 9876
43 PROC3 1900 23456
43 PROC3 1600 12498
43 PROC3 1600 28756
42 PROC1 450 1768
42 PROC1 900 123
42 PROC1 450 456
42 PROC2 8840 99897
42 PROC2 1490 2223
42 PROC2 8840 9876
42 PROC3 1900 23456
42 PROC3 1600 12498
42 PROC3 1600 28756
41 PROC1 450 1768
41 PROC1 900 123
41 PROC1 450 456
41 PROC2 8840 99897
41 PROC2 1490 2223
41 PROC2 8840 9876
41 PROC3 1900 23456
41 PROC3 1600 12498
41 PROC3 1600 28756
So I tried the following query, with a modified inner subquery, but it gives me the error invalid operation: subquery has too many columns. Any idea what I am doing wrong here?
select type,
       case
           WHEN (type = 'PROC1' AND code = 'UIT') THEN 450
           WHEN (type = 'PROC1' AND code = 'KJH') THEN 900
           WHEN (type = 'PROC2' AND code = 'LOP') THEN 8840
           WHEN (type = 'PROC2' AND code = 'AWE') THEN 1490
           WHEN (type = 'PROC3' AND code = 'MNH') THEN 1600
           WHEN (type = 'PROC3' AND code = 'LKP') THEN 1900
       END as amount,
       total
from xyz.orders pa
join
    (select clientid as clientid, max(version) as version
     from xyz.orders
     where consumerid IN (select week_number, entity_id from abc.items
                          where week_number >= extract(week from current_date) - 6
                            and week_number <= extract(week from current_date) - 1
                            and item_type like '%Ionize - Data%'
                          order by week_number desc)
       and createdfor = 'BLOCK'
       and holder = 'RELAY_FUTURES'
     group by clientid) pb
  on pa.clientid = pb.clientid and pa.version = pb.version;
I modified my inner query to:
select week_number, entity_id from abc.items
where week_number >= extract(week from current_date) - 6
and week_number <= extract(week from current_date) - 1
and item_type like '%Ionize - Data%'
order by week_number desc
I may be going about getting the required output completely the wrong way, so any help would be appreciated.
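Comments:
Does this answer your question? How to get data for the past x weeks for each type?
Unfortunately not. I took a different route by changing the inner subquery, but I'm getting the error here.
Remove the order by in the subquery. You can't sort in a subquery.
I tried removing the order by but still get the same error: invalid operation: subquery has too many columns @SaiAbhiramInapala
In your original query, what happens when you use week_number >= extract(week from current_date) - 6? Does it work?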
Answer 1:
Andy, the problem is this WHERE clause:
where consumerid IN (select week_number, entity_id from abc.items
where week_number >= extract(week from current_date) - 6
and week_number <= extract(week from current_date) - 1
and item_type like '%Ionize - Data%'
order by week_number desc )
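The clause checks whether consumerid is a member of the list produced by the subquery. However, this subquery returns 2 columns, and Redshift cannot compare 1 column with 2 columns. Simply removing week_number from the select list resolves this error: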
where consumerid IN (select entity_id from abc.items
where week_number >= extract(week from current_date) - 6
and week_number <= extract(week from current_date) - 1
and item_type like '%Ionize - Data%'
order by week_number desc )
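The same rule can be reproduced outside Redshift. A minimal sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in and made-up rows: a single value on the left of IN needs a one-column subquery, so the two-column version is rejected and the one-column version runs. SQLite's error wording differs from Redshift's.

```python
import sqlite3

def consumer_filter(select_list):
    """Run `consumerid IN (SELECT <select_list> ...)` on toy data.

    Returns the matching clientids, or the error text if the database rejects
    the subquery. select_list is interpolated directly: demo only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (entity_id INTEGER, week_number INTEGER)")
    conn.execute("CREATE TABLE orders (clientid INTEGER, consumerid INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", [(10, 45), (20, 46)])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 30)])
    sql = ("SELECT clientid FROM orders WHERE consumerid IN "
           f"(SELECT {select_list} FROM items WHERE week_number BETWEEN 41 AND 46)")
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return f"error: {exc}"

print(consumer_filter("week_number, entity_id"))  # two columns: rejected
print(consumer_filter("entity_id"))               # one column: [(1,)]
```

Dropping week_number fixes the error, but it also drops the week information from the result; keeping week_number means turning the IN into a join, which is the direction Answers 2 and 3 take.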
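Comments:
Thanks Bill, but will this help me get the output I need? As you can see above, I have an Expected Output. What else do I need to do to get that format? I need to group the data by week. After making that change the query runs fine, but I'm confused about how to get the final output grouped by week.
This is also where the multi-week condition fails: it's currently January and this is week 2, so the result would be negative, right? Is there any way to use the week_number column to get the correct data for the past 6 weeks?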
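Andy, without some source data (sample input data and the expected output for that data), it will be hard to understand how this query works. One issue I see is that "extract(week from current_date) - 6" won't work as you expect when the week number rolls over in the new year. As I mentioned, I do consult on getting Redshift working for people. The posted answer resolves the issue in the question; if general query work is needed, I'd suggest bringing in the required talent.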
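The rollover problem can be seen by computing the target weeks from dates instead of subtracting from the current week number. A minimal Python sketch, assuming week_number holds ISO 8601 week numbers (the numbering a PostgreSQL-style extract(week ...) uses) and using January 13, 2021, an arbitrary date in week 2 to match the comment above:

```python
from datetime import date, timedelta

def naive_weeks(today, n=6):
    """The question's arithmetic: current week number minus 1..n (can reach 0 or below)."""
    current = today.isocalendar()[1]
    return [current - k for k in range(1, n + 1)]

def previous_iso_weeks(today, n=6):
    """(iso_year, iso_week) for each of the n weeks before the current one.

    Stepping back 7 days at a time lets the calendar handle the rollover into
    the previous year, including years that have an ISO week 53.
    """
    weeks = []
    for k in range(1, n + 1):
        iso = (today - timedelta(weeks=k)).isocalendar()
        weeks.append((iso[0], iso[1]))
    return weeks

print(naive_weeks(date(2021, 1, 13)))         # [1, 0, -1, -2, -3, -4]
print(previous_iso_weeks(date(2021, 1, 13)))  # [(2021, 1), (2020, 53), (2020, 52), (2020, 51), (2020, 50), (2020, 49)]
```

If abc.items stores only a week number and no year (the thread does not say), a filter on these week numbers alone would also match the same weeks from other years, so the window should ideally be matched on (year, week) or on a date column.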
Thanks Bill. Let me dig deeper into this and see if I can figure it out. Thanks for your help!
Answer 2:
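The code below should be sufficient, but it has not been tested with a Redshift interpreter.
week_number: replace the IN with a join and select your week_number, as shown below. It is not entirely clear which week_number you are looking for here. To an outsider, the max version is retrieved over a time-boxed window of weeks, so if a record on xyz.orders has week 45, the max version retrieved by the query below would be the one from week 39 to week 44. So the query below should show the following: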
Week 45,
Type,
Amount for latest in Week 39 - Week 44,
Total.
Negative week numbers: see the commented code below.
With Maxfor5WeekPeriod as
(
    select
        MaxVersions.week_number,
        O.clientid as clientid,
        max(O.version) as version
    from xyz.orders O
    join
    (
        select
            entity_id,
            week_number
        from abc.items
        where
            --Something like this should resolve your start of year issue.
            --The window bounds are computed from dates, so they never drop to zero or below.
            (
                week_number between date_part(w, dateadd(week, -6, current_date))
                                and date_part(w, dateadd(week, -1, current_date))
                --When the window crosses a year boundary, the start week is greater than the end week.
                or (date_part(w, dateadd(week, -6, current_date)) > date_part(w, dateadd(week, -1, current_date))
                    and (week_number >= date_part(w, dateadd(week, -6, current_date))
                         or week_number <= date_part(w, dateadd(week, -1, current_date))))
            )
            and item_type like '%Ionize - Data%'
    ) MaxVersions on
        MaxVersions.entity_id = O.consumerid
        and O.createdfor = 'BLOCK'
        and O.holder = 'RELAY_FUTURES'
    group by O.clientid, MaxVersions.week_number
)
select
    M.week_number,
    pa.type,
    case
        WHEN (pa.type = 'PROC1' AND contractdomicilecode = 'UIT') THEN 450
        WHEN (pa.type = 'PROC1' AND contractdomicilecode = 'KJH') THEN 900
        WHEN (pa.type = 'PROC2' AND contractdomicilecode = 'LOP') THEN 8840
        WHEN (pa.type = 'PROC2' AND contractdomicilecode = 'AWE') THEN 1490
        WHEN (pa.type = 'PROC3' AND contractdomicilecode = 'MNH') THEN 1600
        WHEN (pa.type = 'PROC3' AND contractdomicilecode = 'LKP') THEN 1900
    END as amount,
    pa.total
from
    xyz.orders pa
    join Maxfor5WeekPeriod M
        on pa.clientid = M.clientid
       and pa.version = M.version;
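General points: the Redshift documentation covers Common Table Expressions, and they make this code much tidier. This answer benefited considerably from earlier clarifications by other contributors. Note, however, that I am not familiar with Redshift.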
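The key change in this answer is that the join carries week_number through and the max is grouped by (clientid, week_number) instead of clientid alone. A minimal sketch of that idea, assuming SQLite (via Python's sqlite3) as a stand-in for Redshift, made-up rows, a plain BETWEEN for the week window (no year rollover handling), and an added DISTINCT mirroring the original subquery:

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (entity_id INTEGER, week_number INTEGER, item_type TEXT);
CREATE TABLE orders (clientid INTEGER, version INTEGER, consumerid INTEGER,
                     type TEXT, total INTEGER, createdfor TEXT, holder TEXT);
"""

# One max version per client per week, then join back to fetch those rows.
PER_WEEK_SQL = """
WITH max_per_week AS (
    SELECT i.week_number, o.clientid, MAX(o.version) AS version
    FROM orders o
    JOIN (SELECT DISTINCT entity_id, week_number
          FROM items
          WHERE week_number BETWEEN ? AND ?
            AND item_type LIKE '%Ionize - Data%') i
      ON i.entity_id = o.consumerid
    WHERE o.createdfor = 'BLOCK' AND o.holder = 'RELAY_FUTURES'
    GROUP BY o.clientid, i.week_number
)
SELECT m.week_number, pa.type, pa.total
FROM orders pa
JOIN max_per_week m ON pa.clientid = m.clientid AND pa.version = m.version
ORDER BY m.week_number DESC, pa.total
"""

def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO items VALUES (?, ?, 'Ionize - Data')",
                     [(10, 46), (10, 45), (20, 45)])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?, 'BLOCK', 'RELAY_FUTURES')",
        [(1, 1, 10, "PROC1", 100),
         (1, 2, 10, "PROC1", 200),
         (2, 1, 20, "PROC2", 300)])
    return conn

def per_week(conn, first_week, last_week):
    return conn.execute(PER_WEEK_SQL, (first_week, last_week)).fetchall()

print(per_week(make_demo_db(), 41, 46))
# [(46, 'PROC1', 200), (45, 'PROC1', 200), (45, 'PROC2', 300)]
```

In this sketch each week's slice is exactly what the single-week query returns for that week, which is a useful check when comparing the multi-week result against a manual one-week run.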
Comments:
I tried your suggestion, but it gives me a syntax error, because the week_number column is only available in abc.items, not in xyz.orders, so it complains: Invalid operation: column o.week_number does not exist;
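Also, if you look at the first query in my question at the top, it only covers the single previous week, so the output it produces is for last week only. Now I just need to modify that inner query so that I get data for the past 6 weeks, grouped by week. That means I want a view where the week_number column is carried from the innermost query all the way to the top, along with all the data for that week, just like the first query in the question shows.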
Replace O.week_number with MaxVersions.week_number, and remove "and MaxVersions.WeekNumber = O.week_number".
Got it. Do we also need to change this O.week_number in the group by clause?
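For some reason the data doesn't match: when I manually run my original query for a single week and then run your suggestion and compare the two, the results differ. Any idea why? Do you think it's related to using a join instead of the original IN clause?
Answer 3: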
Get all distinct entity_id, week_number combinations in the past 6 weeks (Table 1):
SELECT DISTINCT entity_id, week_number
FROM abc.items
WHERE week_number >= Extract(week FROM CURRENT_DATE) - 6
AND week_number <= Extract(week FROM CURRENT_DATE) - 1
AND item_type LIKE '%Ionize - Data%'
Get the maximum order version for each unique combination of clientid and consumerid:
SELECT clientid AS clientid, Max(version) AS version, consumerid
FROM xyz.orders
WHERE createdfor = 'BLOCK'
AND holder = 'RELAY_FUTURES'
GROUP BY consumerid, clientid
Join the two tables above to get clientid, version and week_number:
select xo.clientid AS clientid,
xo.version AS version,
ai.week_number as week_number from (SELECT DISTINCT entity_id, week_number
FROM abc.items
WHERE week_number >= Extract(week FROM CURRENT_DATE) - 6
AND week_number <= Extract(week FROM CURRENT_DATE) - 1
AND item_type LIKE '%Ionize - Data%')
ai LEFT JOIN
(SELECT clientid AS clientid,
Max(version) AS version,
consumerid
FROM xyz.orders
WHERE createdfor = 'BLOCK'
AND holder = 'RELAY_FUTURES'
GROUP BY consumerid, clientid) xo
ON ai.entity_id = xo.consumerid
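The left join discards every consumer ID that is not in Table 1. Rows of Table 1 with no matching order come through with a NULL clientid and version, and the outer join on clientid and version then drops them.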
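What the left join keeps and what the outer join later drops can be checked on toy data. A minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for Redshift; the tables ai and xo are hypothetical materializations of Table 1 and the max-version subquery, and the rows are made up:

```python
import sqlite3

PB_LEFT = """
    SELECT xo.clientid, xo.version, ai.week_number
    FROM ai LEFT JOIN xo ON ai.entity_id = xo.consumerid
"""

def left_join_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ai (entity_id INTEGER, week_number INTEGER)")  # Table 1
    conn.execute("CREATE TABLE xo (clientid INTEGER, version INTEGER, consumerid INTEGER)")
    conn.execute("CREATE TABLE orders (clientid INTEGER, version INTEGER, type TEXT, total INTEGER)")
    conn.executemany("INSERT INTO ai VALUES (?, ?)", [(10, 46), (99, 45)])        # entity 99 has no order
    conn.executemany("INSERT INTO xo VALUES (?, ?, ?)", [(1, 2, 10), (3, 1, 30)])  # consumer 30 not in Table 1
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                     [(1, 2, "PROC1", 200), (3, 1, "PROC3", 700)])
    # The left join alone: consumer 30 is gone, entity 99 survives with NULLs.
    pb = conn.execute(PB_LEFT + " ORDER BY ai.week_number DESC").fetchall()
    # The outer inner join: NULL never equals anything, so that row disappears.
    final = conn.execute(
        "SELECT pb.week_number, pa.type, pa.total FROM orders pa "
        f"JOIN ({PB_LEFT}) pb ON pa.clientid = pb.clientid AND pa.version = pb.version"
    ).fetchall()
    return pb, final

pb, final = left_join_demo()
print(pb)     # [(1, 2, 46), (None, None, 45)]
print(final)  # [(46, 'PROC1', 200)]
```

So in the full query the left join ends up behaving like an inner join; writing it as an inner join would express the same intent more directly.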
Use the query above as the inner query:
SELECT pb.week_number,
pa.type,
CASE
WHEN (type = 'PROC1'
AND contractdomicilecode = 'UIT') THEN 450
WHEN (type = 'PROC1'
AND contractdomicilecode = 'KJH') THEN 900
WHEN (type = 'PROC2'
AND contractdomicilecode = 'LOP') THEN 8840
WHEN (type = 'PROC2'
AND contractdomicilecode = 'AWE') THEN 1490
WHEN (type = 'PROC3'
AND contractdomicilecode = 'MNH') THEN 1600
WHEN (type = 'PROC3'
AND contractdomicilecode = 'LKP') THEN 1900
END AS amount,
pa.total
FROM xyz.orders pa
JOIN
(select xo.clientid AS clientid,
xo.version AS version,
ai.week_number as week_number from (SELECT DISTINCT entity_id, week_number
FROM abc.items
WHERE week_number >= Extract(week FROM CURRENT_DATE) - 6
AND week_number <= Extract(week FROM CURRENT_DATE) - 1
AND item_type LIKE '%Ionize - Data%')
ai LEFT JOIN
(SELECT clientid AS clientid,
Max(version) AS version,
consumerid
FROM xyz.orders
WHERE createdfor = 'BLOCK'
AND holder = 'RELAY_FUTURES'
GROUP BY consumerid, clientid) xo
ON ai.entity_id = xo.consumerid) pb
ON pa.clientid = pb.clientid
AND pa.version = pb.version;
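One difference from the question's original query is worth checking against real data: this answer groups max(version) by (consumerid, clientid), while the original groups by clientid alone. If a client can have orders under more than one consumerid in the same week (the thread does not say), the two pick different versions. A minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in and hypothetical rows:

```python
import sqlite3

def max_versions(group_by_consumer):
    """Max version per group for orders whose consumer is in the week's set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (clientid INTEGER, version INTEGER, consumerid INTEGER)")
    # Hypothetical: client 1 has orders under two consumers seen in the same week.
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 10), (1, 2, 20)])
    keys = "consumerid, clientid" if group_by_consumer else "clientid"
    sql = ("SELECT clientid, MAX(version) FROM orders "
           f"WHERE consumerid IN (10, 20) GROUP BY {keys} ORDER BY 2")
    return conn.execute(sql).fetchall()

print(max_versions(False))  # original grouping: [(1, 2)]
print(max_versions(True))   # this answer's grouping: [(1, 1), (1, 2)]
```

With the per-consumer grouping, both version 1 and version 2 of client 1 would be joined back, so the output can contain rows the original single-week query never returns.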