Previous row end date as the next row start date in SQL
【Posted】: 2016-09-30 14:42:42
【Question】: I need some help, please.
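I have a field named "hist_lastupdated" that holds the last-updated date of each product price change.
Based on this field, I want to derive the start date and end date of each change.
This is what I have at the moment: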
**Product_id , Price , hist_lastupdated**
284849 18.95 2015-05-29 00:53:55
284849 15.95 2015-08-14 01:04:46
284849 18.95 2016-06-11 00:50:31
284849 15.95 2016-08-24 00:45:11
I would like to get a result like this:
**Product_id , Price , hist_lastupdated ,start_date , End_date**
284849 18.95 2015-05-29 00:53:55 2014-05-01 00:00:00 2015-05-29 00:53:55
284849 15.95 2015-08-14 01:04:46 2015-05-29 00:53:55 2015-08-14 01:04:46
284849 18.95 2016-06-11 00:50:31 2015-08-14 01:04:46 2016-06-11 00:50:31
284849 15.95 2016-08-24 00:45:11 2016-06-11 00:50:31 2016-08-24 00:45:11
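In short, the start date is the end date of the previous row. I have many product IDs.
【Comments】:
You want the lag() function in Postgres; I don't know whether Redshift supports it. Are you using both DBMSs?
@horse_with_no_name It looks like LAG is available in Redshift as well: docs.aws.amazon.com/redshift/latest/dg/r_WF_LAG.html
A friend told me this can be done with a 'With Function'.
【Answer 1】: Something like this: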
select Product_id,
Price,
hist_lastupdated,
lag(hist_lastupdated) over (partition by product_id order by hist_lastupdated) as start_date,
hist_lastupdated as end_date
from the_table
You did not explain how start_date is calculated for the first row. If it should be the start of the month that hist_lastupdated falls in, you can do the following:
lag(hist_lastupdated, 1, date_trunc('month', hist_lastupdated)) over (...)
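The three-argument form of lag() can be tried out locally. The following is a minimal sketch, assuming an in-memory SQLite database (version 3.25 or later, for window functions) as a stand-in for Redshift or Postgres. The second product and the static default '2014-01-01 00:00:00' are illustrative assumptions, not part of the answer above.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+) stands in for Redshift/Postgres;
# the_table mirrors the columns used in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (product_id INTEGER, price REAL, hist_lastupdated TEXT)"
)
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        (284849, 18.95, "2015-05-29 00:53:55"),
        (284849, 15.95, "2015-08-14 01:04:46"),
        (284850, 9.99, "2015-06-01 12:00:00"),  # hypothetical second product
        (284850, 8.99, "2016-01-15 08:30:00"),
    ],
)


def price_periods(connection):
    """Return (product_id, price, start_date, end_date), one row per price change."""
    sql = """
        SELECT product_id,
               price,
               -- the third argument is the value used when a partition has no previous row
               lag(hist_lastupdated, 1, '2014-01-01 00:00:00') OVER (
                   PARTITION BY product_id
                   ORDER BY hist_lastupdated
               ) AS start_date,
               hist_lastupdated AS end_date
        FROM the_table
        ORDER BY product_id, hist_lastupdated
    """
    return connection.execute(sql).fetchall()


for row in price_periods(conn):
    print(row)
```

Because the window is partitioned by product_id, the first row of each product receives the default instead of the last date of a different product.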
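【Discussion】:
For the first row I want a static 01-01-2014. The problem is that it keeps picking up the previous field value even when the product_id is different. @a_horse_with_no_name
【Answer 2】: I'm not sure how you would do this using only SQL, but if you are able to write a bit of script, you could write a quick program along these lines (pseudocode):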
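# execute() is a placeholder for whatever call runs a query with your database driver.
lines = execute("SELECT product_id, price, hist_lastupdated FROM ProductTable ORDER BY product_id, hist_lastupdated")
default_start = "2014-05-01 00:00:00"
output_lines = []
prev_id = None
start_date = default_start
for row in lines:
    if row[0] != prev_id:
        # New product: fall back to the default start date.
        start_date = default_start
    # product_id, price, hist_lastupdated, start_date, end_date
    output_lines.append([row[0], row[1], row[2], start_date, row[2]])
    prev_id = row[0]
    start_date = row[2]
# Now do what you want with the output, which is a list of lists in the format you need: insert it into a different table, write it to a file, whatever you want.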
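【Discussion】:
Thanks for your reply, but I have to do this in SQL.
【Answer 3】: I would use one of these solutions with MS SQL Server. Hopefully one of them applies to your problem.
A pure SQL statement would look like this: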
select
t.product_id, t.price, s.start_date, t.end_date
from
product t
outer apply
(
select top 1
end_date start_date
from
product o
where
o.end_date < t.end_date
order by
o.end_date desc
) s
Even with good indexes, running the apply for every returned row can be a performance problem.
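The same "latest earlier row" lookup can be tried without SQL Server. The sketch below assumes an in-memory SQLite database, which has no OUTER APPLY, so it swaps in a correlated scalar subquery with max() instead. The sample rows and the extra product_id condition are assumptions added for the many-products case; they are not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a product table with an end_date column, as in the query above.
conn.execute("CREATE TABLE product (product_id INTEGER, price REAL, end_date TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, 10.0, "2015-01-01"),
        (1, 11.0, "2015-02-01"),
        (2, 15.0, "2015-01-15"),
        (2, 20.0, "2015-04-01"),
    ],
)


def periods_via_subquery(connection):
    """For each row, look up the latest earlier end_date of the same product."""
    sql = """
        SELECT t.product_id,
               t.price,
               (SELECT max(o.end_date)
                  FROM product o
                 WHERE o.product_id = t.product_id   -- assumption: one chain per product
                   AND o.end_date < t.end_date) AS start_date,
               t.end_date
        FROM product t
        ORDER BY t.product_id, t.end_date
    """
    return connection.execute(sql).fetchall()


for row in periods_via_subquery(conn):
    print(row)
```

The first row of each product gets a NULL start_date (None in Python), the same as a row for which OUTER APPLY finds no match. The subquery still runs once per outer row, so the performance caveat applies here too.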
If your SQL Server version supports the LAG function:
select
t.product_id, t.price,
LAG(T.end_date) over (order by t.end_date),
t.end_date
from
product t
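Or you may find a way to do the same thing with variables in an UPDATE statement, to "remember" the value from the previously updated row, for example in T-SQL: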
-- Insert the desired output into a table variable that also has a start_date field.
-- Be sure to insert the records ordered by the date value.
declare @output table (product_id int, price numeric(10,2), [start_date] datetime, [end_date] datetime)
insert @output (product_id, price, end_date)
select 1, 10, '1/1/2015'
union all select 2, 11, '2/1/2015'
union all select 3, 15, '3/1/2015'
union all select 4, 20, '4/1/2015'
order by 3
-- Update the start date using the end date from the previous record
declare @start_date datetime, @end_date datetime
update
@output
set
@start_date = @end_date,
@end_date = end_date,
start_date = @start_date
select * from @output
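I don't think Microsoft recommends this technique, but it has served me well and worked consistently. I have only used it with table variables; I would be less willing to trust the update order of rows in a real table. Nowadays I would use LAG() instead.
【Discussion】:
【Answer 4】: Here is the solution I found. I wanted to use the lag function, but the result was not what I wanted.
Solution: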
WITH
price_table_1 as (
select
-1 + ROW_NUMBER() over (partition by t1.product_id,t1.id ,t1.channel_id) as rownum_w1,
t1.id,
t1.product_id,
t1.channel_id,
t1.member_id,
t1.quantity,
t1.price,
t1.promo_dt_start,
t1.promo_dt_end,
t1.hist_lastupdated
FROM dwh_prod.hist_prices t1
where t1.channel_id='1004' and t1.product_id = '5896' and t1.quantity = '1' and t1.promo_dt_start is null
order by t1.product_id,t1.channel_id,t1.hist_lastupdated
),price_table_2 as (
select
ROW_NUMBER() over (partition by t2.product_id,t2.id ,t2.channel_id) as rownum_w2,
t2.id,
t2.product_id,
t2.channel_id,
t2.member_id,
t2.quantity,
t2.price,
t2.promo_dt_start,
t2.promo_dt_end,
t2.hist_lastupdated
FROM dwh_prod.hist_prices t2
where t2.channel_id='1004' and t2.product_id = '5896' and t2.quantity = '1' and t2.promo_dt_start is null
order by t2.product_id,t2.channel_id,t2.hist_lastupdated
)
select
t1.id,
t1.product_id,
t1.channel_id,
t1.member_id,
t1.quantity,
t1.price,
t1.promo_dt_start,
t1.promo_dt_end,
t2.hist_lastupdated as start_date,
t1.hist_lastupdated as end_date
FROM price_table_1 t1
inner join price_table_2 t2
on t2.product_id = t1.product_id and t2.id = t1.id and t2.channel_id = t1.channel_id
and rownum_w1 = (rownum_w2)
UNION ALL
select
t1.id,
t1.product_id,
t1.channel_id,
t1.member_id,
t1.quantity,
t1.price,
t1.promo_dt_start,
t1.promo_dt_end,
CONVERT(TIMESTAMP,'2014-01-01') as start_date,
t1.hist_lastupdated as end_date
FROM price_table_1 t1
where rownum_w1 = '0';
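The row-number self-join idea can be condensed and checked locally. This is only a sketch under assumptions, not the query above: it uses an in-memory SQLite database (3.25+), a trimmed-down hist_prices table with three columns, and a hypothetical second product. It places ORDER BY inside the ROW_NUMBER() window so the numbering follows hist_lastupdated, and it uses a LEFT JOIN with COALESCE in place of the UNION ALL branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a reduced stand-in for dwh_prod.hist_prices.
conn.execute(
    "CREATE TABLE hist_prices (product_id INTEGER, price REAL, hist_lastupdated TEXT)"
)
conn.executemany(
    "INSERT INTO hist_prices VALUES (?, ?, ?)",
    [
        (284849, 18.95, "2015-05-29 00:53:55"),
        (284849, 15.95, "2015-08-14 01:04:46"),
        (284849, 18.95, "2016-06-11 00:50:31"),
        (284849, 15.95, "2016-08-24 00:45:11"),
        (5896, 4.50, "2015-03-01 00:00:00"),  # hypothetical second product
    ],
)


def periods_via_row_number(connection):
    """Pair each row with the previous row of the same product by row number."""
    sql = """
        WITH numbered AS (
            SELECT product_id, price, hist_lastupdated,
                   ROW_NUMBER() OVER (
                       PARTITION BY product_id
                       ORDER BY hist_lastupdated   -- makes the numbering deterministic
                   ) AS rn
            FROM hist_prices
        )
        SELECT cur.product_id,
               cur.price,
               COALESCE(prev.hist_lastupdated, '2014-01-01 00:00:00') AS start_date,
               cur.hist_lastupdated AS end_date
        FROM numbered cur
        LEFT JOIN numbered prev
               ON prev.product_id = cur.product_id
              AND prev.rn = cur.rn - 1
        ORDER BY cur.product_id, cur.hist_lastupdated
    """
    return connection.execute(sql).fetchall()


for row in periods_via_row_number(conn):
    print(row)
```

With the ordering inside the window, the result matches what lag() partitioned by product_id would return, so either form can be used where window functions are available.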
【Discussion】: