Calculate the range between the difference of two fields with HQL
【Title】: calculate range between the difference of two fields with HQL
【Posted】: 2019-09-05 09:41:24
【Question】:
I need help populating a table. I load tables in Hive with HQL queries. Does anyone know how to load this one?
Table USER
START_TIME_DATE | END_TIME_DATE | USER     | START_DAY_ID | END_DAY_ID
(String)        | (String)      | (Bigint) | (Int)        | (Int)
210241          | 231236        | 1        | 01092019     | 01092019
234736          | 235251        | 2        | 01092019     | 01092019
223408          | 021345        | 3        | 01092019     | 02092019
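The START_TIME_DATE and END_TIME_DATE fields give the time span during which the user was at the location. The idea is to output one row for each hour of the user's stay, with the HOUR field holding only the first two digits of the time.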
Table USERHOUR
DATE     | HOUR     | ID
(Bigint) | (String) | (Bigint)
01092019 | 21       | 1
01092019 | 22       | 1
01092019 | 23       | 1
01092019 | 23       | 2
01092019 | 22       | 3
01092019 | 23       | 3
02092019 | 00       | 3
02092019 | 01       | 3
02092019 | 02       | 3
At the moment my query looks like this, but it does not work. I am trying to use UNION ALL:
insert overwrite table USERHOUR
(select [start_time_date] ,[end_time_date]
from user
union all
select [start_time_date]+1,[end_time_date]
where [start_time_date]+1<=[end_time_date]
)
as hour) --generate a range between start_time_date and end_time_date and before cast to Hours,
end_day_id a date,
user as id
from table USER;
【Comments】:
What date format is 01092019? You should state which data types you are using.
【Answer 1】:
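To do this, I calculated the difference in hours, generated rows with posexplode(split(space(hours), ' ')), computed start timestamp + (position from the explode) * 3600, and extracted the hour and the date from the resulting timestamp.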
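The row-generation step can be looked at in isolation. The following is a minimal sketch, assuming a made-up literal hour difference of 2 in place of the computed value: space(2) returns two spaces, splitting that string on ' ' gives three empty strings, and posexplode emits one row per array element together with its position.

```sql
-- hours = 2 is a hypothetical literal standing in for the computed hour difference
select h.i as hour_offset
from (select 2 as hours) s
lateral view posexplode(split(space(s.hours), ' ')) h as i, x;
-- one row per array element: offsets 0, 1 and 2, i.e. hours + 1 rows
```

Because the array always has hours + 1 elements, a stay that starts and ends within the same hour still produces one row.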
See this demo using your example:
with mydata as ( --this is your data
    select stack(3,
                 '210241', '231236', 1, '01092019', '01092019',
                 '234736', '235251', 2, '01092019', '01092019',
                 '223408', '021345', 3, '01092019', '02092019'
           ) as (START_TIME_DATE,END_TIME_DATE,USER,START_DAY_ID,END_DAY_ID)
)
select --extract date, hour from timestamp calculated
       --this can be done in previous step
       --this subquery is to make code cleaner
       date_format(dtm, 'ddMMyyyy') as DATE,
       date_format(dtm, 'HH')       as HOUR,
       user                         as ID
from
(
    select user,
           start, h.i, hours, --these columns are for debugging
           from_unixtime(start+h.i*3600) dtm --add hour (in seconds) to the start unix timestamp
                                             --and convert to timestamp
    from
    (
        select user,
               --start timestamp (unix timestamp in seconds)
               unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH') as start,
               floor((unix_timestamp(concat(END_DAY_ID, ' ', substr(END_TIME_DATE,1,2)),'ddMMyyyy HH')-
                      unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH')
                     )/ --diff in seconds
                     3600) as hours --diff in hours
        from mydata
    )s
    lateral view posexplode(split(space(cast(s.hours as int)),' ')) h as i,x --this will generate rows
)s
;
Result:
OK
01092019 21 1
01092019 22 1
01092019 23 1
01092019 23 2
01092019 22 3
01092019 23 3
02092019 00 3
02092019 01 3
02092019 02 3
Time taken: 3.207 seconds, Fetched: 9 row(s)
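The demo feeds the day ids in as strings. If START_DAY_ID and END_DAY_ID are really stored as Int, as the question states, the leading zero of 01092019 is not kept (the value is held as 1092019) and the 'ddMMyyyy' pattern no longer lines up. Below is a rough sketch of how the same approach might be adapted to load the target table. It assumes the tables are named `user` and `userhour`, that USERHOUR has the three columns shown in the question, and that left-padding the day ids to eight digits restores the original text. None of this is confirmed by the question.

```sql
-- Sketch under assumptions: source table `user`, target table userhour,
-- day ids stored as int. Backticks are used because USER can be treated
-- as a reserved word depending on the Hive version and settings.
insert overwrite table userhour
select cast(date_format(dtm, 'ddMMyyyy') as bigint), -- bigint target: a leading zero is not stored
       date_format(dtm, 'HH'),
       id
from (
    select s.id,
           from_unixtime(s.start_ts + h.i * 3600) as dtm -- start + offset in seconds
    from (
        select `user` as id,
               -- rebuild 'ddMMyyyy HH' strings from the int day id and the first two time digits
               unix_timestamp(concat(lpad(cast(start_day_id as string), 8, '0'), ' ',
                                     substr(start_time_date, 1, 2)), 'ddMMyyyy HH') as start_ts,
               unix_timestamp(concat(lpad(cast(end_day_id as string), 8, '0'), ' ',
                                     substr(end_time_date, 1, 2)), 'ddMMyyyy HH') as end_ts
        from `user`
    ) s
    -- one row per hour offset 0..floor((end_ts - start_ts) / 3600)
    lateral view posexplode(
        split(space(cast(floor((s.end_ts - s.start_ts) / 3600) as int)), ' ')
    ) h as i, x
) t;
```

Declaring the DATE column as a string instead would keep the value in the 01092019 form shown in the expected output, which is the data-type point raised in the comments.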