HiveSql/SparkSQL常用函数
Posted farAway
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了HiveSql/SparkSQL常用函数相关的知识,希望对你有一定的参考价值。
一、获取当前时间
- current_date获取当前日期
2018-04-09 - current_timestamp/now()获取当前时间
2018-04-09 15:20:49.247
二、从日期时间中提取字段
- year,month,day/dayofmonth,hour,minute,second
Examples:
> SELECT day(\'2009-07-30\');
30`
* 1
* 2
- dayofweek (1 = Sunday, 2 = Monday, …, 7 = Saturday),dayofyear
Examples:
> SELECT dayofweek(\'2009-07-30\');
5`
* 1
* 2
- weekofyear
weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.
Examples:
> SELECT weekofyear(\'2008-02-20\');
8`
* 1
* 2
- trunc截取某部分的日期,其他部分默认为01
第二个参数 [“year”, “yyyy”, “yy”, “mon”, “month”, “mm”]
Examples:
> SELECT trunc(\'2009-02-12\', \'MM\');
2009-02-01
> SELECT trunc(\'2015-10-27\', \'YEAR\');
2015-01-01`
* 1
* 2
* 3
* 4
- date_trunc [“YEAR”, “YYYY”, “YY”, “MON”, “MONTH”, “MM”, “DAY”, “DD”, “HOUR”, “MINUTE”, “SECOND”, “WEEK”, “QUARTER”]
Examples:
> SELECT date_trunc(\'2015-03-05T09:32:05.359\', \'HOUR\');
2015-03-05T09:00:00`
* 1
* 2
- date_format将时间转化为某种格式的字符串
Examples:
> SELECT date_format(\'2016-04-08\', \'y\');
2016`
* 1
* 2
三、日期时间转换
- unix_timestamp返回当前时间的unix时间戳
Examples:
> SELECT unix_timestamp(); 1476884637
> SELECT unix_timestamp(\'2016-04-08\', \'yyyy-MM-dd\'); 1460041200`
* 1
* 2
- from_unixtime将时间戳换算成当前时间,to_unix_timestamp将时间转化为时间戳
Examples:
> SELECT from_unixtime(0, \'yyyy-MM-dd HH:mm:ss\'); 1970-01-01 00:00:00
> SELECT to_unix_timestamp(\'2016-04-08\', \'yyyy-MM-dd\'); 1460041200`
* 1
* 2
- to_date/date将字符串转化为日期格式,to_timestamp(Since: 2.2.0)
> SELECT to_date(\'2009-07-30 04:17:52\'); 2009-07-30
> SELECT to_date(\'2016-12-31\', \'yyyy-MM-dd\'); 2016-12-31
> SELECT to_timestamp(\'2016-12-31 00:12:00\'); 2016-12-31 00:12:00`
* 1
* 2
* 3
- quarter 将1年4等分(range 1 to 4)
Examples:
> SELECT quarter(\'2016-08-31\');
3`
* 1
* 2
四、日期、时间计算
- months_between两个日期之间的月数
months_between(timestamp1, timestamp2) - Returns number of months between timestamp1 and timestamp2.
Examples:
> SELECT months_between(\'1997-02-28 10:30:00\', \'1996-10-30\');
3.94959677`
* 1
* 2
- add_months返回日期后n个月后的日期
Examples:
> SELECT add_months(\'2016-08-31\', 1);
2016-09-30`
* 1
* 2
* 3
- last_day(date),next_day(start_date, day_of_week)
Examples:
> SELECT last_day(\'2009-01-12\'); 2009-01-31
> SELECT next_day(\'2015-01-14\', \'TU\'); 2015-01-20`
* 1
* 2
- date_add,date_sub(减)
date_add(start_date, num_days) - Returns the date that is num_days after start_date.
Examples:
> SELECT date_add(\'2016-07-30\', 1); 2016-07-31`
* 1
- datediff(两个日期间的天数)
datediff(endDate, startDate) - Returns the number of days from startDate to endDate.
Examples:
> SELECT datediff(\'2009-07-31\', \'2009-07-30\');
1`
* 1
* 2
- 关于UTC时间
- to_utc_timestamp
to_utc_timestamp(timestamp, timezone) - Given a timestamp like ‘2017-07-14 02:40:00.0’, interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, ‘GMT+1’ would yield ‘2017-07-14 01:40:00.0’.
Examples:
> SELECT to_utc_timestamp(\'2016-08-31\', \'Asia/Seoul\');
2016-08-30 15:00:00`
* 1
* 2
- from_utc_timestamp
from_utc_timestamp(timestamp, timezone) - Given a timestamp like ‘2017-07-14 02:40:00.0’, interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, ‘GMT+1’ would yield ‘2017-07-14 03:40:00.0’.
Examples:
> SELECT from_utc_timestamp(\'2016-08-31\', \'Asia/Seoul\');
2016-08-31 09:00:00
五、Hive常用函数
1.数学函数
round(DOUBLE d) : 返回DOUBLE型的d的BIGINT类型的近似值
round(DOUBLE d,INT) : 返回DOUBLE型的d的保留n位小数的DOUBLE类型的近似值
floor(DOUBLE d): d是DOUBLE类型的,返回<=d的最大的BIGINT值
ceil(DOUBLE d): d是DOUBLE类型的,返回>=d的最小的BIGINT值
rand() rand(INT seed): 每行返回一个DOUBLE型的随机数,整数seed是随机因子
exp(DOUBLE d):返回e的d幂次方
ln(DOUBLE d): 以自然数为底d的对数
log10(DOUBLE d): 以10为底的d的对数
log2(DOUBLE d): 以2为底的d的对数
log(DOUBLE base,DOUBLE d)以base为底的d的对数
pow(DOUBLE d,DOUBLE p) power(DOUBLE d,DOUBLE p): 计算d的p次幂
sqrt(DOUBLE d): d的平方根
bin(DOUBLE i): 计算二进制值i的STRING类型值
hex(BIGINT i): 计算十六进制值i的STRING类型值
unhex(STRING i): hex的逆方法
conv(STRING num,INT from_base,INT to_base): 将STRING类型的num 从from_base进制转化为to_base 进制
abs(DOUBLE d): 计算d的绝对值
pmod(INT i1 ,INT i2): i1对i2取模
sin(DOUBLE d): 返回d的正弦值
cos(DOUBLE d): 返回d的余弦值
asin(DOUBLE d): 返回d的反正弦值
acos(DOUBLE d): 返回d的反余弦值
tan(DOUBLE d): 返回d的正切值
atan(DOUBLE d): 返回d的反正切值
degrees(DOUBLE d): 将弧度值d转化为角度值
radians(DOUBLE d): 将角度值d转化为弧度值
positive(DOUBLE d): 返回+d
negative(DOUBLE d): 返回-d
sign(DOUBLE d): 如果d是正数,则返回+1.0,如果d是负数,则返回-1.0,否则为0
2.日期函数
to_date(string timestamp):返回时间字符串中的日期部分,如to_date(\'1970-01-01 00:00:00\')=\'1970-01-01\'
current_date:返回当前日期
year(date):返回日期date的年,类型为int如year(\'2019-01-01\')=2019
month(date):返回日期date的月,类型为int,如month(\'2019-01-01\')=1
day(date): 返回日期date的天,类型为int,如day(\'2019-01-01\')=1
weekofyear(date1):返回日期date1位于该年第几周。如weekofyear(\'2019-03-06\')=10
datediff(date1,date2):返回日期date1与date2相差的天数,如datediff(\'2019-03-06\',\'2019-03-05\')=1
date_add(date1,int1):返回日期date1加上int1的日期,如date_add(\'2019-03-06\',1)=\'2019-03-07\'
date_sub(date1,int1):返回日期date1减去int1的日期,如date_sub(\'2019-03-06\',1)=\'2019-03-05\'
months_between(date1,date2):返回date1与date2相差月份,如months_between(\'2019-03-06\',\'2019-01-01\')=2
add_months(date1,int1):返回date1加上int1个月的日期,int1可为负数。如add_months(\'2019-02-11\',-1)=\'2019-01-11\'
last_day(date1):返回date1所在月份最后一天。如last_day(\'2019-02-01\')=\'2019-02-28\'
next_day(date1,day1):返回日期date1的下个星期day1的日期。day1为星期X的英文前两字母如next_day(\'2019-03-06\',\'MO\') 返回\'2019-03-11\'
trunc(date1,string1):返回日期最开始年份或月份。string1可为年(YYYY/YY/YEAR)或月(MONTH/MON/MM)。如trunc(\'2019-03-06\',\'MM\')=\'2019-03-01\',trunc(\'2019-03-06\',\'YYYY\')=\'2019-01-01\'
unix_timestamp():返回当前时间的unix时间戳,可指定日期格式。如unix_timestamp(\'2019-03-06\',\'yyyy-mm-dd\')=1546704180
from_unixtime():返回unix时间戳的日期,可指定格式。如select from_unixtime(unix_timestamp(\'2019-03-06\',\'yyyy-mm-dd\'),\'yyyymmdd\')=\'20190306\'
3.条件函数
if(boolean,t1,t2):若布尔值成立,则返回t1,反正返回t2。如if(1>2,100,200)返回200
case when boolean then t1 else t2 end:若布尔值成立,则t1,否则t2,可加多重判断
coalesce(v0,v1,v2):返回参数中的第一个非空值,若所有值均为null,则返回null。如coalesce(null,1,2)返回1
isnull(a):若a为null则返回true,否则返回false
4.字符串函数
length(string1):返回字符串长度
concat(string1,string2):返回拼接string1及string2后的字符串
concat_ws(sep,string1,string2):返回按指定分隔符拼接的字符串
lower(string1):返回小写字符串,同lcase(string1)。upper()/ucase():返回大写字符串
trim(string1):去字符串左右空格,ltrim(string1):去字符串左空格。rtrim(string1):去字符串右空格
repeat(string1,int1):返回重复string1字符串int1次后的字符串
reverse(string1):返回string1反转后的字符串。如reverse(\'abc\')返回\'cba\'
rpad(string1,len1,pad1):以pad1字符右填充string1字符串,至len1长度。如rpad(\'abc\',5,\'1\')返回\'abc11\'。lpad():左填充
split(string1,pat1):以pat1正则分隔字符串string1,返回数组。如split(\'a,b,c\',\',\')返回["a","b","c"]
substr(string1,index1,int1):以index位置起截取int1个字符。如substr(\'abcde\',1,2)返回\'ab\'
5.聚合函数
count():统计行数
sum(col1):统计指定列和
avg(col1):统计指定列平均值
min(col1):返回指定列最小值
max(col1):返回指定列最大值
6.表生成函数
explode (array):返回多行array中对应的元素。如explode(array(\'A\',\'B\',\'C\'))返回
explode(map):返回多行map键值对对应元素。如explode(map(1,\'A\',2,\'B\',3,\'C\'))返回
7.窗口函数
row_number() over(partitiion by .. order by .. ):根据partition排序,相同值取不同序号,不存在序号跳跃
rank() over(partition by .. order by .):根据partition排序,相同值取相同序号,存在序号跳跃
dense_rank() over(partition by .. order by ..):根据partition排序,相同值取相同序号,不存在序号跳跃
sum() over(partition by .. order by ..)
count() over(partition by .. order by ..)
lag(col,n) over(partition by .. order by ..) :查看当前行的上第n行
lead(col,n) over(partition by .. order by ..):查看当前行的下第n行
first_value() over(partition by .. order by ..):满足partition及排序的第一个值
last_value() over(partition by .. order by ..):满足partition及排序的最后值
ntile(n) over(partition by .. order by ..):满足partition及排序的数据分成n份
partition内更细的划分,可使用windows子句。常见子句为:
preceding:往前
following:往后
current row:当前行
unbounded:起点,unbounded preceding 表示从前面的起点, unbounded following:表示到后面的终点
使用如:、
sum(col) over(partition by .. order by .. rows between 1 preceding and current row):当前行与前一行做聚合
8.行列转换
concat_ws(sep, collect_set(col1)) :同组不同行合并成一列,以sep分隔符分隔。collect_set在无重复的情况下也可以collect_list()代替。collect_set()去重,collect_list()不去重
lateral view explode(split(col1,\',\')) :同组同列的数据拆分成多行,以sep分隔符区分
以上是关于HiveSql/SparkSQL常用函数的主要内容,如果未能解决你的问题,请参考以下文章