计算 Hive 中不包括星期日的天数
Posted
技术标签:
【中文标题】计算 Hive 中不包括星期日的天数【英文标题】:Calculate number of days excluding sunday in Hive 【发布时间】:2016-07-15 07:21:46 【问题描述】:我有两个时间戳作为输入。我想计算这些时间戳之间的时差(不包括星期日)。
我可以使用 hive 中的 datediff 函数获取天数。
我可以使用 from_unixtime(unix_timestamp(startdate), 'EEEE') 获取特定日期的日期。
但我不知道如何关联这些功能以实现我的要求,或者有没有其他简单的方法来实现这一点。
提前致谢。
【问题讨论】:
除了星期日之外是什么意思。 【参考方案1】:您可以编写一个自定义 UDF,它将包含日期的两列作为输入,并计算除星期日之外的日期之间的差异。
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Date;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
public class IsoYearWeek extends UDF
public LongWritable evaluate(Text dateString,Text dateString1) throws ParseException //takes the two columns as inputs
SimpleDateFormat date = new SimpleDateFormat("dd/MM/yyyy");
/* String date1 = "20/07/2016";
String date2 = "28/07/2016";
*/ int count=0;
List<Date> dates = new ArrayList<Date>();
Date startDate = (Date)date.parse(dateString.toString());
Date endDate = (Date)date.parse(dateString1.toString());
long interval = 24*1000 * 60 * 60; // 1 hour in millis
long endTime =endDate.getTime() ; // create your endtime here, possibly using Calendar or Date
long curTime = startDate.getTime();
while (curTime <= endTime)
dates.add(new Date(curTime));
curTime += interval;
for(int i=0;i<dates.size();i++)
Date lDate =(Date)dates.get(i);
if(lDate.getDay()==0)
count+=1; //counts the number of sundays in between
long days_diff = (endDate.getTime()-startDate.getTime())/(24 * 60 * 60 * 1000)-count; //displays the days difference excluding sundays
return new LongWritable(days_diff);
【讨论】:
我投票支持这个解决方案。也许应该改进代码,但我不知道如果没有 UDF 有效地完成您的任务。也可以使用 shell 脚本或 python。也可以使用date_dim
表加入您的数据以排除星期日,然后使用 sum(days) 但在这种情况下我更喜欢 UDF。
如何在 hive 查询中使用这个?我有相同类型的要求。【参考方案2】:
使用 spark 更容易实现和维护
import org.joda.time.format.DateTimeFormat
def dayDiffWithExcludeWeekendAndHoliday(startDate:String,endDate:String,holidayExclusion:Seq[String]) =
@transient val datePattern="yyyy-MM-dd"
@transient val dateformatter=DateTimeFormat.forPattern(datePattern)
var numWeekDaysValid=0
var numWeekends=0
var numWeekDaysInValid=0
val holidayExclusionJoda=holidayExclusion.map(dateformatter.parseDateTime(_))
val startDateJoda=dateformatter.parseDateTime(startDate)
var startDateJodaLatest=dateformatter.parseDateTime(startDate)
val endDateJoda=dateformatter.parseDateTime(endDate)
while (startDateJodaLatest.compareTo(endDateJoda) !=0)
startDateJodaLatest.getDayOfWeek match
case value if value >5 => numWeekends=numWeekends+1
case value if value <= 5 => holidayExclusionJoda.contains(startDateJodaLatest) match case value if value == true => numWeekDaysInValid=numWeekDaysInValid+1 case value if value == false => numWeekDaysValid=numWeekDaysValid+1
startDateJodaLatest = startDateJodaLatest.plusDays(1)
Array(numWeekDaysValid,numWeekends,numWeekDaysInValid)
spark.udf.register("dayDiffWithExcludeWeekendAndHoliday",dayDiffWithExcludeWeekendAndHoliday(_:String,_:String,_:Seq[String]))
case class tmpDateInfo(startDate:String,endDate:String,holidayExclusion:Array[String])
case class tmpDateInfoFull(startDate:String,endDate:String,holidayExclusion:Array[String],numWeekDaysValid:Int,numWeekends:Int,numWeekDaysInValid:Int)
def dayDiffWithExcludeWeekendAndHolidayCase(tmpInfo:tmpDateInfo) =
@transient val datePattern="yyyy-MM-dd"
@transient val dateformatter=DateTimeFormat.forPattern(datePattern)
var numWeekDaysValid=0
var numWeekends=0
var numWeekDaysInValid=0
val holidayExclusionJoda=tmpInfo.holidayExclusion.map(dateformatter.parseDateTime(_))
val startDateJoda=dateformatter.parseDateTime(tmpInfo.startDate)
var startDateJodaLatest=dateformatter.parseDateTime(tmpInfo.startDate)
val endDateJoda=dateformatter.parseDateTime(tmpInfo.endDate)
while (startDateJodaLatest.compareTo(endDateJoda) !=0)
startDateJodaLatest.getDayOfWeek match
case value if value >5 => numWeekends=numWeekends+1
case value if value <= 5 => holidayExclusionJoda.contains(startDateJodaLatest) match case value if value == true => numWeekDaysInValid=numWeekDaysInValid+1 case value if value == false => numWeekDaysValid=numWeekDaysValid+1
startDateJodaLatest = startDateJodaLatest.plusDays(1)
tmpDateInfoFull(tmpInfo.startDate,tmpInfo.endDate,tmpInfo.holidayExclusion,numWeekDaysValid,numWeekends,numWeekDaysInValid)
//df way 1
val tmpDF=Seq(("2020-05-03","2020-06-08",List("2020-05-08","2020-06-05"))).toDF("startDate","endDate","holidayExclusion").select(col("startDate").cast(StringType),col("endDate").cast(StringType),col("holidayExclusion"))
tmpDF.as[tmpDateInfo].map(dayDiffWithExcludeWeekendAndHolidayCase).show(false)
//df way 2
tmpDF.selectExpr("*","dayDiffWithExcludeWeekendAndHoliday(cast(startDate as string),cast(endDate as string),cast(holidayExclusion as array<string>)) as resultDays").selectExpr("startDate","endDate","holidayExclusion","resultDays[0] as numWeekDaysValid","resultDays[1] as numWeekends","resultDays[2] as numWeekDaysInValid").show(false)
tmpDF.selectExpr("*","dayDiffWithExcludeWeekendAndHoliday(cast(startDate as string),cast(endDate as string),cast(holidayExclusion as array<string>)) as resultDays").selectExpr("startDate","endDate","holidayExclusion","resultDays[0] as numWeekDaysValid","resultDays[1] as numWeekends","resultDays[2] as numWeekDaysInValid").show(false)
// spark sql way, works with hive table when configured in hive metastore
tmpDF.createOrReplaceTempView("tmpTable")
spark.sql("select startDate,endDate,holidayExclusion,dayDiffWithExcludeWeekendAndHoliday(startDate,endDate,holidayExclusion) from tmpTable").show(false)
【讨论】:
以上是关于计算 Hive 中不包括星期日的天数的主要内容,如果未能解决你的问题,请参考以下文章
给定日期范围(开始日期和结束日期),我如何计算天数,不包括 .Net 中指定的星期几?
Excel | EDATE函数计算合同到期日,DATEDIF计算距离到期日的天数,并设置“交通三色灯”提醒
Excel079 | EDATE函数计算合同到期日,DATEDIF计算距离到期日的天数,并设置“交通三色灯”提醒