Snowflake UDF in JavaScript does not calculate as expected

[Posted]: 2021-04-26 11:44:05

[Question]:

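I am trying to calculate the number of minutes that a logged job has been running. Each job has a start time and an end time.

In this particular case, working hours are from 01:00 to 10:00, on weekdays only (weekends excluded).

To calculate this, I tried writing a JavaScript-based UDF, as follows: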

CREATE OR REPLACE FUNCTION JobRuns(f datetime, t datetime)
RETURNS DOUBLE
LANGUAGE JAVASCRIPT
AS
$$
    // Based on the Calculation of Business Hours in JavaScript
    // https://www.c-sharpcorner.com/UploadFile/36985e/calculating-business-hours-in-javascript/
    
    function workingMinutesBetweenDates(startDate, endDate) {
        // Store minutes worked
        var minutesWorked = 0;

        // Validate input
        if (endDate < startDate) {
            return 0;
        }

        // Loop from your Start to End dates (by hour)
        var current = startDate;

        // Define work range
        var workHoursStart = 1;
        var workHoursEnd = 10;
        var includeWeekends = false;

        // Loop while currentDate is less than end Date (by minutes)
        while (current <= endDate) {
            // Is the current time within a work day (and if it occurs on a weekend or not)
            if (current.getHours() >= workHoursStart && current.getHours() <= workHoursEnd && (includeWeekends ? current.getDay() !== 0 && current.getDay() !== 6 : true)) {
                minutesWorked++;
            }

            // Increment current time
            current.setTime(current.getTime() + 1000 * 60);
        }

        // Return the number of minutes
        return minutesWorked;
    }

    return workingMinutesBetweenDates(F,T);
$$
;

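But in some cases the results I get are far from what I expect.

The JS logic is taken from https://www.c-sharpcorner.com/UploadFile/36985e/calculating-business-hours-in-javascript/ and when I look at the code, I can't see any flaw that would cause these differences.

I am using this test data: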

CREATE OR REPLACE TABLE "SLA_Test" (
    "DocumentID" VARCHAR(16777216),
    "From" TIMESTAMP_NTZ(9),
    "To" TIMESTAMP_NTZ(9),
    "ExpectedTime" INT
  );
  
  INSERT INTO "SLA_Test"
  VALUES
    ('ACD7EFC1-8D17-46E3-84DB-C08067466866','2021-03-03 07:12:34.567','2021-03-03 08:12:34.567',60),
    ('C41FB599-D1EC-4461-BBAF-1AFF67D2F3C2','2021-03-03 09:55:00.000','2021-03-04 01:05:00.000',10),
    ('B741C663-732B-4FD3-839D-E70330C58990','2021-03-03 09:55:00.000','2021-03-04 00:05:00.000',5),
    ('C5893C51-F5CE-40E4-85F7-775515BC3E3D','2021-03-03 19:55:00.000','2021-03-04 01:05:00.000',5),
    ('BAF4ED57-8184-4CDF-8875-DFDA6EAC2033','2021-03-03 09:55:00.000','2021-03-05 01:05:00.000',550),
    ('F325059E-E78F-4DCE-B675-CC1C59669B3C','2021-03-05 09:55:00.000','2021-03-08 01:05:00.000',10),
    ('F325059E-E78F-4DCE-B675-CC1C59669B3C','2021-03-05 09:55:00.000','2021-03-07 01:05:00.000',5);

SELECT "DocumentID","From","To",
    DATEDIFF(second, "From", "To") AS "TotalElapsedTimeSecond",
    DATEDIFF(second, "From", "To")/60 AS "TotalElapsedTimeMinut",
    "ExpectedTime",
    JobRuns("From","To") AS "ElapsedTimeMinut"
FROM "SLA_Test";

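Any ideas why the UDF does not return the expected times?

[Comments on the question]:

Are all the ExpectedTime values in minutes? Can you explain why the ExpectedTime between '2021-03-03 09:55:00' and '2021-03-04 01:05:00' is 10 minutes (or, in fact, explain the calculation behind any of the ExpectedTime values you have given, apart from the first)?

Working hours are between 01:00 and 10:00, so 09:55 gives you 5 working minutes on that day. 01:05 is on the next day, which in this case adds another 5 working minutes, for a total of 10 minutes. Hope that clears it up.

[Answer 1]: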

If you create a table of working hours, you can run the following query:

select
  t.id
  , sum(datediff('second',
               -- calculate the max of the two start time
               (case when t.start <= 
                          w.working_day_start_timestamp
                     then w.working_day_start_timestamp
                     else t.start
                end),
               -- calculate the min of the two end times
               (case when t.end >= 
                          w.working_day_end_timestamp
                     then w.working_day_end_timestamp
                     else t.end 
                end)
               )) / 3600 -- convert to hourly
  as working_hour_diff
from
  working_days_times w
  cross join time_intervals t
where -- select all intersecting intervals
  (
   t.start <= w.working_day_end_timestamp
   and
   t.end >= w.working_day_start_timestamp
  )
and -- select only working days
  w.is_working_day
group by
  t.id

This article also describes in detail how to implement it as a JavaScript UDF: https://medium.com/dandy-engineering-blog/how-to-calculate-the-number-of-working-hours-between-two-timestamps-in-sql-b5696de66e51
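The interval-intersection idea behind the query above can also be written directly in JavaScript. What follows is a minimal sketch, not the code from the linked article: the function name workingMinutesByOverlap, its default 01:00 to 10:00 window, and the use of UTC getters (so the result does not depend on the machine's time zone) are all assumptions made for illustration. Instead of stepping minute by minute, it clips the job interval against each weekday's working window and sums the overlaps.

```javascript
// Hypothetical helper: working minutes between two Date objects,
// computed by intersecting [start, end) with each weekday's window.
// Assumes timestamps should be interpreted in UTC.
function workingMinutesByOverlap(start, end, startHour = 1, endHour = 10) {
  if (end <= start) {
    return 0;
  }
  const MS_PER_HOUR = 60 * 60 * 1000;
  const MS_PER_DAY = 24 * MS_PER_HOUR;
  let totalMs = 0;

  // Midnight (UTC) of the day on which the job starts
  let dayStart = Date.UTC(
    start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate());

  for (; dayStart < end.getTime(); dayStart += MS_PER_DAY) {
    const dayOfWeek = new Date(dayStart).getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (dayOfWeek === 0 || dayOfWeek === 6) {
      continue; // skip weekends
    }
    const windowStart = dayStart + startHour * MS_PER_HOUR;
    const windowEnd = dayStart + endHour * MS_PER_HOUR;

    // Max of the two start times, min of the two end times
    const overlapStart = Math.max(start.getTime(), windowStart);
    const overlapEnd = Math.min(end.getTime(), windowEnd);
    if (overlapEnd > overlapStart) {
      totalMs += overlapEnd - overlapStart;
    }
  }
  return totalMs / 60000;
}

// Usage with one row of the test data from the question
console.log(workingMinutesByOverlap(
  new Date("2021-03-03T09:55:00.000Z"),
  new Date("2021-03-04T01:05:00.000Z"))); // 10
```

Because the work is proportional to the number of days rather than the number of minutes, this form stays cheap for long-running jobs.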

[Comments]:

[Answer 2]:

This can all be done in SQL:

with SLA_Test(DocumentID, FromTime, ToTime, ExpectedTime) AS (
  SELECT column1, column2::timestamp_ntz, column3::timestamp_ntz, column4 
  FROM
  VALUES
    ('ACD7EFC1-8D17-46E3-84DB-C08067466866','2021-03-03 07:12:34.567','2021-03-03 08:12:34.567',60),
    ('C41FB599-D1EC-4461-BBAF-1AFF67D2F3C2','2021-03-03 09:55:00.000','2021-03-04 01:05:00.000',10),
    ('B741C663-732B-4FD3-839D-E70330C58990','2021-03-03 09:55:00.000','2021-03-04 00:05:00.000',5),
    ('C5893C51-F5CE-40E4-85F7-775515BC3E3D','2021-03-03 19:55:00.000','2021-03-04 01:05:00.000',5),
    ('BAF4ED57-8184-4CDF-8875-DFDA6EAC2033','2021-03-03 09:55:00.000','2021-03-05 01:05:00.000',550),
    ('F325059E-E78F-4DCE-B675-CC1C59669B3C','2021-03-05 09:55:00.000','2021-03-08 01:05:00.000',10),
    ('F325059E-E78F-4DCE-B675-CC1C59669B3C','2021-03-05 09:55:00.000','2021-03-07 01:05:00.000',5)
), days as (
    SELECT row_number() over(order by seq8())-1 as num
    FROM table(GENERATOR(rowcount=>30))
), enriched as (
    SELECT *,
        datediff('day', s.fromtime, s.totime) as tot_days
  from SLA_Test AS s
), day_sliced AS (
  select s.*
      ,d.*
      ,date_trunc('day',fromtime) f_s
      ,dateadd('day', d.num, f_s) as clip_day
      ,dateadd('hour', 1, clip_day) as clip_start
      ,dateadd('hour', 10, clip_day) as clip_end
      ,dayofweekiso(clip_day) as dowi
      ,dowi >=1 AND dowi <= 5 as work_day
      ,least(greatest(s.fromtime, clip_start),clip_end) as slice_start
      ,greatest(least(s.totime, clip_end), clip_start) as slice_end
      ,DATEDIFF('second', slice_start, slice_end) as slice_sec
      ,DATEDIFF('minute', slice_start, slice_end) as slice_min
  from enriched AS s
  join days AS d on d.num <= s.tot_days
  qualify work_day = true
)
SELECT 
    DocumentID
    ,FromTime
    ,ToTime
    ,ExpectedTime
    ,round(sum(slice_sec)/60,0) as elasped_time_minutes
FROM day_sliced
GROUP BY 1,2,3,4
ORDER BY 1,2;

It gives the expected results:

DOCUMENTID                              FROMTIME                TOTIME     EXPECTEDTIME ELASPED_TIME_MINUTES
ACD7EFC1-8D17-46E3-84DB-C08067466866    2021-03-03 07:12:34.567 2021-03-03 08:12:34.567 60  60
B741C663-732B-4FD3-839D-E70330C58990    2021-03-03 09:55:00.000 2021-03-04 00:05:00.000 5   5
BAF4ED57-8184-4CDF-8875-DFDA6EAC2033    2021-03-03 09:55:00.000 2021-03-05 01:05:00.000 550 550
C41FB599-D1EC-4461-BBAF-1AFF67D2F3C2    2021-03-03 09:55:00.000 2021-03-04 01:05:00.000 10  10
C5893C51-F5CE-40E4-85F7-775515BC3E3D    2021-03-03 19:55:00.000 2021-03-04 01:05:00.000 5   5
F325059E-E78F-4DCE-B675-CC1C59669B3C    2021-03-05 09:55:00.000 2021-03-07 01:05:00.000 5   5
F325059E-E78F-4DCE-B675-CC1C59669B3C    2021-03-05 09:55:00.000 2021-03-08 01:05:00.000 10  10

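[Comments]:

Thanks @simeon-pilgrim, this is great and gives me a deeper understanding of the power of CTEs.

[Answer 3]:

Have you tested this outside of Snowflake? I just created the following file, and running node /tmp/dates.js produces output that matches Snowflake: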

// Col1: function return, Col2: Expected
61 60
71 10
65 5
6 5
671 550
1271 10
671 5

function workingMinutesBetweenDates(startDate, endDate) {
    // Store minutes worked
    var minutesWorked = 0;

    // Validate input
    if (endDate < startDate) {
        return 0;
    }

    // Loop from your Start to End dates (by hour)
    var current = startDate;

    // Define work range
    var workHoursStart = 1;
    var workHoursEnd = 10;
    var includeWeekends = false;

    // Loop while currentDate is less than end Date (by minutes)
    while (current <= endDate) {
        // Is the current time within a work day (and if it occurs on a weekend or not)
        if (current.getHours() >= workHoursStart && current.getHours() <= workHoursEnd && (includeWeekends ? current.getDay() !== 0 && current.getDay() !== 6 : true)) {
            minutesWorked++;
        }

        // Increment current time
        current.setTime(current.getTime() + 1000 * 60);
    }

    // Return the number of minutes
    return minutesWorked;
}


console.log(workingMinutesBetweenDates((new Date(Date.parse('2021-03-03 07:12:34.567'))), (new Date(Date.parse('2021-03-03 08:12:34.567')))), 60);
console.log(workingMinutesBetweenDates((new Date(Date.parse('2021-03-03 09:55:00.000'))), (new Date(Date.parse('2021-03-04 01:05:00.000')))), 10);
console.log(workingMinutesBetweenDates((new Date(Date.parse('2021-03-03 09:55:00.000'))), (new Date(Date.parse('2021-03-04 00:05:00.000')))), 5);
console.log(workingMinutesBetweenDates((new Date(Date.parse('2021-03-03 19:55:00.000'))), (new Date(Date.parse('2021-03-04 01:05:00.000')))), 5);
console.log(workingMinutesBetweenDates((new Date(Date.parse('2021-03-03 09:55:00.000'))), (new Date(Date.parse('2021-03-05 01:05:00.000')))), 550);
console.log(workingMinutesBetweenDates((new Date(Date.parse('2021-03-05 09:55:00.000'))), (new Date(Date.parse('2021-03-08 01:05:00.000')))), 10);
console.log(workingMinutesBetweenDates((new Date(Date.parse('2021-03-05 09:55:00.000'))), (new Date(Date.parse('2021-03-07 01:05:00.000')))), 5);

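[Comments]:

Thanks for your input. I had better start using node.js so that I can debug JavaScript locally.

If you don't want to bother with Node, you can also test the JavaScript directly in the browser using the Web Inspector console.

[Answer 4]:

I think there are at least 2 problems with the code:

    1. It will always over-count by at least 1. On the first pass through the WHILE loop, minutesWorked is incremented, but no time has actually been worked at that point; the first minute is not worked until StartDate + 1 minute.
    2. Your working day ends at 10:00, but your logic includes the whole of hour 10: the comparison current.getHours() <= workHoursEnd also counts the minutes from 10:00 to 10:59.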
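The two problems above can be addressed with small changes to the loop. The following is a hedged sketch rather than a definitive fix: the name workingMinutesFixed and the options object are hypothetical, and the UTC getters are an assumption chosen so the result does not depend on the local time zone. It also changes two further things that are visible in the original code: the weekend ternary evaluates to true when includeWeekends is false (so weekend minutes are counted), and current = startDate aliases the caller's Date, which the loop then mutates.

```javascript
// Hypothetical corrected version of the minute-stepping approach.
// Assumes timestamps should be interpreted in UTC.
function workingMinutesFixed(startDate, endDate, options) {
  const { startHour = 1, endHour = 10, includeWeekends = false } = options || {};

  if (endDate <= startDate) {
    return 0;
  }

  // Copy the start date so the caller's object is not modified
  const current = new Date(startDate.getTime());
  let minutesWorked = 0;

  // Half-open interval [start, end): each iteration stands for the minute
  // that begins at `current`, so the end instant itself is not counted.
  while (current < endDate) {
    const hour = current.getUTCHours();
    const day = current.getUTCDay(); // 0 = Sunday, 6 = Saturday
    const isWeekend = day === 0 || day === 6;

    // `hour < endHour` stops counting at 10:00 instead of 10:59
    if (hour >= startHour && hour < endHour && (includeWeekends || !isWeekend)) {
      minutesWorked++;
    }

    current.setTime(current.getTime() + 60 * 1000);
  }

  return minutesWorked;
}

// Usage with the weekend-spanning row of the test data
console.log(workingMinutesFixed(
  new Date("2021-03-05T09:55:00.000Z"),
  new Date("2021-03-08T01:05:00.000Z"))); // 10
```

With the inclusive comparisons of the original, the same row yields 1271: 65 minutes on Friday (09:55 to 10:59), 600 on each weekend day, and 6 on Monday (01:00 to 01:05 inclusive).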

[Comments]:

You are right @nickW; as Nat Taylor mentioned above, the JavaScript does not calculate the way I expected.
