Flink - allowedLateness

Posted fxjwind

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Flink - allowedLateness相关的知识,希望对你有一定的参考价值。

WindowOperator

processElement

final Collection<W> elementWindows = windowAssigner.assignWindows(   //找出该element被assign的所有windows
    element.getValue(), element.getTimestamp(), windowAssignerContext);

//if element is handled by none of assigned elementWindows
boolean isSkippedElement = true;  //element默认是会skiped


for (W window: elementWindows) {

    // drop if the window is already late
    if (isWindowLate(window)) { //如果window是late,逻辑是window.maxTimestamp() + allowedLateness <= internalTimerService.currentWatermark(),continue表示skip
        continue;
    }
    isSkippedElement = false; //只要有一个窗口非late,该element就是非late数据

    windowState.setCurrentNamespace(window); 
    windowState.add(element.getValue()); //把数据加到windowState中

    triggerContext.key = key;
    triggerContext.window = window;

    //EventTimeTrigger,(window.maxTimestamp() <= ctx.getCurrentWatermark(),会立即fire
    //否则只是ctx.registerEventTimeTimer(window.maxTimestamp()),注册等待后续watermark来触发
    TriggerResult triggerResult = triggerContext.onElement(element); 

    if (triggerResult.isFire()) { //如果Fire
        ACC contents = windowState.get();
        if (contents == null) {
            continue;
        }
        emitWindowContents(window, contents); //emit window内容, 这里会调用自己定义的user function
    }

    //对于比较常用的TumblingEventTimeWindows,用EventTimeTrigger,所以是不会触发purge的
    if (triggerResult.isPurge()) { //如果purge
        windowState.clear();  //将window的state清除掉
    }
    registerCleanupTimer(window); //window的数据也需要清除
}


// side output input event if
// element not handled by any window
// late arriving tag has been set
// windowAssigner is event time and current timestamp + allowed lateness no less than element timestamp
//如果所有的assign window都是late,再判断一下element也是late
if (isSkippedElement && isElementLate(element)) { //isElementLate, (element.getTimestamp() + allowedLateness <= internalTimerService.currentWatermark())
    if (lateDataOutputTag != null){
        sideOutput(element); //如果定义了sideOutput,就输出late element
    } else {
        this.numLateRecordsDropped.inc(); //否则直接丢弃
    }
}

这里currentWatermark的默认值,
private long currentWatermark = Long.MIN_VALUE;

如果定期发送watermark,那么在第一次收到watermark前,不会有late数据

继续看看,数据清除掉逻辑
protected void registerCleanupTimer(W window) {
    long cleanupTime = cleanupTime(window); //cleanupTime, window.maxTimestamp() + allowedLateness

    if (windowAssigner.isEventTime()) {
        triggerContext.registerEventTimeTimer(cleanupTime); //这里只是简单的注册registerEventTimeTimer
    } else {
        triggerContext.registerProcessingTimeTimer(cleanupTime);
    }
}

如果clear只是简单的注册EventTimeTimer,那么在onEventTime的时候一定有clear的逻辑、

 

WindowOperator.onEventTime

if (windowAssigner.isEventTime() && isCleanupTime(triggerContext.window, timer.getTimestamp())) {  //time == cleanupTime(window);
    clearAllState(triggerContext.window, windowState, mergingWindows);
}

果然,onEventTime的时候会判断,如果Timer的time等于 window的cleanup time,就把all state清除掉

所以当超过,window.maxTimestamp() + allowedLateness就会被清理掉

以上是关于Flink - allowedLateness的主要内容,如果未能解决你的问题,请参考以下文章

Flink 窗口延迟数据处理 AllowedLateness

Flink之Watermark滑动窗口案例

Flink / Scala - TimeWindow 处理迟到数据详解

flink 控制窗口行为(触发器移除器允许延迟将迟到的数据放入侧输出流)

01-flink-1.10.1开发flink代码需要的maven

06-flink-1.10.1-flink source api