Flink - allowedLateness
Posted fxjwind
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Flink - allowedLateness相关的知识,希望对你有一定的参考价值。
WindowOperator
processElement
final Collection<W> elementWindows = windowAssigner.assignWindows( //找出该element被assign的所有windows element.getValue(), element.getTimestamp(), windowAssignerContext); //if element is handled by none of assigned elementWindows boolean isSkippedElement = true; //element默认是会skiped for (W window: elementWindows) { // drop if the window is already late if (isWindowLate(window)) { //如果window是late,逻辑是window.maxTimestamp() + allowedLateness <= internalTimerService.currentWatermark(),continue表示skip continue; } isSkippedElement = false; //只要有一个窗口非late,该element就是非late数据 windowState.setCurrentNamespace(window); windowState.add(element.getValue()); //把数据加到windowState中 triggerContext.key = key; triggerContext.window = window; //EventTimeTrigger,(window.maxTimestamp() <= ctx.getCurrentWatermark(),会立即fire //否则只是ctx.registerEventTimeTimer(window.maxTimestamp()),注册等待后续watermark来触发 TriggerResult triggerResult = triggerContext.onElement(element); if (triggerResult.isFire()) { //如果Fire ACC contents = windowState.get(); if (contents == null) { continue; } emitWindowContents(window, contents); //emit window内容, 这里会调用自己定义的user function } //对于比较常用的TumblingEventTimeWindows,用EventTimeTrigger,所以是不会触发purge的 if (triggerResult.isPurge()) { //如果purge windowState.clear(); //将window的state清除掉 } registerCleanupTimer(window); //window的数据也需要清除 } // side output input event if // element not handled by any window // late arriving tag has been set // windowAssigner is event time and current timestamp + allowed lateness no less than element timestamp //如果所有的assign window都是late,再判断一下element也是late if (isSkippedElement && isElementLate(element)) { //isElementLate, (element.getTimestamp() + allowedLateness <= internalTimerService.currentWatermark()) if (lateDataOutputTag != null){ sideOutput(element); //如果定义了sideOutput,就输出late element } else { this.numLateRecordsDropped.inc(); //否则直接丢弃 } }
这里currentWatermark的默认值,
private long currentWatermark = Long.MIN_VALUE;
如果定期发送watermark,那么在第一次收到watermark前,不会有late数据
继续看看,数据清除掉逻辑
protected void registerCleanupTimer(W window) { long cleanupTime = cleanupTime(window); //cleanupTime, window.maxTimestamp() + allowedLateness if (windowAssigner.isEventTime()) { triggerContext.registerEventTimeTimer(cleanupTime); //这里只是简单的注册registerEventTimeTimer } else { triggerContext.registerProcessingTimeTimer(cleanupTime); } }
如果clear只是简单的注册EventTimeTimer,那么在onEventTime的时候一定有clear的逻辑、
WindowOperator.onEventTime
if (windowAssigner.isEventTime() && isCleanupTime(triggerContext.window, timer.getTimestamp())) { //time == cleanupTime(window); clearAllState(triggerContext.window, windowState, mergingWindows); }
果然,onEventTime的时候会判断,如果Timer的time等于 window的cleanup time,就把all state清除掉
所以当超过,window.maxTimestamp() + allowedLateness就会被清理掉
以上是关于Flink - allowedLateness的主要内容,如果未能解决你的问题,请参考以下文章
Flink 窗口延迟数据处理 AllowedLateness
Flink / Scala - TimeWindow 处理迟到数据详解
flink 控制窗口行为(触发器移除器允许延迟将迟到的数据放入侧输出流)