How Storm Trident marks a batch as processed: the coordinator spout
Posted brainstorm
Splitting a stream has no effect on the batch. If you join the stream back together, then yes, it will be the same batch.
Tuples are passed between partitions in the order they're emitted (repartitioning happens on groupBy, partitioning operations, and global aggregations). These are the same semantics you get from Storm.
State updates are ordered among batches.
Each batch has both a "txid" and an "attempt id". The attempt id is a random long. This ensures that Storm can distinguish between multiple attempts for the same batch.
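To make the txid / attempt id distinction concrete, here is a minimal Java sketch. It is not Storm's own code: `BatchAttempt` and `AttemptTracker` are hypothetical names invented for illustration, and the idea that a component keeps only the latest attempt per txid is an assumption about how such ids could be used, not a description of Trident's internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration only; this is not a class from Storm's API.
final class BatchAttempt {
    final long txid;       // identifies the batch; stays the same across replays
    final long attemptId;  // random long; different for every attempt

    BatchAttempt(long txid, long attemptId) {
        this.txid = txid;
        this.attemptId = attemptId;
    }

    // Start a new attempt for the given batch with a random attempt id.
    static BatchAttempt newAttempt(long txid) {
        return new BatchAttempt(txid, ThreadLocalRandom.current().nextLong());
    }

    // True when both attempts refer to the same batch, replayed or not.
    boolean sameBatch(BatchAttempt other) {
        return txid == other.txid;
    }

    // True only for the very same attempt of the very same batch.
    boolean sameAttempt(BatchAttempt other) {
        return txid == other.txid && attemptId == other.attemptId;
    }
}

// Hypothetical tracker (an assumption, not Trident's implementation):
// remembers the latest attempt per txid and rejects anything older.
final class AttemptTracker {
    private final Map<Long, Long> currentAttempt = new HashMap<>();

    // Record that a new attempt of this batch has started.
    void begin(BatchAttempt attempt) {
        currentAttempt.put(attempt.txid, attempt.attemptId);
    }

    // Accept data only if it belongs to the latest known attempt.
    boolean accepts(BatchAttempt attempt) {
        Long current = currentAttempt.get(attempt.txid);
        return current != null && current.longValue() == attempt.attemptId;
    }
}
```

With this shape, a replayed batch keeps its txid but gets a new attempt id, so late tuples from the failed attempt can be told apart from tuples of the retry.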
Batches are controlled by a single coordinator thread (which is a regular Storm spout) that determines when batches get processed and when they get committed (commits are when state updates happen).
The coordinator also ensures the ordering.
The coordinator abstraction is actually quite elegant. It builds upon the primitives that the tuple tree/acking framework provides to implement a relatively sophisticated distributed coordination algorithm.
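The ordering guarantee can be pictured with a small single-threaded Java sketch. It is only an illustration of "batches may finish processing in any order, but commits happen in txid order"; `CommitSequencer` and `markProcessed` are hypothetical names, and the real coordinator does this with Storm's tuple tree/acking primitives rather than an in-memory set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not Trident's coordinator implementation.
final class CommitSequencer {
    private long nextToCommit;                              // lowest txid not yet committed
    private final TreeSet<Long> processed = new TreeSet<>(); // processed, awaiting commit

    CommitSequencer(long firstTxid) {
        this.nextToCommit = firstTxid;
    }

    // Report that a batch finished processing. Returns the txids that can
    // be committed now, always in strictly increasing txid order.
    List<Long> markProcessed(long txid) {
        if (txid >= nextToCommit) {
            processed.add(txid);
        }
        List<Long> committed = new ArrayList<>();
        // Commit only while the next expected txid is ready; a gap blocks
        // every later batch until the missing one has been processed.
        while (processed.remove(Long.valueOf(nextToCommit))) {
            committed.add(nextToCommit);
            nextToCommit++;
        }
        return committed;
    }
}
```

For example, if batches 2 and 3 finish before batch 1, nothing is committed until batch 1 is reported, at which point 1, 2, and 3 are committed in that order.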
Also, is there any way to turn off acking in Trident? Not tagging tuples with message IDs and setting ackers to 0 don't seem to work (the latter causes a stack overflow).
No, you can't. Acking isn't really expensive in Trident as long as your batches are of non-trivial size.
https://groups.google.com/forum/#!topic/storm-user/AUajG72kxmo