Kafka quota control

Posted by sanmutongzi



When reposting, please cite the source: http://www.cnblogs.com/dongxiao-yang/p/5217754.html

Starting in 0.9, the Kafka cluster has the ability to enforce quotas on produce and fetch requests. Quotas are basically byte-rate thresholds defined per client-id. A client-id logically identifies an application making a request. Hence a single client-id can span multiple producer and consumer instances, and the quota applies to all of them as a single entity. For example, if client-id="test-client" has a produce quota of 10MB/sec, this is shared across all instances with that same id.

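Since version 0.9, a Kafka cluster can apply quota control to produce and fetch requests. A quota is essentially a byte-rate threshold for a single client-id, and a client-id logically represents one application issuing requests. One client-id may cover several producer or consumer instances, and the quota treats them all as one entity: if a program with client-id "test-client" has a quota of 10MB/sec, every instance with that id shares it.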

 

It is possible for producers and consumers to produce/consume very high volumes of data and thus monopolize broker resources, cause network saturation and generally DoS other clients and the brokers themselves. Having quotas protects against these issues and is all the more important in large multi-tenant clusters, where a small set of badly behaved clients can degrade the user experience for the well-behaved ones. In fact, when running Kafka as a service this even makes it possible to enforce API limits according to an agreed-upon contract.

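Producers and consumers can generate very high throughput, hog broker resources, saturate the network, and effectively DoS other clients and the brokers themselves. Quotas guard against this, and in a multi-tenant cluster they keep the abnormal traffic of a few clients from hurting the well-behaved ones (one bad apple spoils the barrel). When Kafka is run as a service, quotas also make it feasible to enforce agreed API limits.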

 

By default, each unique client-id receives a fixed quota in bytes/sec as configured by the cluster (quota.producer.default, quota.consumer.default). This quota is defined on a per-broker basis. Each client can publish/fetch a maximum of X bytes/sec per broker before it gets throttled. We decided that defining these quotas per broker is much better than having a fixed cluster-wide bandwidth per client, because the latter would require a mechanism to share client quota usage among all the brokers. This can be harder to get right than the quota implementation itself!

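By default, each unique client-id gets a fixed quota rate configured by the cluster (quota.producer.default, quota.consumer.default). The quota is defined per broker: each client can write or read at most X bytes/sec against a single broker before it is throttled. Defining quotas per broker was judged a better fit than a fixed cluster-wide bandwidth, because it avoids a mechanism for coordinating quota usage among the brokers, and that mechanism could be harder to build than the quota feature itself!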
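
Because the quota is enforced per broker, the total rate a client can reach depends on how many brokers it talks to. A minimal sketch of that arithmetic follows; the function name and the figures (a 10 MB/s quota, traffic spread over 3 brokers) are made up for illustration and are not part of Kafka.

```python
def max_aggregate_rate(per_broker_quota: int, brokers_contacted: int) -> int:
    """Upper bound on a client's total bytes/sec before any broker throttles it.

    Assumes the client's traffic is spread evenly across the brokers.
    """
    return per_broker_quota * brokers_contacted


quota = 10 * 1024 * 1024  # hypothetical per-broker quota: 10 MB/s
print(max_aggregate_rate(quota, 3))  # 31457280
```

If the traffic is skewed toward one broker, that broker starts throttling well before the aggregate bound is reached.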

 

How does a broker react when it detects a quota violation? In our solution, the broker does not return an error; rather, it attempts to slow down a client exceeding its quota. It computes the amount of delay needed to bring a guilty client under its quota and delays the response for that time. This approach keeps the quota violation transparent to clients (outside of client-side metrics). This also keeps them from having to implement any special backoff and retry behavior, which can get tricky. In fact, bad client behavior (retry without backoff) can exacerbate the very problem quotas are trying to solve.

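What does a broker do when a quota is exceeded? In the current approach the broker returns no error; it tries to slow the client down instead. It computes the delay needed to bring the client back under its quota and holds the response for that long. This keeps quota enforcement mostly transparent to clients (apart from client-side metrics) and spares them from implementing tricky backoff and retry logic. In fact, bad client behavior (retrying without backoff) can make the problem quotas are meant to solve even worse.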
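
One way to compute such a delay is sketched below. It assumes the broker knows the rate observed over a measurement window and picks a delay that brings the average over (window + delay) back down to the quota. The function name and parameters are hypothetical, and the text above does not confirm that the broker uses exactly this formula.

```python
def throttle_delay_ms(observed_rate: float, quota: float, window_ms: float) -> int:
    """Delay that brings the average rate over (window + delay) back to the quota.

    The bytes sent in the window are observed_rate * window_ms. Solving
    observed_rate * window_ms / (window_ms + delay) == quota for delay gives
    delay = window_ms * (observed_rate - quota) / quota.
    """
    if observed_rate <= quota:
        return 0  # within quota: respond immediately
    return round(window_ms * (observed_rate - quota) / quota)


# A client observed at 15 MB/s against a 10 MB/s quota over a 30-second window.
print(throttle_delay_ms(15e6, 10e6, 30_000))  # 15000
```

The client only sees a slower response, which is why no special error handling is needed on its side.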

 

Client byte rate is measured over multiple small windows (e.g. 30 windows of 1 second each) in order to detect and correct quota violations quickly. Typically, having large measurement windows (e.g. 10 windows of 30 seconds each) leads to large bursts of traffic followed by long delays, which is not great in terms of user experience.

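The client byte rate is measured over multiple small windows (for example, 30 windows of 1 second each) so that quota violations are detected and corrected quickly. Large measurement windows (for example, 10 windows of 30 seconds each) tend to produce large bursts of traffic followed by long delays, which hurts the user experience.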
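
The windowed measurement can be illustrated with a small sketch. The WindowedRate class below is hypothetical, not Kafka's implementation: it keeps one byte counter per window, drops windows that have aged out, and, as a simplifying assumption, always divides by the full span of all windows.

```python
from collections import deque


class WindowedRate:
    """Byte rate over the last `num_windows` windows of `window_sec` seconds each."""

    def __init__(self, num_windows: int, window_sec: float):
        self.num_windows = num_windows
        self.window_sec = window_sec
        self.samples = deque()  # (window_index, byte_count) pairs, oldest first

    def record(self, nbytes: int, now: float) -> None:
        """Add nbytes to the window that contains timestamp `now` (seconds)."""
        idx = int(now // self.window_sec)
        if self.samples and self.samples[-1][0] == idx:
            self.samples[-1] = (idx, self.samples[-1][1] + nbytes)
        else:
            self.samples.append((idx, nbytes))
        self._expire(idx)

    def _expire(self, current_idx: int) -> None:
        # Drop windows that fall outside the last `num_windows` windows.
        while self.samples and self.samples[0][0] <= current_idx - self.num_windows:
            self.samples.popleft()

    def rate(self, now: float) -> float:
        """Bytes/sec averaged over the full span of all windows."""
        self._expire(int(now // self.window_sec))
        total = sum(count for _, count in self.samples)
        return total / (self.num_windows * self.window_sec)


meter = WindowedRate(num_windows=30, window_sec=1.0)
for second in range(30):
    meter.record(1_000_000, now=float(second))  # 1 MB in each 1-second window
print(meter.rate(now=29.0))  # 1000000.0
```

With 10 windows of 30 seconds each, the same class would let a burst go unthrottled for much longer and then impose a correspondingly long delay, which is the behavior the text warns about.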

Quota overrides

It is possible to override the default quota for client-ids that need a higher (or even lower) quota. The mechanism is similar to the per-topic log config overrides. Client-id overrides are written to ZooKeeper under /config/clients. These overrides are read by all brokers and are effective immediately. This lets us change quotas without having to do a rolling restart of the entire cluster. See here for details.

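The default quota can be overridden for individual clients, in the same way as per-topic config overrides. Client overrides live in ZooKeeper under /config/clients; all brokers read them and they take effect immediately. This lets us change quotas without a rolling restart of the whole cluster. See here for details.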
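
The lookup order implied above (a per-client override if one exists, otherwise the cluster default) can be sketched as follows. The dictionary stands in for the overrides stored under /config/clients. The names, client-ids and values are hypothetical, and no ZooKeeper access is shown.

```python
DEFAULT_PRODUCER_QUOTA = 10 * 1024 * 1024  # bytes/sec; hypothetical cluster default

# Hypothetical per-client overrides, keyed by client-id.
overrides = {"test-client": 50 * 1024 * 1024}


def effective_quota(client_id: str) -> int:
    """Return the override for client_id if present, else the cluster default."""
    return overrides.get(client_id, DEFAULT_PRODUCER_QUOTA)


print(effective_quota("test-client"))   # 52428800
print(effective_quota("other-client"))  # 10485760

# A new override is picked up on the next lookup, without any restart.
overrides["batch-loader"] = 1 * 1024 * 1024
print(effective_quota("batch-loader"))  # 1048576
```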

 
