Kafka-Connect:启动 S3 Sink 连接器时出现无法识别的错误

Posted

技术标签:

【中文标题】Kafka-Connect:启动 S3 Sink 连接器时出现无法识别的错误【英文标题】:Kafka-Connect : Getting an unrecognised error when starting a S3 Sink Connector 【发布时间】:2021-02-03 16:12:06 【问题描述】:

我正在尝试为具有 3 个节点的 Kafka Connect 集群设置我的第三个工作人员。工作人员在这第三个节点上正常运行,我能够进行 REST 调用以获取现有连接器(现在我有 2 个,每个节点上一个)。但是,当我尝试使用以下命令进行 POST 调用以创建第三个连接器时:

curl -X POST -H "Content-Type: application/json" --data @test-s3-sink-config.json http://<my-host>:<my-port>/connectors

我得到这个 TimeoutException 响应:

"error_code":500,"message":"IO Error trying to forward REST request: java.net.SocketTimeoutException: Connect Timeout"

当我查看工作堆栈跟踪时,它会显示以下内容:

[2020-10-20 18:27:04,062] INFO AbstractConfig values:
 (org.apache.kafka.common.config.AbstractConfig:354)
[2020-10-20 18:27:19,081] ERROR IO error forwarding REST request:  (org.apache.kafka.connect.runtime.rest.RestClient:143)
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Connect Timeout
        at org.eclipse.jetty.client.util.FutureResponseListener.getResult(FutureResponseListener.java:118)
        at org.eclipse.jetty.client.util.FutureResponseListener.get(FutureResponseListener.java:101)
        at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:711)
        at org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:125)
        at org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:65)
        at org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.completeOrForwardRequest(ConnectorsResource.java:369)
        at org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource.createConnector(ConnectorsResource.java:164)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.jav
a:52)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
        at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherPr
ovider.java:176)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
        at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:173)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:500)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
        at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Connect Timeout
        at org.eclipse.jetty.io.ManagedSelector$Connect.run(ManagedSelector.java:812)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more

跟踪的第一个日志是困扰我的,因为我没有看到任何有关我做错了什么的相关信息,第二个日志只是 TimeoutException。我到处寻找有类似问题的人,或者了解“AbstractConfig”类但找不到任何有用的东西,来自 Kafka 的this is the AbstractConfig class(我使用的是 Kafka 版本 2.0.0)。

最后,这是我正在使用的配置文件:

"name":"s3-connector-orderbooks",
"config":
"connector.class":"io.confluent.connect.s3.S3SinkConnector",
"file":"snapshots-test",
"format.class":"io.confluent.connect.s3.format.json.JsonFormat",
"flush.size":"1000000",
"tasks.max":"1",
"topics":"binance-full-snaps-test",
"timezone":"UTC",
"storage.class":"io.confluent.connect.s3.storage.S3Storage",
"rotate.schedule.interval.ms":"3600000",
"s3.bucket.name":"pfc-data",
"timestamp.extractor":"Record",
"partitioner.class":"io.confluent.connect.storage.partitioner.HourlyPartitioner",
"locale":"en-US",
"s3.compression.type":"gzip"


如果您觉得我应该包含任何其他信息,请随时提出要求,我对堆栈溢出比较陌生。

我很想知道是否有人遇到过这样的事情,或者是否有人知道是什么导致了这里的问题。谢谢!

【问题讨论】:

【参考方案1】:

在 Kafka 连接集群中,领导节点是负责服务 REST 请求的节点。因此,值得检查领导节点是否可以到达集群中所有可用的工作节点。您可以在distributed.properties 中检查您的rest.advertised.host.name,以确保在连接集群中可以访问节点的通告主机名。 Robin Moffatt 在他的blog 中有一篇关于这个主题的好文章。请阅读它以获得良好的洞察力。

【讨论】:

以上是关于Kafka-Connect:启动 S3 Sink 连接器时出现无法识别的错误的主要内容,如果未能解决你的问题,请参考以下文章

Kafka-Connect Cassandra Sink 连接器不将数据推送到 Cassandra

kafka-connect JDBC PostgreSQL Sink Connector 显式定义 PostgrSQL 模式(命名空间)

kafka-connect sink 连接器 pk.mode 用于具有自动增量的表

Kafka连接s3 sink多个分区

kafka s3 sink连接器在获取NULL数据时崩溃

如何在 Kafka Sink 中为不同环境定义 s3bucket 的名称