Kafka Connect Usage Guide

Posted 2021-01-25 boanxin

Preface: This article was compiled by the editors of cha138.com and introduces how to use Kafka Connect.

I. Overview

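Kafka Connect is a scalable, reliable tool for streaming data between Kafka and other systems. In short, it uses connectors to move large collections of data into and out of Kafka quickly and simply. It can ingest an entire database, or collect messages from all of your applications, into Kafka topics. Kafka Connect features include:

1. A common framework for Kafka connectors: Kafka Connect standardizes how other data systems are integrated with Kafka, which simplifies connector development, deployment, and management.

2. Distributed and standalone modes: it can scale up to a large, centrally managed service supporting an entire organization, or scale down to development, testing, and small production deployments.

3. REST interface: connectors are submitted to (and managed on) a Kafka Connect cluster through a REST API.

4. Automatic offset management: with only a little information from the connector, Kafka Connect manages the offset commit process itself.

5. Distributed and scalable by default: Kafka Connect builds on the existing group management protocol, so more workers can be added to scale up a Connect cluster.

6. Streaming/batch integration: by leveraging Kafka's existing capabilities, Kafka Connect is an ideal solution for bridging streaming and batch data systems.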

The Kafka version used to test Connect here is 0.9.0.0.

II. Standalone Mode

Standalone mode is started with a command of the following form:

bin/connect-standalone.sh config/connect-standalone.properties Connector1.properties [Connector2.properties ...]

My configuration for each of the files above is as follows.

1. connect-standalone.sh is the launcher script for standalone mode:

#!/bin/sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

base_dir=$(dirname $0)

if [ "x$KAFKA_LOG4J_OPTS" = "x" ]; then
    export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:$base_dir/../config/connect-log4j.properties"
fi
if [ -z "$KAFKA_JMX_OPTS" ]; then
    export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9901"
fi
if [ -z "$KAFKA_HEAP_OPTS" ]; then
    export KAFKA_HEAP_OPTS="-Xmx1024M"
fi
exec $(dirname $0)/kafka-run-class.sh org.apache.kafka.connect.cli.ConnectStandalone "$@"

In this script you can set the JVM heap size for Connect:

if [ -z "$KAFKA_HEAP_OPTS" ]; then
    export KAFKA_HEAP_OPTS="-Xmx1024M"
fi

You can also configure JMX:

if [ -z "$KAFKA_JMX_OPTS" ]; then
    export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9901"
fi
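The heap snippet only assigns KAFKA_HEAP_OPTS when the variable is still empty, so a value exported before the script runs takes precedence. The function below is a minimal sketch (the name connect_jvm_defaults is hypothetical, not part of Kafka) that mirrors this defaulting pattern, assuming each variable guards its own default, so the precedence can be checked without starting Kafka.

```shell
# Hypothetical helper mirroring the defaulting pattern of connect-standalone.sh:
# each variable only receives a default when it is empty, so values set by the
# caller always win.
connect_jvm_defaults() {
    if [ -z "$KAFKA_HEAP_OPTS" ]; then
        KAFKA_HEAP_OPTS="-Xmx1024M"
    fi
    if [ -z "$KAFKA_JMX_OPTS" ]; then
        KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9901"
    fi
    # Print the effective values: heap options on line 1, JMX options on line 2.
    printf '%s\n%s\n' "$KAFKA_HEAP_OPTS" "$KAFKA_JMX_OPTS"
}

# Usage: override the 1 GB default heap without editing the script:
#   KAFKA_HEAP_OPTS="-Xmx2G" bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
```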

2. Configuration of connect-standalone.properties:

# These are defaults. This file just demonstrates how to override some settings.
bootstrap.servers=10.253.129.237:9092,10.253.129.238:9092,10.253.129.239:9092

# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true

# The internal converter used for offsets and config data is configurable and must be specified, but most users will
# always want to use the built-in default. Offset and config data is never visible outside of Kafka Connect in this format.
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

offset.storage.file.filename=/datafs/20181106/connect.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

Pay particular attention to the broker addresses (bootstrap.servers) here; all other settings are documented on the Kafka website:

http://kafka.apache.org/090/documentation.html#connectconfigs

3. Configuration of connect-file-source.properties:

name=test_source2
connector.class=FileStreamSource
tasks.max=2
file=/datafs/20181106/json2/log.out
topic=TEST_MANAGER5

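Pay attention to the file path and topic settings. Note also that FileStreamSource reads a single file, so it starts only one task even though tasks.max=2 here.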

4. Configuration of connect-file-sink.properties:

name=test_sink1
connector.class=FileStreamSink
#connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
file=/datafs/20181106/a.out
topics=TEST_MANAGER5
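Before launching connect-standalone.sh with both connector files, it can help to confirm that each file defines the keys used above. The following is a hypothetical pre-flight check (check_connector_props is not part of Kafka); it only looks for name, connector.class, tasks.max, and topic or topics, and does not validate their values. The config/ paths in the usage line are an assumption.

```shell
# Hypothetical pre-flight check for connector properties files.
# Prints one message per problem and returns non-zero if any file is incomplete.
check_connector_props() {
    rc=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "$f: file not readable"
            rc=1
            continue
        fi
        for key in name connector.class tasks.max; do
            # Match "key=" at the start of a line, so commented-out lines
            # such as "#connector.class=..." are not counted.
            if ! grep -q "^${key}=" "$f"; then
                echo "$f: missing $key"
                rc=1
            fi
        done
        # Source connectors use "topic", sink connectors use "topics".
        if ! grep -q -e '^topic=' -e '^topics=' "$f"; then
            echo "$f: missing topic/topics"
            rc=1
        fi
    done
    return $rc
}

# Usage (assuming both files live in config/):
#   check_connector_props config/connect-file-source.properties config/connect-file-sink.properties &&
#     bin/connect-standalone.sh config/connect-standalone.properties \
#       config/connect-file-source.properties config/connect-file-sink.properties
```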

III. Cluster (Distributed) Mode

Command format:

bin/connect-distributed.sh config/connect-distributed.properties

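Distributed mode differs from standalone mode in the class that is started and in the configuration parameters, which determine how Kafka Connect stores configurations, assigns work, and stores offsets and task status. In distributed mode, Kafka Connect stores offsets, configurations, and task status in Kafka topics. It is recommended to create these topics manually so that you can choose the number of partitions and replicas yourself. If the topics do not exist when Kafka Connect starts, they are created automatically with the default partition count and replication factor, which may not be appropriate (Kafka cannot know your business requirements and can only fall back on defaults). The following parameters are especially important and should be set before starting the cluster:

group.id (default connect-cluster): a unique name for the Connect cluster group; note that it must not conflict with any consumer group ID.

config.storage.topic (default connect-configs): the topic used to store connector and task configurations; it should be a single-partition, highly replicated topic. Create this topic manually to guarantee that it has exactly one partition (an auto-created topic may end up with several).

offset.storage.topic (default connect-offsets): the topic used to store offsets; it should have multiple partitions and replicas.

status.storage.topic (default connect-status): the topic used to store status; it can have multiple partitions and replicas.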
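The topics above can be created by hand before the first worker starts. The sketch below only prints kafka-topics.sh commands for review. It assumes a 0.9/0.10-era kafka-topics.sh that takes --zookeeper (newer releases use --bootstrap-server instead), the default topic names, and illustrative partition counts and replication factor. cleanup.policy=compact is set because these topics hold the latest value per key, which should not be lost to time-based retention.

```shell
# Hypothetical helper: print (not run) the commands that create the three
# Connect internal topics. Partition counts and the default replication
# factor are assumed values; adjust them to your cluster.
connect_topic_cmds() {
    zk="$1"          # ZooKeeper connect string, e.g. localhost:2181 (assumed)
    rf="${2:-3}"     # replication factor (assumed default of 3)
    kt="bin/kafka-topics.sh --create --zookeeper $zk --replication-factor $rf"
    # Config topic: exactly one partition, compacted.
    echo "$kt --topic connect-configs --partitions 1 --config cleanup.policy=compact"
    # Offset and status topics may have several partitions.
    echo "$kt --topic connect-offsets --partitions 25 --config cleanup.policy=compact"
    echo "$kt --topic connect-status --partitions 5 --config cleanup.policy=compact"
}

# Usage: review first, then execute:
#   connect_topic_cmds localhost:2181 3
#   connect_topic_cmds localhost:2181 3 | sh
```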

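Note that in distributed mode, connector configurations cannot be passed on the command line. Use the REST API described below to create, modify, and delete connectors. For example, sending the following JSON body to POST /connectors creates a FileStreamSource connector named test: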

{"name":"test","config":{"topic":"TEST_MANAGER","connector.class":"FileStreamSource","tasks.max":"2","file":"/datafs/log1.out"}}
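The JSON above is the request body for POST /connectors. A minimal sketch for building and sending it with curl follows; the function names, the CONNECT_URL variable, and the localhost:8083 default are assumptions for illustration. The body is built with printf, which does no JSON escaping, so the arguments must not contain quotes or backslashes.

```shell
# Hypothetical helper: build a POST /connectors body for a FileStreamSource
# connector, in the same shape as the JSON example.
# $1 = connector name, $2 = file path, $3 = topic, $4 = tasks.max (default 1)
file_source_json() {
    printf '{"name":"%s","config":{"connector.class":"FileStreamSource","tasks.max":"%s","file":"%s","topic":"%s"}}' \
        "$1" "${4:-1}" "$2" "$3"
}

# Send the body to a Connect worker. CONNECT_URL is assumed to point at any
# worker in the cluster (REST port 8083 by default).
create_file_source() {
    file_source_json "$@" | curl -s -X POST -H "Content-Type: application/json" \
        --data @- "${CONNECT_URL:-http://localhost:8083}/connectors"
}

# Usage: create_file_source test /datafs/log1.out TEST_MANAGER 2
```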

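IV. REST API

Because Kafka Connect is intended to run as a service, it provides a REST API for managing connectors. By default the service listens on port 8083. The following endpoints are currently supported: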

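GET /connectors: return a list of active connectors

POST /connectors: create a new connector; the request body is a JSON object containing a string name field and an object config field (the connector's configuration parameters).

GET /connectors/{name}: get information about a specific connector

GET /connectors/{name}/config: get the configuration parameters of a specific connector

PUT /connectors/{name}/config: update the configuration parameters of a specific connector

GET /connectors/{name}/status: get the current status of the connector, including whether it is running, failed, paused, etc.

GET /connectors/{name}/tasks: get the list of tasks currently running for the connector

GET /connectors/{name}/tasks/{taskid}/status: get the current status of a task, including whether it is running, failed, paused, etc.

PUT /connectors/{name}/pause: pause the connector and its tasks, which stops message processing until the connector is resumed

PUT /connectors/{name}/resume: resume a paused connector (does nothing if the connector is not paused)

POST /connectors/{name}/restart: restart a connector (typically because it has failed)

POST /connectors/{name}/tasks/{taskId}/restart: restart an individual task (typically because it has failed)

DELETE /connectors/{name}: delete a connector, stopping all of its tasks and deleting its configuration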
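Most of these endpoints can be exercised with plain curl. The wrapper below is a hypothetical convenience (connect_api and CONNECT_URL are illustrative names, and a worker on localhost:8083 is assumed); the commented calls show the endpoints above for a connector named test.

```shell
# Hypothetical thin wrapper around curl for the Connect REST API.
# $1 = HTTP method, $2 = path, $3 = optional JSON body
connect_api() {
    method="$1"; api_path="$2"; body="$3"
    url="${CONNECT_URL:-http://localhost:8083}$api_path"
    if [ -n "$body" ]; then
        curl -s -X "$method" -H "Content-Type: application/json" --data "$body" "$url"
    else
        curl -s -X "$method" "$url"
    fi
}

# Examples for a connector named "test":
#   connect_api GET    /connectors
#   connect_api GET    /connectors/test/status
#   connect_api PUT    /connectors/test/pause
#   connect_api PUT    /connectors/test/resume
#   connect_api POST   /connectors/test/tasks/0/restart
#   connect_api PUT    /connectors/test/config '{"connector.class":"FileStreamSource","tasks.max":"1","file":"/datafs/log1.out","topic":"TEST_MANAGER"}'
#   connect_api DELETE /connectors/test
```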

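Kafka Connect also provides a REST API for getting information about connector plugins:

GET /connector-plugins: return a list of the connector plugins installed in the Kafka Connect cluster. Note that the API only checks for connectors on the worker that handles the request, which means you may see inconsistent results, especially during a rolling upgrade while new connector jars are being added.

PUT /connector-plugins/{connector-type}/config/validate: validate the provided configuration values against the configuration definition; it validates each setting and returns suggested values and error messages.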
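As a hedged sketch of the validation endpoint: the request body is a flat configuration map (the same shape as the config object in POST /connectors). The helper name and CONNECT_URL are assumptions, and the plugin name accepted in the path may differ between Kafka versions. Including connector.class and keeping it consistent with the path is a safe choice, since some versions reject a mismatch.

```shell
# Hypothetical helper: validate a connector configuration without creating it.
# $1 = plugin name used in the path, $2 = flat JSON config map
validate_connector_config() {
    plugin="$1"; config_json="$2"
    curl -s -X PUT -H "Content-Type: application/json" --data "$config_json" \
        "${CONNECT_URL:-http://localhost:8083}/connector-plugins/$plugin/config/validate"
}

# List the plugins installed on the worker that handles the request:
#   curl -s "${CONNECT_URL:-http://localhost:8083}/connector-plugins"
# Validate file source settings like those used in this article:
#   validate_connector_config FileStreamSourceConnector \
#     '{"connector.class":"FileStreamSource","tasks.max":"1","file":"/datafs/log1.out","topic":"TEST_MANAGER"}'
```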
