Change data capture from PostgreSQL to Kafka topics using Kafka Connect in standalone mode

【Title】: Change-data-capture from Postgres SQL to kafka topics using standalone mode Kafka-connect 【Posted】: 2021-07-03 05:29:07 【Question】:

I have been trying to get data from PostgreSQL into Kafka topics using the command bin/connect-standalone.sh config/connect-standalone.properties config/postgres.properties, but I am running into several problems. Here is the content of my postgres.properties file:

name=cdc_demo
connector.class=io.debezium.connector.postgresql.PostgresConnector
tasks.max=1
plugin.name=decoderbufs
slot.name=debezium
slot.drop_on_stop=false
database.hostname=localhost
database.port=5432
database.user=postgres
database.password=XXXXX
database.dbname=snehildb
time.precision.mode=adaptive
database.sslmode=disable
database.server.name=localhost:5432/snehildb
table.whitelist=public.students
decimal.handling.mode=precise
topic.creation.enable=true

Here is the content of connect-standalone.properties:

# These are defaults. This file just demonstrates how to override some settings.
bootstrap.servers=localhost:9092

# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true

offset.storage.file.filename=/tmp/connect.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include
# any combination of: 
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Note: symlinks will be followed to discover dependencies or plugins.
# Examples:
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
plugin.path=/home/azureuser/plugins

I get several warnings, but I have not been able to resolve the following three main errors:

 ERROR Postgres server wal_level property must be "logical" but is: replica (io.debezium.connector.postgresql.PostgresConnector:101)
 (org.apache.kafka.common.config.AbstractConfig:361)
 ERROR Failed to create job for config/postgres.properties (org.apache.kafka.connect.cli.ConnectStandalone:110)
 ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:121)

I am new to Kafka, so it would be very helpful if someone could point out my mistake.

【Comments on the question】:

【Answer 1】:

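Debezium requires wal_level to be logical, but your server is running with wal_level = replica (this is what the first error reports). Set wal_level = logical in postgresql.conf and restart PostgreSQL; the change only takes effect after a restart.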

https://www.postgresql.org/docs/9.6/runtime-config-wal.html
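As a rough illustration of the change described above, the relevant part of postgresql.conf might look like the fragment below. This is a sketch, assuming a self-managed PostgreSQL instance where you can edit the file and restart the server. The commented-out max_replication_slots and max_wal_senders values are illustrative assumptions and do not come from the question.

```conf
# postgresql.conf (sketch; the file location depends on the installation)

# Required by Debezium: enables logical decoding of the write-ahead log.
# The server in the question reported "replica".
wal_level = logical

# Assumption, not from the question: the connector also needs a free
# replication slot and WAL sender. PostgreSQL 10 and later default both
# to 10; uncomment only if your server has them set to 0.
#max_replication_slots = 4
#max_wal_senders = 4

# After restarting PostgreSQL, confirm from psql with:  SHOW wal_level;
```

On managed database services the setting is usually changed through the provider's configuration interface rather than by editing the file directly.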

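The check that produces this error lives in the Postgres connector class (the log points at PostgresConnector:101). Take a look inside it:

io.debezium.connector.postgresql.PostgresConnector.java in the Debezium repo: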

https://github.com/debezium/debezium/blob/master/debezium-connector-postgres/src/main/java/io/debezium/connector/postgresql/PostgresConnector.java

【Comments】:

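Hi senjin.hajrulahovic, thanks for pointing this out. I changed this property in postgresql.conf and the first two errors are gone now, but I am still getting the connector error, which I will try to resolve. Thank you very much for your help!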
