无法使用 Apache Beam(Python SDK)读取 Pub/Sub 消息
Posted
技术标签:
【中文标题】无法使用 Apache Beam(Python SDK)读取 Pub/Sub 消息【英文标题】:Unable to read Pub/Sub messages with Apache Beam (Python SDK) 【发布时间】:2021-04-26 01:13:36 【问题描述】:我正在尝试使用 Beam 编程框架 (Python SDK) 从 Pub/Sub 主题流式传输消息并将它们写入控制台。
这是我的代码(apache-beam==2.27.0
):
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
TOPIC_PATH = "projects/<project-id>/topics/<topic-id>"
def run(pubsub_topic):
options = PipelineOptions(
streaming=True
)
runner = 'DirectRunner'
print("I reached before pipeline")
with beam.Pipeline(runner, options=options) as pipeline:
(
pipeline
| "Read from Pub/Sub topic" >> beam.io.ReadFromPubSub(topic=pubsub_topic)
| "Writing to console" >> beam.Map(print)
)
print("I reached after pipeline")
result = pipeline.run()
result.wait_until_finish()
run(TOPIC_PATH)
然而,当我执行这个管道时,我得到了这个 TypeError:
ERROR:apache_beam.runners.direct.executor:Exception at bundle <apache_beam.runners.direct.bundle_factory._Bundle object at 0x1349763c0>, due to an exception.
TypeError: create_subscription() takes from 1 to 2 positional arguments but 3 were given
最后说:
ERROR:apache_beam.runners.direct.executor:Giving up after 4 attempts.
我不确定,我做错了什么,提前感谢您的帮助。
【问题讨论】:
pubsub_topic 的值是多少? 主题路径:TOPIC_PATH = "projects/apache-beam==2.27.0
,将此添加到问题@guillaumeblaquiere
【参考方案1】:
我不知道错误的确切位置,但您可以考虑使用以下 Beam 示例之一作为模型并从那里开始吗?
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py
https://github.com/apache/beam/blob/release-2.27.0/sdks/python/apache_beam/examples/snippets/snippets.py#L684
【讨论】:
【参考方案2】:当我通过“pip install apache-beam”安装时遇到了同样的问题。当我切换到“pip install apache-beam [gcp]”时,它对我有用,即使使用 DirectRunner。
【讨论】:
以上是关于无法使用 Apache Beam(Python SDK)读取 Pub/Sub 消息的主要内容,如果未能解决你的问题,请参考以下文章
使用 Python 处理 Apache Beam 管道中的异常
使用Apache-beam在Python中删除字典中的第一项[重复]
Python 上的 Apache Beam 将 beam.Map 调用相乘
Apache Beam - org.apache.beam.sdk.util.UserCodeException:java.sql.SQLException:无法创建 PoolableConnecti