Dataflow: Export to Bigquery from Pubsub RuntimeException
[Posted]: 2018-09-05 08:52:01
[Question]: I am using the "Export to BigQuery" feature in Pub/Sub to pass plain JSON from Pub/Sub to BigQuery through Dataflow.
It worked for a moment, meaning some entries made it into BigQuery correctly. But now I am getting this error in the Dataflow logs:
java.lang.RuntimeException: java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"_cmets","message":"no such field.","reason":"invalid"}],"index":0}]
    org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:131)
    org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:97)
Caused by: java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"_cmets","message":"no such field.","reason":"invalid"}],"index":0}]
    ... many lines ...
    org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:811)
    org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:127)
    org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:97)
    org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
    org.apache.beam.runners.core.SimpleDoFnRunner.finishBundle(SimpleDoFnRunner.java:187)
    com.google.cloud.dataflow.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:407)
    com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:60)
    com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:76)
    com.google.cloud.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1069)
    com.google.cloud.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:133)
    com.google.cloud.dataflow.worker.StreamingDataflowWorker$8.run(StreamingDataflowWorker.java:841)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)
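[Answer 1]: The fields in the Pub/Sub messages do not appear to match the fields in the BigQuery table. The error reports "no such field" at location "_cmets", which suggests the message carries a field that the table schema does not define.
Check that the field names are identical on both sides. You can read more about the Dataflow templates in the Dataflow documentation.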
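The check suggested above can be sketched in a few lines of Python. This is only an illustration, not part of the Dataflow template: the function names, the schema_fields set, and the sample message are all hypothetical, and the comparison is an exact match on top-level keys only (nested records are not inspected). In practice the column names would come from the actual BigQuery table schema.

```python
import json


def find_unknown_fields(message_json, schema_fields):
    """Return the top-level message keys that are not columns in the schema."""
    record = json.loads(message_json)
    return sorted(key for key in record if key not in schema_fields)


def drop_unknown_fields(message_json, schema_fields):
    """Return the message as a dict with only the keys the schema defines."""
    record = json.loads(message_json)
    return {key: value for key, value in record.items() if key in schema_fields}


# Hypothetical table columns and a hypothetical Pub/Sub message payload.
schema_fields = {"id", "name", "created_at"}
message = '{"id": 1, "name": "example", "_cmets": "extra"}'

unknown = find_unknown_fields(message, schema_fields)
if unknown:
    print("Fields missing from the table schema:", unknown)

cleaned = drop_unknown_fields(message, schema_fields)
print("Row restricted to known columns:", cleaned)
```

Running a check like this against a sample of the published messages shows whether the fix belongs on the publisher side (stop sending the extra field) or on the table side (add the missing column).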