Dataflow: Cannot read and write in different locations: source: asia-south1, destination: us-central1

Posted: 2020-11-13 02:29:25

Problem description:

I am getting the error below when running a Dataflow job. My source is a GCP BigQuery dataset (asia-south1) and the destination is a PostgreSQL DB (AWS, Mumbai region).

java.io.IOException: Extract job beam_job_0c64359f7e274ff1ba4072732d7d9653_firstcrybqpgnageshpinjarkar07200750105c51e26c-extract failed, status: {
  "errorResult" : {
    "message" : "Cannot read and write in different locations: source: asia-south1, destination: us-central1",
    "reason" : "invalid"
  },
  "errors" : [ {
    "message" : "Cannot read and write in different locations: source: asia-south1, destination: us-central1",
    "reason" : "invalid"
  } ],
  "state" : "DONE"
}.
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.executeExtract(BigQuerySourceBase.java:185)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:121)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:139)
    at com.google.cloud.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:275)
    at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:197)
    at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:181)
    at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:160)
    at com.google.cloud.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:77)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

My code is as follows:

    p
        .apply(BigQueryIO.read().from("datalake:Yearly2020.Sales"))
        .apply(JdbcIO.<TableRow>write()
            .withDataSourceConfiguration(
                JdbcIO.DataSourceConfiguration.create(
                        "org.postgresql.Driver", "jdbc:postgresql://xx.xx.xx.xx:1111/dbname")
                    .withUsername("username")
                    .withPassword("password"))
            .withStatement("INSERT INTO Table VALUES(ProductRevenue)")
            .withPreparedStatementSetter(new BQPGStatementSetter()));

    p.run().waitUntilFinish();

I am launching the pipeline as follows:

gcloud beta dataflow jobs run sales_data \
--gcs-location gs://datalake-templates/Template \
--region=asia-east1 \
--network=datalake-vpc \
--subnetwork=regions/asia-east1/subnetworks/asia-east1

Answer 1:

When BigQuery is the source, Beam runs an extract (export) job that stages the table data in a GCS bucket. The data is staged at temp_location; if temp_location is not specified, the location given by staging_location is used.
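
For reference, in the Beam Java SDK this temp location comes from the pipeline options, so it can be pinned in code before the pipeline is built. A minimal sketch (the bucket name is a placeholder, and args is assumed to come from the usual main method):

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    // Stage the BigQuery extract output in the same region as the dataset
    // (asia-south1); the bucket name here is hypothetical.
    options.setTempLocation("gs://my-asia-south1-bucket/temp");
    Pipeline p = Pipeline.create(options);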

In your Dataflow job, can you specify temp_location with a bucket created in asia-south1, since that is where your BigQuery dataset is located?
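
For example, assuming a bucket created in asia-south1 (the name below is a placeholder), the location can be passed when launching the templated job via the --staging-location flag, which controls where the job stages temporary files (note that for a classic template the temp location may also have been baked in at template creation time):

gcloud beta dataflow jobs run sales_data \
--gcs-location gs://datalake-templates/Template \
--region=asia-east1 \
--network=datalake-vpc \
--subnetwork=regions/asia-east1/subnetworks/asia-east1 \
--staging-location=gs://my-asia-south1-bucket/temp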

Also, if you are using a network and subnetwork, it is recommended to turn off public IPs so that connectivity happens over the VPN.
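
With gcloud that is the --disable-public-ips flag (the Beam Java equivalent is the usePublicIps=false pipeline option); this assumes private connectivity, such as a VPN tunnel from the VPC to the AWS database, is already in place:

gcloud beta dataflow jobs run sales_data \
--gcs-location gs://datalake-templates/Template \
--region=asia-east1 \
--network=datalake-vpc \
--subnetwork=regions/asia-east1/subnetworks/asia-east1 \
--staging-location=gs://my-asia-south1-bucket/temp \
--disable-public-ips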

Comments:

We tried disabling public IPs with no-use-public-ips, but then the job fails with the error "Cannot create PoolableConnectionFactory".
