Pentaho Data Integration Google BigQuery Loader exception

【Title】Pentaho Data Integration Google BigQuery Loader exception 【Posted】2019-10-25 19:15:52 【Question】:

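I use Pentaho Data Integration to create a job that loads data from Google Cloud Storage into Google BigQuery through the "Google BigQuery Loader" step. The step successfully loads the data into the BigQuery dataset table (confirmed by the BigQuery job log and by inspecting the table data), but it then throws an NPE: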

2019/10/24 10:21:31 - Job 1 - Starting entry [Google BigQuery Loader]
2019/10/24 10:21:31 - Job 1 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : java.lang.NullPointerException
2019/10/24 10:21:31 - Job 1 -   at com.pentaho.di.job.entries.google.bigquery.JobEntryBigQueryLoader.execute(JobEntryBigQueryLoader.java:383)
2019/10/24 10:21:31 - Job 1 -   at org.pentaho.di.job.Job.execute(Job.java:680)
2019/10/24 10:21:31 - Job 1 -   at org.pentaho.di.job.Job.execute(Job.java:821)
2019/10/24 10:21:31 - Job 1 -   at org.pentaho.di.job.Job.execute(Job.java:497)
2019/10/24 10:21:31 - Job 1 -   at org.pentaho.di.job.Job.run(Job.java:384)
2019/10/24 10:21:31 - Job 1 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : A serious error occurred during job execution: 
2019/10/24 10:21:31 - Job 1 - Unexpected error occurred while launching entry [Google BigQuery Loader.0]
2019/10/24 10:21:31 - Job 1 -  at org.pentaho.di.job.Job.run (Job.java:384)
2019/10/24 10:21:31 - Job 1 -  at org.pentaho.di.job.Job.execute (Job.java:497)
2019/10/24 10:21:31 - Job 1 -  at org.pentaho.di.job.Job.execute (Job.java:821)
2019/10/24 10:21:31 - Job 1 -  at org.pentaho.di.job.Job.execute (Job.java:680)
2019/10/24 10:21:31 - Job 1 -  at com.pentaho.di.job.entries.google.bigquery.JobEntryBigQueryLoader.execute (JobEntryBigQueryLoader.java:383)
2019/10/24 10:21:31 - Job 1 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : org.pentaho.di.core.exception.KettleException: 
2019/10/24 10:21:31 - Job 1 - Unexpected error occurred while launching entry [Google BigQuery Loader.0]
2019/10/24 10:21:31 - Job 1 -  at org.pentaho.di.job.Job.run (Job.java:384)
2019/10/24 10:21:31 - Job 1 -  at org.pentaho.di.job.Job.execute (Job.java:497)
2019/10/24 10:21:31 - Job 1 -  at org.pentaho.di.job.Job.execute (Job.java:821)
2019/10/24 10:21:31 - Job 1 -  at org.pentaho.di.job.Job.execute (Job.java:680)
2019/10/24 10:21:31 - Job 1 -  at com.pentaho.di.job.entries.google.bigquery.JobEntryBigQueryLoader.execute (JobEntryBigQueryLoader.java:383)
2019/10/24 10:21:31 - Job 1 - 
2019/10/24 10:21:31 - Job 1 -   at org.pentaho.di.job.Job.execute(Job.java:824)
2019/10/24 10:21:31 - Job 1 -   at org.pentaho.di.job.Job.execute(Job.java:497)
2019/10/24 10:21:31 - Job 1 -   at org.pentaho.di.job.Job.run(Job.java:384)
2019/10/24 10:21:31 - Job 1 - Caused by: java.lang.NullPointerException
2019/10/24 10:21:31 - Job 1 -   at com.pentaho.di.job.entries.google.bigquery.JobEntryBigQueryLoader.execute(JobEntryBigQueryLoader.java:383)
2019/10/24 10:21:31 - Job 1 -   at org.pentaho.di.job.Job.execute(Job.java:680)
2019/10/24 10:21:31 - Job 1 -   at org.pentaho.di.job.Job.execute(Job.java:821)
2019/10/24 10:21:31 - Job 1 -   ... 2 more
2019/10/24 10:21:31 - Spoon - Job has ended.

PDI 8.1 and 8.2 show the same result. PDI 8.3 does not have the Google BigQuery Loader step.

Any help or workaround would be appreciated.

【Answer 1】:

I worked around it with a PDI Shell script step and the gcloud bq CLI. The issue is logged in the Pentaho bug tracker.
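
The answer does not include the script itself, so the following is only a minimal sketch of what such a Shell step might run. The dataset, table, and bucket names are hypothetical placeholders, and it assumes the source file is a CSV with one header row and that the destination table already exists. With DRY_RUN left at its default of 1 the script only prints the command; set DRY_RUN=0 to actually call bq.

```shell
#!/bin/sh
# Hypothetical names: replace with your own dataset, table and bucket.
DATASET="my_dataset"
TABLE="my_table"
SOURCE_URI="gs://my-bucket/path/data.csv"

load_from_gcs() {
    # $1 = dataset, $2 = table, $3 = gs:// URI of the source file.
    # Assumption: CSV input with a single header row to skip.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        echo bq load --source_format=CSV --skip_leading_rows=1 "$1.$2" "$3"
    else
        bq load --source_format=CSV --skip_leading_rows=1 "$1.$2" "$3"
    fi
}

# Propagate a failure as a non-zero exit status so that the
# calling job can treat the Shell step as failed.
load_from_gcs "$DATASET" "$TABLE" "$SOURCE_URI" || exit 1
```

Because the load runs outside the Google BigQuery Loader entry, the job no longer depends on the code path that throws the NullPointerException; success or failure is decided by the exit status of the script.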

【Comments】:

The name "Google BigQuery Loader" can be confusing. This extension/plugin belongs to Pentaho, not to BigQuery. Since you are using Cloud Storage and BigQuery (both are GCP services), you can load data directly into BigQuery from Cloud Storage, and vice versa.
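
For the "vice versa" direction mentioned in the comment, a table can be exported from BigQuery to Cloud Storage with bq extract. This is a sketch only: the dataset, table, and destination URI are hypothetical, CSV is an assumed output format, and the command is just printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
export_to_gcs() {
    # $1 = dataset, $2 = table, $3 = destination gs:// URI.
    # Assumption: CSV is the desired export format.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        echo bq extract --destination_format=CSV "$1.$2" "$3"
    else
        bq extract --destination_format=CSV "$1.$2" "$3"
    fi
}

# Hypothetical dataset, table and bucket names.
export_to_gcs my_dataset my_table "gs://my-bucket/export/data.csv" || exit 1
```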
