使用 Bigquery 连接器时出错
Posted
技术标签:
【中文标题】使用 Bigquery 连接器时出错【英文标题】:Error while using Bigquery connector 【发布时间】:2018-05-04 16:06:46 【问题描述】:在 Qubole 数据平台上运行 Spotify Spark Bigquery 连接器时出现此错误。我确实在我的 jar 中看到了 BigQueryUtils
类,但它仍然会引发此错误:
线程“main”中的异常 org.spark-project.guava.util.concurrent.ExecutionError: java.lang.NoSuchMethodError: com.google.cloud.hadoop.io.bigquery.BigQueryUtils.waitForJobCompletion
下面附上pom...
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.xyz.abc.google.TestProject</groupId>
<artifactId>edesem-google-TestProject</artifactId>
<version>0.0.1-SNAPSHOT</version>
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<gpg.skip>true</gpg.skip>
<!-- Keep in sync with google-api-client dependency -->
<apache.httpcomponents.version>4.0.1</apache.httpcomponents.version>
</properties>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.3.1</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.5.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
</plugin>
<!-- Maven Shade Plugin -->
<plugin>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<finalName>edesem-google-TestProject</finalName>
<shadedArtifactAttached>false</shadedArtifactAttached>
<artifactSet>
<includes>
<include>*:*</include>
</includes>
</artifactSet>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>reference.conf</resource>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
<resource>log4j.properties</resource>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.xyz.abc.bigquery.TestProjectBQClient</mainClass>
</transformer>
</transformers>
<relocations>
<relocation>
<pattern>org.eclipse.jetty</pattern>
<shadedPattern>org.spark-project.jetty</shadedPattern>
<includes>
<include>org.eclipse.jetty.**</include>
</includes>
</relocation>
<relocation>
<pattern>com.google.common</pattern>
<shadedPattern>org.spark-project.guava</shadedPattern>
<excludes>
<exclude>com/google/common/base/Absent*</exclude>
<exclude>com/google/common/base/Function</exclude>
<exclude>com/google/common/base/Optional*</exclude>
<exclude>com/google/common/base/Present*</exclude>
<exclude>com/google/common/base/Supplier</exclude>
</excludes>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.10.6</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>2.2.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>2.2.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-avro_2.10</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>bigquery-connector</artifactId>
<version>0.10.2-hadoop2</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.21</version>
</dependency>
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.9.3</version>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.10</artifactId>
<version>2.2.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcs-connector</artifactId>
<version>1.8.0-hadoop2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/util-hadoop -->
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>util-hadoop</artifactId>
<version>1.8.0-hadoop2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcsio -->
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>gcsio</artifactId>
<version>1.8.0</version>
</dependency>
<dependency>
<groupId>com.google.cloud.bigdataoss</groupId>
<artifactId>util</artifactId>
<version>1.8.0</version>
<exclusions>
<exclusion>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client-java6</artifactId>
</exclusion>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.8.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.8.3</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>23.6-jre</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.cloud/google-cloud-bigquery -->
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-bigquery</artifactId>
<version>1.23.0</version>
</dependency>
</dependencies>
</project>
【问题讨论】:
【参考方案1】:我认为对我来说主要问题是集群中的大型查询连接器配置。我将 jar 添加到类路径中,它解决了这个问题。按照 Google 文档的说明如下。
https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md#add-the-connector-jar-to-hadoops-classpath
将连接器 jar 添加到 Hadoop 的类路径
将连接器 jar 放在 Hadoop 安装的适当子目录中可能会有效地让 Hadoop 加载该 jar。但是,为了确保 jar 已加载,请在 Hadoop 配置目录中将 HADOOP_CLASSPATH=$HADOOP_CLASSPATH:</path/to/gcs-connector-jar>
添加到 hadoop-env.sh
。
【讨论】:
【参考方案2】:这是因为您使用的 com.google.cloud.bigdataoss:bigquery-connector:0.10.2-hadoop2
BigQuery 连接器版本与 com.google.cloud:google-cloud-bigquery:1.23.0
库版本不兼容。
您需要将com.google.cloud.bigdataoss:bigquery-connector
至少升级到0.11.0
版本,并使其与其他com.google.cloud.bigdataoss
依赖项的版本一致(在您的情况下它将是0.12.0
版本),即它们都应该来自此处列出的相同版本:https://github.com/GoogleCloudPlatform/bigdata-interop/releases
【讨论】:
以上是关于使用 Bigquery 连接器时出错的主要内容,如果未能解决你的问题,请参考以下文章
将 Google Data Studio 社区连接器与 BigQuery 结合使用时的时间戳查询问题
适用于 excel 的 BigQuery 连接器 - 请求失败:错误。无法执行查询。获取 URL 时超时