Error while using the BigQuery connector


[Title]: Error while using Bigquery connector [Posted]: 2018-05-04 16:06:46 [Question]:

I get this error when running the Spotify Spark BigQuery connector on the Qubole data platform. I can see the BigQueryUtils class in my jar, but it still throws this error:

Exception in thread "main" org.spark-project.guava.util.concurrent.ExecutionError: java.lang.NoSuchMethodError: com.google.cloud.hadoop.io.bigquery.BigQueryUtils.waitForJobCompletion

The pom is attached below...

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.xyz.abc.google.TestProject</groupId>
  <artifactId>edesem-google-TestProject</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

    <gpg.skip>true</gpg.skip>

    <!-- Keep in sync with google-api-client dependency -->
    <apache.httpcomponents.version>4.0.1</apache.httpcomponents.version>
  </properties>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.3.1</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.5.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>

      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>

      <!-- Maven Shade Plugin -->
      <plugin>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <finalName>edesem-google-TestProject</finalName>
              <shadedArtifactAttached>false</shadedArtifactAttached>
              <artifactSet>
                <includes>
                  <include>*:*</include>
                </includes>
              </artifactSet>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                  <resource>reference.conf</resource>
                </transformer>
                <transformer implementation="org.apache.maven.plugins.shade.resource.DontIncludeResourceTransformer">
                  <resource>log4j.properties</resource>
                </transformer>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>com.xyz.abc.bigquery.TestProjectBQClient</mainClass>
                </transformer>
              </transformers>
              <relocations>
                <relocation>
                  <pattern>org.eclipse.jetty</pattern>
                  <shadedPattern>org.spark-project.jetty</shadedPattern>
                  <includes>
                    <include>org.eclipse.jetty.**</include>
                  </includes>
                </relocation>
                <relocation>
                  <pattern>com.google.common</pattern>
                  <shadedPattern>org.spark-project.guava</shadedPattern>
                  <excludes>
                    <exclude>com/google/common/base/Absent*</exclude>
                    <exclude>com/google/common/base/Function</exclude>
                    <exclude>com/google/common/base/Optional*</exclude>
                    <exclude>com/google/common/base/Present*</exclude>
                    <exclude>com/google/common/base/Supplier</exclude>
                  </excludes>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.10.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>2.2.0</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>2.2.0</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>spark-avro_2.10</artifactId>
      <version>4.0.0</version>
    </dependency>
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>bigquery-connector</artifactId>
      <version>0.10.2-hadoop2</version>
      <exclusions>
        <exclusion>
          <groupId>com.google.guava</groupId>
          <artifactId>guava-jdk5</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-simple</artifactId>
      <version>1.7.21</version>
    </dependency>
    <dependency>
      <groupId>joda-time</groupId>
      <artifactId>joda-time</artifactId>
      <version>2.9.3</version>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_2.10</artifactId>
      <version>2.2.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>gcs-connector</artifactId>
      <version>1.8.0-hadoop2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/util-hadoop -->
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>util-hadoop</artifactId>
      <version>1.8.0-hadoop2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcsio -->
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>gcsio</artifactId>
      <version>1.8.0</version>
    </dependency>
    <dependency>
      <groupId>com.google.cloud.bigdataoss</groupId>
      <artifactId>util</artifactId>
      <version>1.8.0</version>
      <exclusions>
        <exclusion>
          <groupId>com.google.api-client</groupId>
          <artifactId>google-api-client-java6</artifactId>
        </exclusion>
        <exclusion>
          <groupId>com.google.guava</groupId>
          <artifactId>guava-jdk5</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.8.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.8.3</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>23.6-jre</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud/google-cloud-bigquery -->
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-bigquery</artifactId>
      <version>1.23.0</version>
    </dependency>
  </dependencies>
</project>

[Comments]:

[Answer 1]:

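I think the main problem for me was the BigQuery connector configuration on the cluster. I added the jar to the classpath and that solved the problem. The instructions from the Google documentation are as follows.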

https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md#add-the-connector-jar-to-hadoops-classpath

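Add the connector jar to Hadoop's classpath: Placing the connector jar in the appropriate subdirectory of the Hadoop installation may be enough for Hadoop to load the jar. However, to be certain that the jar is loaded, add HADOOP_CLASSPATH=$HADOOP_CLASSPATH:</path/to/gcs-connector-jar> to hadoop-env.sh in the Hadoop configuration directory.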

[Comments]:

[Answer 2]:

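This happens because the BigQuery connector version you are using, com.google.cloud.bigdataoss:bigquery-connector:0.10.2-hadoop2, is not compatible with the com.google.cloud:google-cloud-bigquery:1.23.0 library version.

You need to upgrade com.google.cloud.bigdataoss:bigquery-connector to at least version 0.11.0 and make it consistent with the versions of the other com.google.cloud.bigdataoss dependencies (in your case that will be version 0.12.0). In other words, all of them should come from the same release listed here: https://github.com/GoogleCloudPlatform/bigdata-interop/releases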
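As a rough illustration of this advice, the fragment below shows how the relevant entries in the question's pom might look once the BigQuery connector is moved to 0.12.0. It is a sketch, not a verified build: the `-hadoop2` suffix on the new version is an assumption carried over from the naming pattern already used in the question, and the exact coordinates should be checked against the releases page before use.

```xml
<!-- Sketch only: com.google.cloud.bigdataoss artifacts taken from one release. -->
<!-- Assumption: 0.12.0-hadoop2 follows the same suffix pattern as the original 0.10.2-hadoop2. -->
<dependency>
  <groupId>com.google.cloud.bigdataoss</groupId>
  <artifactId>bigquery-connector</artifactId>
  <version>0.12.0-hadoop2</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava-jdk5</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- Unchanged from the question; per the answer, 1.8.0 is the matching release for the other bigdataoss artifacts. -->
<dependency>
  <groupId>com.google.cloud.bigdataoss</groupId>
  <artifactId>gcs-connector</artifactId>
  <version>1.8.0-hadoop2</version>
</dependency>
```

The remaining com.google.cloud.bigdataoss entries in the question (util-hadoop, gcsio, util) already use 1.8.0, so under this reading only the bigquery-connector version needs to change.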

[Comments]:
