IllegalArgumentException: A project ID is required for this service but could not be determined from the builder or the environment

Posted 2020-12-15 08:12:31

[Question]

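I'm trying to connect a BigQuery dataset to Databricks and run a script with PySpark.

What I did:

1. I uploaded the BigQuery JSON API key file to DBFS in Databricks so the cluster can authenticate.
2. I added spark-bigquery-latest.jar to the cluster libraries and ran my script.

When I run this script, I don't get any errors: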

from pyspark.sql import SparkSession
spark = (
    SparkSession.builder
    .appName('bq')
    .master('local[4]')
    .config('parentProject', 'google-project-ID')
    .config('spark.jars', 'dbfs:/FileStore/jars/jarlocation.jar')
    .getOrCreate()
)
df = spark.read.format("bigquery").option("credentialsFile", "/dbfs/FileStore/tables/bigqueryapi.json") \
  .option("parentProject", "google-project-ID") \
  .option("project", "Dataset-Name") \
  .option("table","dataset.schema.tablename") \
  .load()
df.show()

But instead of reading a single table from that schema, I tried to read every table under it with a query, like this:

from pyspark.sql import SparkSession
from google.cloud import bigquery
spark = (
    SparkSession.builder
    .appName('bq')
    .master('local[4]')
    .config('parentProject', 'google-project-ID')
    .config('spark.jars', 'dbfs:/FileStore/jars/jarlocation.jar')
    .getOrCreate()
)
# bigquery.Client() relies on credentials and a project available in the environment.
client = bigquery.Client()
table_list = 'dataset.schema'
tables = client.list_tables(table_list)

# list_tables() yields TableListItem objects; keep only the bare table names.
tlist = [table.table_id for table in tables]

for i in tlist:
    sql_query = f"select * from `dataset.schema.{i}`"
    df = spark.read.format("bigquery").option("credentialsFile", "/dbfs/FileStore/tables/bigqueryapi.json") \
        .option("parentProject", "google-project-ID") \
        .option("project", "Dataset-Name") \
        .option("query", sql_query).load()
    df.show()

Running this script:

from pyspark.sql import SparkSession
spark = (
    SparkSession.builder
    .appName('bq')
    .master('local[4]')
    .config('parentProject', 'google-project-ID')
    .config('spark.jars', 'dbfs:/FileStore/jars/jarlocation.jar')
    .getOrCreate()
)
sql_query = """select * from `dataset.schema.tablename`"""
df = spark.read.format("bigquery").option("credentialsFile", "/dbfs/FileStore/tables/bigqueryapi.json") \
  .option("parentProject", "google-project-ID") \
  .option("project", "Dataset-Name") \
  .option("query", sql_query).load()
df.show()

I get this unusual error:

IllegalArgumentException: A project ID is required for this service but could not be determined from the builder or the environment.  Please set a project ID using the builder.
---------------------------------------------------------------------------
IllegalArgumentException                  Traceback (most recent call last)
<command-131090852> in <module>
     35   .option("parentProject", "google-project-ID") \
     36   .option("project", "Dataset-Name") \
---> 37   .option("query", sql_query).load()
     38 #df.show()
     39 

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    182             return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
    183         else:
--> 184             return self._df(self._jreader.load())
    185 
    186     @since(1.4)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1303         answer = self.gateway_client.send_command(command)
   1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
   1306 
   1307         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    131                 # Hide where the exception came from that shows a non-Pythonic
    132                 # JVM exception message.
--> 133                 raise_from(converted)
    134             else:
    135                 raise

/databricks/spark/python/pyspark/sql/utils.py in raise_from(e)

IllegalArgumentException: A project ID is required for this service but could not be determined from the builder or the environment.  Please set a project ID using the builder.

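When I read it as a table, my project ID is picked up, but when I run it as a query I get this error.

I've tried to figure it out and searched many sites for an answer, but couldn't find a clear one.

Any help is much appreciated. Thanks in advance!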
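The difference between the table read (works) and the query read (fails) may be related to how the spark-bigquery-connector handles `query`: its README describes running the SQL as a BigQuery job and materializing the result into a temporary table, with `viewsEnabled` enabled and a `materializationDataset` set, while `parentProject` names the project the work is billed to. The sketch below is a hypothetical helper (`bigquery_read_options` is not part of any library) that assembles reader options for either mode and sets the project explicitly every time. Setting `materializationProject` to the parent project is an assumption, and whether any of this clears the error depends on the connector version.

```python
def bigquery_read_options(credentials_file, parent_project, *, table=None,
                          query=None, materialization_dataset=None):
    """Assemble spark-bigquery-connector reader options for a table or a query.

    Exactly one of `table` or `query` must be given. Option names follow the
    connector's README; materialization settings are only added for queries.
    """
    if (table is None) == (query is None):
        raise ValueError("pass exactly one of table= or query=")
    opts = {
        "credentialsFile": credentials_file,
        "parentProject": parent_project,  # project billed for reads and jobs
    }
    if table is not None:
        opts["table"] = table
    else:
        if not materialization_dataset:
            raise ValueError("query reads need a materialization dataset")
        opts.update({
            "query": query,
            "viewsEnabled": "true",
            # Dataset where the connector stores the query result temporarily.
            "materializationDataset": materialization_dataset,
            # Assumption: set explicitly instead of relying on a default.
            "materializationProject": parent_project,
        })
    return opts


# Usage (needs a live Spark session and the connector jar; "temp_dataset"
# is a hypothetical dataset the service account can write to):
# opts = bigquery_read_options("/dbfs/FileStore/tables/bigqueryapi.json",
#                              "google-project-ID",
#                              query="select * from `dataset.schema.tablename`",
#                              materialization_dataset="temp_dataset")
# df = spark.read.format("bigquery").options(**opts).load()
```

Keeping the option assembly in one function means the table and query paths cannot drift apart on how the project is set.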


[Answer 1]

Could you avoid the query and just use the table option?

from pyspark.sql import SparkSession
from google.cloud import bigquery
spark = (
    SparkSession.builder
    .appName('bq')
    .master('local[4]')
    .config('parentProject', 'google-project-ID')
    .config('spark.jars', 'dbfs:/FileStore/jars/jarlocation.jar')
    .getOrCreate()
)
client = bigquery.Client()
table_list = 'dataset.schema'
tables = client.list_tables(table_list)

# Collect the bare table names from the TableListItem objects.
tlist = [table.table_id for table in tables]

for i in tlist:
    df = spark.read.format("bigquery").option("credentialsFile", "/dbfs/FileStore/tables/bigqueryapi.json") \
      .option("parentProject", "google-project-ID") \
      .option("project", "Dataset-Name") \
      .option("table", "dataset.schema." + i) \
      .load()
    df.show()
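Whether the table names end up in a `query` or a `table` option, they have to be plain strings. `list_tables()` in google-cloud-bigquery yields `TableListItem` objects whose `table_id` attribute is the bare name, and splicing a name into a triple-quoted literal with `' + i +'` leaves those characters inside the SQL text instead of concatenating. A minimal sketch of the string building, kept free of Spark so it can be checked on its own; the helper names are hypothetical and `dataset.schema` is the question's placeholder for `project.dataset`.

```python
def qualified_table(dataset_ref, table_id):
    """Join 'project.dataset' and a bare table ID into 'project.dataset.table'."""
    return f"{dataset_ref}.{table_id}"


def select_all_query(dataset_ref, table_id):
    """Build a SELECT * query, quoting the full table name in backticks."""
    return f"SELECT * FROM `{qualified_table(dataset_ref, table_id)}`"


# With a google-cloud-bigquery client (not executed here):
# table_ids = [t.table_id for t in client.list_tables("dataset.schema")]
# for table_id in table_ids:
#     name = qualified_table("dataset.schema", table_id)  # for .option("table", ...)
#     sql = select_all_query("dataset.schema", table_id)  # for .option("query", ...)
```

For example, `select_all_query("dataset.schema", "tablename")` returns the same query text the single-table script uses, with `SELECT` in upper case.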

[Comments]

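What if I need to use the query option when reading the DataFrame, to unnest a few nested columns from the table I get from BigQuery?

@NaveenB Please ask a separate question. That is too far from the current one, and there isn't enough room here to answer it.

[Answer 2]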

In my case I got the same exception, but it was because I had not set the config value parentProject, which is the ID of the BigQuery project I'm connecting to.
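Following this answer, one way to avoid hardcoding `google-project-ID` is to read it from the service-account key already passed as `credentialsFile`; Google service-account key files include a `project_id` field. A minimal sketch, where `project_id_from_key` is a hypothetical helper and the file on DBFS is assumed to be a standard service-account key. The key's project is not necessarily the project you want billed as `parentProject`, so treat this as a default, not a rule.

```python
import json


def project_id_from_key(key_path):
    """Return the project_id stored in a service-account JSON key file."""
    with open(key_path, encoding="utf-8") as fh:
        key = json.load(fh)
    project_id = key.get("project_id")
    if not project_id:
        # Fail early with a clear message instead of letting the connector
        # raise "A project ID is required for this service ..." later.
        raise ValueError(f"no project_id found in {key_path}")
    return project_id


# Usage on Databricks (path taken from the question):
# project = project_id_from_key("/dbfs/FileStore/tables/bigqueryapi.json")
# df = (spark.read.format("bigquery")
#       .option("credentialsFile", "/dbfs/FileStore/tables/bigqueryapi.json")
#       .option("parentProject", project)
#       .option("table", "dataset.schema.tablename")
#       .load())
```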

