PySpark error when I write to a Parquet table in Hive

Posted 2023-04-15

【Title】PySpark error when I write to a Parquet table in Hive 【Posted】2020-05-08 19:48:50 【Question】:

I created a Hive table with this statement:

CREATE TABLE rci_db_inventory.dev_cr_asset_trace_2 (
  id STRING,
  acn STRING,
  source_max_date BIGINT,
  col_name STRING,
  source_value STRING,
  type STRING,
  lid STRING,
  source_id STRING,
  created_by STRING,
  created_on STRING,
  traceable STRING,
  found STRING
)
PARTITIONED BY (
  ctl_eid STRING
)
STORED AS PARQUET

The problem appears when I try to write to this table from a PySpark DataFrame with the following code:

columnar_df.withColumn("found", lit(head_bi_name)).write.format("parquet").mode("append") \
                .partitionBy("ctl_eid").saveAsTable('rci_db_inventory.dev_cr_asset_trace_2')

Error:

pyspark.sql.utils.AnalysisException: u"The format of the existing table rci_db_inventory.dev_cr_asset_trace_2 is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;"

I am using an on-premises Cloudera cluster.

【Answer 1】:

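What happens if you try .format('hive') instead? The table was created with Hive DDL, so Spark registers it as a Hive table (HiveFileFormat) and rejects a writer that declares parquet, even though the files on disk are Parquet: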

columnar_df.withColumn("found", lit(head_bi_name)).write.format("hive").mode("append") \
               .partitionBy("ctl_eid").saveAsTable('rci_db_inventory.dev_cr_asset_trace_2')
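If changing the format is not enough, another option is to skip saveAsTable and use insertInto, which takes the storage format and partitioning from the existing table definition. The following is only a sketch and not part of the original answer: the helper name append_to_hive_table is hypothetical, and it assumes spark is a SparkSession with Hive support and that the DataFrame already contains every column of the target table, including ctl_eid.

```python
def append_to_hive_table(spark, df, table_name):
    """Append df to an existing Hive table (hypothetical helper, for illustration)."""
    # insertInto matches columns by position, not by name, so reorder the
    # DataFrame to the target table's column order. For a partitioned table
    # the partition columns come last in that order.
    target_columns = spark.table(table_name).columns

    missing = [c for c in target_columns if c not in df.columns]
    if missing:
        raise ValueError("DataFrame is missing columns: " + ", ".join(missing))

    # No format() or partitionBy() here: both come from the table definition.
    df.select(*target_columns).write.mode("append").insertInto(table_name)
    return target_columns
```

With the names from the question, the call would look like append_to_hive_table(spark, columnar_df.withColumn("found", lit(head_bi_name)), 'rci_db_inventory.dev_cr_asset_trace_2'). Depending on the cluster configuration, writing dynamic partitions this way may also require setting hive.exec.dynamic.partition.mode to nonstrict.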
