AnalysisException: u"Except can only be performed on tables with compatible column types"


I am removing the actual column names since I shouldn't share those, but here is a glimpse of the error:

AnalysisException: u"Except can only be performed on tables with the compatible column types.
string <> boolean at the 28th column of the second table;;
'Except false
:- Filter (cast(inactive_date#111 as string) = '3001-01-01')
:  +- Project [... 33 more fields]
:     +- Project [... 33 more fields]
:        +- SubqueryAlias
:           +- Relation[... 33 more fields] parquet
+- Project [... 33 more fields]
   +- Join Inner, (Key#275 = entry#26)
      :- Filter (cast(inactive_date#283 as string) = '3001-01-01')
      :  +- Project [... 33 more fields]
      :     +- Project [... 33 more fields]
      :        +- SubqueryAlias
      :           +- Relation[... 33 more fields] parquet
      +- Deduplicate [entry#26]
         +- Project [entry#26]
            +- Project [... 13 more fields]
               +- Project [... 13 more fields]
                  +- SubqueryAlias
                     +- Relation[] parquet
"

My code looks like this:

# old dataframe (consider it as History)
# daily dataframe (consider it as Daily)

# Filtering the active records based on a condition

Active_old_filtered_records = old_history_dataframe.filter(old_history_dataframe["inactive_date"] == '3001-01-01')
Inactive_old_filtered_records = old_history_dataframe.filter(old_history_dataframe["inactive_date"] != '3001-01-01')

# Joining active old records with the matching active records in the daily dataframe on keyColumnA

left = Active_old_filtered_records
right = Active_new_daily_dataframe.select("keyColumnA").distinct()

Matching_Active_daily_old_dataframe = left.join(right, ["keyColumnA"])
Non_matching_active_daily_old_dateframe = Active_old_filtered_records.subtract(Matching_Active_daily_old_dataframe)

Note: the daily dataframe and the old dataframe have exactly the same schema here, yet I get the AnalysisException. Can anyone help with this? Thanks.
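Since the error names a positional mismatch ("string <> boolean at the 28th column"), one quick way to locate the offending column is to compare the two frames' `dtypes` lists position by position. The helper below is a sketch, not from the original post; `first_type_mismatch` and the `c0..c32` column names are made up for illustration.

```python
# Hypothetical helper (not from the post): find the first positional column
# pair whose types differ between two schemas -- essentially the check
# Spark's analyzer performs before EXCEPT / subtract().
def first_type_mismatch(dtypes_a, dtypes_b):
    """Each argument is a list of (name, type) tuples, the shape
    that PySpark's DataFrame.dtypes returns."""
    for i, ((name_a, type_a), (name_b, type_b)) in enumerate(zip(dtypes_a, dtypes_b)):
        if type_a != type_b:
            return i, name_a, type_a, name_b, type_b
    return None

# Illustrative schemas: 33 columns, with the 28th (index 27) being a
# string in one frame and a boolean in the other, as in the error above.
history_dtypes = [("c%d" % i, "string") for i in range(33)]
daily_dtypes = [("c%d" % i, "string") for i in range(33)]
daily_dtypes[27] = ("c27", "boolean")

print(first_type_mismatch(history_dtypes, daily_dtypes))
# (27, 'c27', 'string', 'c27', 'boolean')
```

Against real dataframes this would be called as `first_type_mismatch(df1.dtypes, df2.dtypes)`.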

Answer

Finally, I was able to solve the issue with the following code:

# old dataframe (consider it as History)
# daily dataframe (consider it as Daily)

# Filtering the active records based on a condition

Active_old_filtered_records = old_history_dataframe.filter(old_history_dataframe["inactive_date"] == '3001-01-01')
Inactive_old_filtered_records = old_history_dataframe.filter(old_history_dataframe["inactive_date"] != '3001-01-01')

# Capture the column order before joining
cols = Active_old_filtered_records.columns

# Joining active old records with the matching active records in the daily dataframe on keyColumnA

left = Active_old_filtered_records
right = Active_new_daily_dataframe.select("keyColumnA").distinct()

# select(cols) restores the original column order after the join
Matching_Active_daily_old_dataframe = left.join(right, ["keyColumnA"]).select(cols)

Non_matching_active_daily_old_dateframe = Active_old_filtered_records.subtract(Matching_Active_daily_old_dataframe)

Joining two dataframes on a column that is anywhere other than the first position changes the order of the columns in the resulting dataframe. So keep a cols variable and re-select the same columns in the correct order to make sure the subsequent subtract works as intended. :D

With that, the problem was finally solved.
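The reordering described above can be sketched without a Spark session. Column names here are illustrative, not from the post; the point is that `subtract()`/EXCEPT pairs columns by position, so a shifted order can pit a string column against a boolean one.

```python
# Plain-Python sketch of the column-order problem the answer fixes.
cols = ["colA", "keyColumnA", "colB", "inactive_date"]  # saved original order

# After left.join(right, ["keyColumnA"]), Spark emits the join key first,
# followed by the remaining columns -- so positions shift:
joined_order = ["keyColumnA"] + [c for c in cols if c != "keyColumnA"]
print(joined_order)  # ['keyColumnA', 'colA', 'colB', 'inactive_date']

# The fix: joined_df.select(cols) restores the saved order before subtract().
restored = [c for c in cols]  # what .select(cols) produces
assert restored == cols and joined_order != cols
```

The same columns are present either way; only their positions differ, which is exactly what the analyzer's positional type check trips over.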
