Parsing nested XML in Databricks

Posted 2023-04-15


Question posted: 2021-04-18 14:46:59

Question:


I am trying to read an XML file into a DataFrame and flatten the data with explode, as shown below.

val df = spark.read.format("xml")
  .option("rowTag", "on")
  .option("inferschema", "true")
  .load("filepath")
val parsxml = df
  .withColumn("exploded_element", explode(("prgSvc.element")))

I get the following error:

command-5246708674960:4: error: type mismatch;
found   : String("prgSvc.element")
required: org.apache.spark.sql.Column
.withColumn("exploded_element", explode(("prgSvc.element")))

I also tried defining a custom schema manually and passing it when reading the XML file, but every column in the resulting DataFrame was NULL. Is my approach valid, and how can I resolve this error and get the flattened output?
Thank you.
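On the all-NULL result with a custom schema: with spark-xml, a user-supplied schema is matched by name against the child elements and attributes of each rowTag element, so one common cause is a schema field whose name matches nothing in the XML; such a field is filled with NULL. Attributes are exposed with a leading `_` by default, and a repeated child element maps to an `ArrayType`. The sketch below is illustrative only: the question does not show the file, so the XML layout, the `id` attribute, and the `CustomSchemaXml` object are hypothetical. It writes the sample to a temporary file and assumes the spark-xml library, or a runtime with built-in XML support, is on the classpath.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

object CustomSchemaXml {
  // Hypothetical input: each <prgSvc> becomes one row with an id attribute
  // and repeated <element> children.
  val sampleXml: String =
    """<prgSvcs>
      |  <prgSvc id="1">
      |    <element>a</element>
      |    <element>b</element>
      |  </prgSvc>
      |</prgSvcs>""".stripMargin

  // Field names must match the XML: the attribute uses spark-xml's default "_"
  // prefix, and the repeated <element> child is declared as an array.
  val schema: StructType = StructType(Seq(
    StructField("_id", StringType, nullable = true),
    StructField("element", ArrayType(StringType), nullable = true)
  ))

  // Reads the sample with the explicit schema and explodes the array column.
  def run(spark: SparkSession): Array[(String, String)] = {
    import spark.implicits._
    val path = Files.createTempFile("prgsvcs", ".xml")
    Files.write(path, sampleXml.getBytes(StandardCharsets.UTF_8))

    val df = spark.read
      .format("xml")
      .option("rowTag", "prgSvc") // the repeating element: one row per <prgSvc>
      .schema(schema)
      .load(path.toString)

    df.select(col("_id"), explode(col("element")).as("exploded_element"))
      .as[(String, String)] // tuple encoders map columns by position
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("custom-schema-xml").getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```

If a field still comes back NULL, comparing the schema's field names against `printSchema()` output from an inferred read of the same file is a quick way to spot the mismatch.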


Answer 1:

Use this:

import org.apache.spark.sql.functions.explode
import spark.implicits._ // provides the $"..." column syntax

val parsxml = df.withColumn("exploded_element", explode($"prgSvc.element"))
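This works because `explode` in `org.apache.spark.sql.functions` takes a `Column`, not a `String`. Both `$"prgSvc.element"` (from `spark.implicits._`) and `col("prgSvc.element")` build a `Column` from a dotted path into a struct. A minimal sketch on a small in-memory DataFrame, assuming a hypothetical shape where `prgSvc` is a struct whose `element` field is an array of strings; the case classes and the `ExplodeNested` object are illustrative, not taken from the question's file.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical row shape: a struct column `prgSvc` with an array field `element`.
case class PrgSvc(element: Seq[String])
case class Record(prgSvc: PrgSvc)

object ExplodeNested {
  // Returns the exploded values produced via col(...) and via $"...".
  def run(spark: SparkSession): (Array[String], Array[String]) = {
    import spark.implicits._
    val df = Seq(Record(PrgSvc(Seq("a", "b")))).toDF()

    // explode needs a Column; passing the bare String "prgSvc.element"
    // is what causes the "type mismatch" compile error.
    val viaCol = df.withColumn("exploded_element", explode(col("prgSvc.element")))
    val viaDollar = df.withColumn("exploded_element", explode($"prgSvc.element"))

    (viaCol.select("exploded_element").as[String].collect(),
     viaDollar.select("exploded_element").as[String].collect())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("explode-nested").getOrCreate()
    try {
      val (viaCol, viaDollar) = run(spark)
      println(viaCol.mkString(", "))
      println(viaDollar.mkString(", "))
    } finally spark.stop()
  }
}
```

The two forms are interchangeable; `col(...)` avoids the implicits import, while `$"..."` is shorter once `spark.implicits._` is in scope. Either way the path must exist in the DataFrame's actual schema, which is what the follow-up comment below runs into.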

Comments:

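Thanks for the reply. The code above gives me the following error: org.apache.spark.sql.AnalysisException: cannot resolve 'prgSvc.element' given input columns: [prgSvcs];; When I change the exploded column name to prgSvcs, the error is org.apache.spark.sql.AnalysisException: No such struct field element in prgSvc; In this case, what should my rowTag be when reading the XML file?

The row tag should be the root element. Then print the schema before using the column names above.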
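Following the last comment's advice, one way to find the right path is to read with the wrapper element as `rowTag`, call `printSchema()`, and build the explode chain from what it prints. The sketch assumes a hypothetical layout (not shown in the question) where `prgSvcs` wraps several `prgSvc` elements, each holding repeated `element` children. With more than one `prgSvc` child, schema inference should type `prgSvc` as an array of structs, so two explodes are needed to reach the leaf values. The `NestedXmlFlatten` object and sample data are illustrative, and spark-xml or built-in XML support is assumed.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object NestedXmlFlatten {
  // Hypothetical input: a root <prgSvcs> wrapping repeated <prgSvc> elements.
  val sampleXml: String =
    """<prgSvcs>
      |  <prgSvc id="1">
      |    <element>a</element>
      |    <element>b</element>
      |  </prgSvc>
      |  <prgSvc id="2">
      |    <element>c</element>
      |    <element>d</element>
      |  </prgSvc>
      |</prgSvcs>""".stripMargin

  // Reads with the root as rowTag, prints the inferred schema, and flattens.
  def run(spark: SparkSession): Array[String] = {
    import spark.implicits._
    val path = Files.createTempFile("prgsvcs-root", ".xml")
    Files.write(path, sampleXml.getBytes(StandardCharsets.UTF_8))

    // Root element as rowTag: the whole document becomes a single row.
    val df = spark.read.format("xml").option("rowTag", "prgSvcs").load(path.toString)
    df.printSchema() // inspect the nesting before choosing column paths

    df.withColumn("svc", explode(col("prgSvc")))                   // one row per <prgSvc>
      .withColumn("exploded_element", explode(col("svc.element"))) // one row per <element>
      .select("exploded_element")
      .as[String]
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("nested-xml-flatten").getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```

Alternatively, setting `rowTag` to `prgSvc` makes each service its own row, which removes the outer explode; which choice fits depends on the real file's structure as reported by `printSchema()`.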
