Rename Nested Json data in schema
Posted: 2019-07-01 16:20:37
Question:
Hi all, I am new to Spark/Scala and I want to rename some nested JSON fields, because my lateral view fails when several JSON fields share the same name.
I would like to rename the EffDate and ExpDate columns inside EmployeeAddr and EmployeePhone.
I have tried the withColumnRenamed and withColumn functions, but for some reason neither of them worked for me.
Code to load into dataframe:
val Employee = spark.read
  .format(Employeefile_type)
  .option("header", "true")
  .option("inferSchema", "true")
  .load(file_loction)
root
|-- BirthDate: string (nullable = true)
|-- EmployeeId: string (nullable = true)
|-- EmployeeAddr: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- AddrTypeName: string (nullable = true)
| | |-- City: string (nullable = true)
| | |-- CtryCode: string (nullable = true)
| | |-- EffDate: string (nullable = true)
| | |-- ExpDate: string (nullable = true)
| | |-- PostalCode: string (nullable = true)
| | |-- Province: string (nullable = true)
| | |-- Street1: string (nullable = true)
| | |-- Street2: string (nullable = true)
|-- EmployeeEmail: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- CrewEmailAddr: string (nullable = true)
| | |-- EmailType: string (nullable = true)
|-- EmployeeEmerContact: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- Addr: string (nullable = true)
| | |-- FirstName: string (nullable = true)
| | |-- LastName: string (nullable = true)
| | |-- PrimaryPhone: string (nullable = true)
| | |-- Relatnshp: string (nullable = true)
| | |-- Title: string (nullable = true)
|-- EmployeeEmplymntStatus: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- EmplymntStatusCode: string (nullable = true)
| | |-- EmplymntStatusReason: string (nullable = true)
| | |-- EndDate: string (nullable = true)
| | |-- StartDate: string (nullable = true)
|-- EmployeePhone: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- EmployeePhoneNumber: string (nullable = true)
| | |-- EffDate: string (nullable = true)
| | |-- ExpDate: string (nullable = true)
| | |-- PhoneType: string (nullable = true)
Answer 1:
You can apply the solution described here:
How to rename fields in an DataFrame corresponding to nested JSON
It works by replacing the DataFrame schema, that is, by recreating the DataFrame with a new schema.
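A minimal sketch of that idea, assuming Spark SQL is on the classpath and the DataFrame has the schema shown in the question. It copies the existing schema, prefixes the clashing fields inside the two array-of-struct columns, and rebuilds the DataFrame with `spark.createDataFrame(rdd, schema)`. The names `prefixNestedFields` and `renameDuplicateDates` and the `Addr`/`Phone` prefixes are hypothetical choices for illustration; they are not taken from the linked answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{ArrayType, StructField, StructType}

// Hypothetical helper: inside the top-level array-of-struct column `column`,
// prefix every nested field whose name is in `fields`. Other columns are untouched.
def prefixNestedFields(
    schema: StructType,
    column: String,
    fields: Set[String],
    prefix: String): StructType =
  StructType(schema.fields.map {
    case StructField(`column`, ArrayType(elem: StructType, containsNull), nullable, metadata) =>
      val renamed = StructType(elem.fields.map { f =>
        if (fields.contains(f.name)) f.copy(name = prefix + f.name) else f
      })
      StructField(column, ArrayType(renamed, containsNull), nullable, metadata)
    case other => other
  })

// Hypothetical wrapper: rebuild the DataFrame against the renamed schema.
// Only names change; field order and types stay the same, so the existing
// rows still line up with the new schema by position.
def renameDuplicateDates(spark: SparkSession, employee: DataFrame): DataFrame = {
  val dateFields = Set("EffDate", "ExpDate")
  val withAddr   = prefixNestedFields(employee.schema, "EmployeeAddr", dateFields, "Addr")
  val newSchema  = prefixNestedFields(withAddr, "EmployeePhone", dateFields, "Phone")
  spark.createDataFrame(employee.rdd, newSchema)
}
```

Calling `renameDuplicateDates(spark, Employee)` should give EmployeeAddr elements the fields AddrEffDate and AddrExpDate, and EmployeePhone elements the fields PhoneEffDate and PhoneExpDate, so the names no longer collide after the columns are exploded.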