How to convert a string dict to a PySpark DataFrame?
【Title】How to convert string dict to pyspark dataframe? 【Posted】2021-05-14 23:24:26 【Question】:
"input":[("James", "Sales", 3000),
("Michael", "Sales", 4600),
("Robert", "Sales", 4100),
("Maria", "Finance", 3000),
("James", "Sales", 3000),
("Scott", "Finance", 3300),
("Jen", "Finance", 3900),
("Jeff", "Marketing", 3000),
("Kumar", "Marketing", 2000),
("Saif", "Sales", 4100)],
"deptColumns" : ["employee_name", "department", "salary"]
【Answer 1】: Assuming the data is a string, you can evaluate it with ast.literal_eval and load it into a Spark DataFrame using spark.createDataFrame:
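import ast

# The string must be a complete dict literal, including the outer braces;
# without them ast.literal_eval raises a SyntaxError.
data = """
{
"input":[("James", "Sales", 3000),
("Michael", "Sales", 4600),
("Robert", "Sales", 4100),
("Maria", "Finance", 3000),
("James", "Sales", 3000),
("Scott", "Finance", 3300),
("Jen", "Finance", 3900),
("Jeff", "Marketing", 3000),
("Kumar", "Marketing", 2000),
("Saif", "Sales", 4100)],
"deptColumns" : ["employee_name", "department", "salary"]
}
"""

# Parse the string into a Python dict (only literals are evaluated).
data = ast.literal_eval(data)

# Assumes an active SparkSession named spark, as in the pyspark shell.
df = spark.createDataFrame(data['input'], data['deptColumns'])
df.show()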
+-------------+----------+------+
|employee_name|department|salary|
+-------------+----------+------+
|        James|     Sales|  3000|
|      Michael|     Sales|  4600|
|       Robert|     Sales|  4100|
|        Maria|   Finance|  3000|
|        James|     Sales|  3000|
|        Scott|   Finance|  3300|
|          Jen|   Finance|  3900|
|         Jeff| Marketing|  3000|
|        Kumar| Marketing|  2000|
|         Saif|     Sales|  4100|
+-------------+----------+------+
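The parsing step can also be tried on its own, without a Spark session. The following is only a minimal sketch and not part of the original answer: parse_dict_string is a hypothetical helper name, and it assumes the string is a complete dict literal (outer braces included) with the keys input and deptColumns from the question. ast.literal_eval accepts only Python literals, so unlike eval it does not run arbitrary code; json.loads would not fit here because the rows are written as tuples in parentheses, which is not valid JSON.

```python
import ast


def parse_dict_string(text):
    """Parse a dict literal held in a string into (rows, columns).

    Hypothetical helper for illustration; assumes the keys
    "input" and "deptColumns" used in the question.
    """
    parsed = ast.literal_eval(text.strip())
    if not isinstance(parsed, dict):
        raise ValueError("expected a dict literal")
    rows = [tuple(row) for row in parsed["input"]]
    columns = list(parsed["deptColumns"])
    # Every row must supply exactly one value per column name.
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"row {row!r} does not match columns {columns!r}")
    return rows, columns


sample = """
{
"input": [("James", "Sales", 3000), ("Maria", "Finance", 3000)],
"deptColumns": ["employee_name", "department", "salary"]
}
"""

rows, columns = parse_dict_string(sample)
print(rows)
print(columns)

# With an active SparkSession named spark (an assumption here), the parsed
# values can be passed on unchanged:
# df = spark.createDataFrame(rows, columns)
```

If the outer braces are missing from the string, ast.literal_eval raises a SyntaxError instead of returning a dict, so checking the parsed value before calling spark.createDataFrame tends to make such input problems easier to spot.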