Spark 从另一个表更新 Delta 中的多个列
Posted
技术标签:
【中文标题】Spark 从另一个表更新 Delta 中的多个列【英文标题】:Spark Update Multiple Columns in Delta from another table 【发布时间】:2020-05-26 05:56:22 【问题描述】:我正在尝试根据从另一个增量表中获取的值更新一个增量表中的多个列。下面的更新 sql 在 Oracle 中有效,但在 Spark Delta 中无效,您能帮忙吗?
deptDf = sqlContext.createDataFrame(
[(10, "IT", "Seattle"), (20, "Accounting", "Renton"), (30, "Finance", "Bellevue"), (40, "Manufacturing", "Tacoma"), (50, "Inventory", "Bothell")],
("dno", "dname", "location"))
updateddeptlocDf = sqlContext.createDataFrame(
[(20, "Accounting and Finance", "SODO"), (10, "Technology", "SODO")], ("dno", "updated_name", "updated_location"))
deptDf.write.format("delta").mode("Overwrite").save("/mnt/delta/dept")
updateddeptlocDf.write.mode("Overwrite").format("delta").save("/mnt/delta/updatedDept")
spark.sql("DROP TABLE IF EXISTS deptdelta")
spark.sql("DROP TABLE IF EXISTS updated_dept_location")
spark.sql("CREATE TABLE deptdelta USING DELTA LOCATION '/mnt/delta/dept'")
spark.sql("CREATE TABLE updated_dept_location USING DELTA LOCATION '/mnt/delta/updatedDept'")
我试图发出的更新语句失败是:
UPDATE deptdelta d
SET (d.dname, d.location) = (SELECT ud.updated_name, ud.updated_location FROM updated_dept_location u WHERE d.dno = u.dno )
WHERE EXISTS (SELECT 'a' from updated_dept_location u1 WHERE d.dno = u1.dno )
错误:
SQL 语句错误:ParseException: 不匹配的输入 ',' 期望 EQ(line 2, pos 11)
== SQL == 更新 deptdelta d SET d.dname, d.location = (SELECT ud.updated_name, ud.updated_location FROM updated_dept_location u WHERE d.dno = u.dno) ------------^^^ WHERE EXISTS (SELECT 'a' from updated_dept_location u1 WHERE d.dno = u1.dno)
【问题讨论】:
【参考方案1】:MERGE 成功了
MERGE INTO deptdelta AS maindept
USING updated_dept_location AS upddept
ON upddept.dno = maindept.dno
WHEN MATCHED THEN UPDATE SET maindept.dname = upddept.updated_name, maindept.location = upddept.updated_location
【讨论】:
以上是关于Spark 从另一个表更新 Delta 中的多个列的主要内容,如果未能解决你的问题,请参考以下文章
如何在火花中连接多个列,同时从另一个表中连接列名(每行不同)