How to join a specific column of a DataFrame with another in Scala Spark [duplicate]
【Posted】: 2017-12-14 18:39:08
【Question】: I have four dataframes.
df1 is:
name city
--------------------------------
kum chennai
kamesh bangalore
df2 is:
name street
-------------------------------
kum 2nd str
kamesh 10th str
I need to combine the city and the street for each name. The output dataframe, df3, should look like this:
name street city
-----------------------------
kum 2nd str chennai
kamesh 10th str bangalore
How can I produce df3 using Scala?
【Question comments】:
【Answer 1】: Join them as follows:
val df3 = df1.join(df2, Seq("name"))
This is an inner join by default. You can also specify the join type explicitly:
val df3 = df1.join(df2, Seq("name"), "inner")
Your output should be:
+------+---------+--------+
|name |city |street |
+------+---------+--------+
|kum |chennai |2nd str |
|kamesh|bangalore|10th str|
+------+---------+--------+
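The third argument of `join` accepts other join types as well. The following is a minimal sketch, not part of the original answer: it assumes a local `SparkSession` (in `spark-shell` the `spark` value already exists), and the extra row `("ravi", "mumbai")` is hypothetical, added only so that the inner and left outer joins give different results. The `select` call puts the columns in the `name, street, city` order shown in the question.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("join-type-sketch").getOrCreate()
import spark.implicits._

// The row ("ravi", "mumbai") is hypothetical and has no match in df2.
val df1 = Seq(("kum", "chennai"), ("kamesh", "bangalore"), ("ravi", "mumbai")).toDF("name", "city")
val df2 = Seq(("kum", "2nd str"), ("kamesh", "10th str")).toDF("name", "street")

// Inner join: keeps only names present in both dataframes.
val inner = df1.join(df2, Seq("name"), "inner").select("name", "street", "city")

// Left outer join: keeps every row of df1; street is null where df2 has no match.
val leftOuter = df1.join(df2, Seq("name"), "left_outer").select("name", "street", "city")

inner.show()
leftOuter.show()
```

With this data the inner join should return two rows and the left outer join three, the third with a null `street`.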
【Comments】:
【Answer 2】: You can use this:
val df3 = df1.join(df2, df1("name").equalTo(df2("name")))
But this shows the join key twice.
val df4 = df1.join(df2, Seq("name"), "inner")
This shows the join key only once.
Full code:
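import org.apache.spark.sql.DataFrame
import spark.implicits._ // assumes an existing SparkSession named `spark`, as in spark-shell

val df1: DataFrame = Seq(("kum","chennai"),("kamesh","bangalore")).toDF("name","city")
val df2: DataFrame = Seq(("kum","2nd str"),("kamesh","10th str")).toDF("name","street")
// Join on a column expression: both `name` columns are kept in the result
val df3 = df1.join(df2, df1("name").equalTo(df2("name")))
df3.show()
// Join on a column name: `name` appears only once in the result
val df4 = df1.join(df2, Seq("name"), "inner")
df4.show()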
Result:
+------+---------+------+--------+
| name| city| name| street|
+------+---------+------+--------+
| kum| chennai| kum| 2nd str|
|kamesh|bangalore|kamesh|10th str|
+------+---------+------+--------+
+------+---------+--------+
| name| city| street|
+------+---------+--------+
| kum| chennai| 2nd str|
|kamesh|bangalore|10th str|
+------+---------+--------+
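If the expression form of the join is needed, the duplicated key can still be removed afterwards. This is a sketch rather than part of the original answer: it assumes a local `SparkSession` and uses `Dataset.drop` with a `Column` argument to remove the copy of `name` that comes from `df2`.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("drop-duplicate-key-sketch").getOrCreate()
import spark.implicits._

val df1 = Seq(("kum", "chennai"), ("kamesh", "bangalore")).toDF("name", "city")
val df2 = Seq(("kum", "2nd str"), ("kamesh", "10th str")).toDF("name", "street")

// Join on an expression, then drop the `name` column that belongs to df2.
val joined = df1.join(df2, df1("name") === df2("name")).drop(df2("name"))

joined.show()
```

The result should have the same three columns (`name`, `city`, `street`) as the `Seq("name")` form of the join.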
【Comments】:
Hi, what if dataframe 2 has different columns, for example val df2: DataFrame = Seq(("Object1","OBJECT_TYPE1"),("OBJECT2","OBJECT_TYPE2")).toDF("OBJECT","TYPE"), and I want to add them to df1? The result shows empty rows. How do I join them?
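The thread does not answer this comment, so the following is only a sketch of one likely explanation. It assumes the commenter's `df1` is the `name`/`city` dataframe from the answer and that the intended key columns are `name` and `OBJECT`. When the key columns have different names, the join condition has to be written as an expression, and an inner join returns rows only where the key values are actually equal. A left outer join keeps the rows of `df1` either way.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("different-key-names-sketch").getOrCreate()
import spark.implicits._

// Assumption: df1 is the dataframe from the answer; df2 is taken from the comment.
val df1 = Seq(("kum", "chennai"), ("kamesh", "bangalore")).toDF("name", "city")
val df2 = Seq(("Object1", "OBJECT_TYPE1"), ("OBJECT2", "OBJECT_TYPE2")).toDF("OBJECT", "TYPE")

// Key columns are named differently, so the condition is an explicit expression.
// No value of `name` equals any value of `OBJECT`, so the inner join is empty.
val inner = df1.join(df2, df1("name") === df2("OBJECT"), "inner")

// A left outer join keeps both rows of df1, with null in OBJECT and TYPE.
val leftOuter = df1.join(df2, df1("name") === df2("OBJECT"), "left_outer")

inner.show()
leftOuter.show()
```

If the two dataframes share no key values at all, no join type can pair their rows meaningfully; a common key column is needed first.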