I want to calculate using three columns and produce a single column showing all three values
Posted: 2018-11-21 15:54:29
Question: I am loading a file into a DataFrame in Spark on Databricks:
spark.sql("""select A,X,Y,Z from fruits""")
A X Y Z
1E5 1.000 0.000 0.000
1U2 2.000 5.000 0.000
5G6 3.000 0.000 10.000
I need the output to be:
A D
1E5 X 1
1U2 X 2, Y 5
5G6 X 3, Z 10
I was able to find a solution.
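Comments on the question:
Could you add more details about what you are trying to do and what did not work?
Answer 1: Each column name can be concatenated with its value, and then all of those strings can be joined into a single column, separated by commas: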
// imports
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.IntegerType
import spark.implicits._

// data
val df = Seq(
  ("1E5", 1.000, 0.000, 0.000),
  ("1U2", 2.000, 5.000, 0.000),
  ("5G6", 3.000, 0.000, 10.000))
  .toDF("A", "X", "Y", "Z")

// action: build "<name> <value>" for every non-zero column, empty string otherwise
val columnsToConcat = List("X", "Y", "Z")
val columnNameValueList = columnsToConcat.map(c =>
  when(col(c) =!= 0, concat(lit(c), lit(" "), col(c).cast(IntegerType)))
    .otherwise("")
)

// join the pieces, adding ", " only when both sides are non-empty
val valuesJoinedByComaColumn = columnNameValueList.reduce((a, b) =>
  when(length(a) =!= 0 && length(b) =!= 0, concat(a, lit(", "), b))
    .otherwise(concat(a, b))
)

val result = df.withColumn("D", valuesJoinedByComaColumn)
  .drop(columnsToConcat: _*)
Output:
+---+---------+
|A |D |
+---+---------+
|1E5|X 1 |
|1U2|X 2, Y 5 |
|5G6|X 3, Z 10|
+---+---------+
This is similar to the solution proposed by stack0114106, but it looks more explicit.
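A possible variation on the same idea, as a sketch rather than part of the original answer: `concat_ws` skips null arguments, so if the zero case is left as null (a `when` without `otherwise`), the separator logic in the `reduce` step may not be needed at all. The local `SparkSession` setup and the app name below are assumptions made only to keep the snippet self-contained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit, when}

// Assumption: a local session for illustration. In spark-shell or a Databricks
// notebook a session already exists and getOrCreate() returns it.
val spark = SparkSession.builder().master("local[1]").appName("concat-ws-sketch").getOrCreate()
import spark.implicits._

val df = Seq(
  ("1E5", 1.0, 0.0, 0.0),
  ("1U2", 2.0, 5.0, 0.0),
  ("5G6", 3.0, 0.0, 10.0)
).toDF("A", "X", "Y", "Z")

val columnsToConcat = List("X", "Y", "Z")

// when(...) without otherwise(...) evaluates to null for zero values.
val parts = columnsToConcat.map { c =>
  when(col(c) =!= 0, concat(lit(c), lit(" "), col(c).cast("int").cast("string")))
}

// concat_ws ignores null arguments, so only the non-zero pieces are joined.
val result = df
  .withColumn("D", concat_ws(", ", parts: _*))
  .drop(columnsToConcat: _*)

result.show(false)
```

For the sample data this should give the same `D` values as the table above. A row in which every listed column is zero would get an empty string in `D`.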
Comments:
Hey, thanks for improving it. The OP mentioned that it did not work for him; not sure what the problem is.
Answer 2: Take a look at this:
scala> val df = Seq(("1E5",1.000,0.000,0.000),("1U2",2.000,5.000,0.000),("5G6",3.000,0.000,10.000)).toDF("A","X","Y","Z")
df: org.apache.spark.sql.DataFrame = [A: string, X: double ... 2 more fields]
scala> df.show()
+---+---+---+----+
| A| X| Y| Z|
+---+---+---+----+
|1E5|1.0|0.0| 0.0|
|1U2|2.0|5.0| 0.0|
|5G6|3.0|0.0|10.0|
+---+---+---+----+
scala> val newcol = df.columns.drop(1).map( x=> when(col(x)===0,lit("")).otherwise(concat(lit(x),lit(" "),col(x).cast("int").cast("string"))) ).reduce( (x,y) => concat(x,lit(", "),y) )
newcol: org.apache.spark.sql.Column = concat(concat(CASE WHEN (X = 0) THEN ELSE concat(X, , CAST(CAST(X AS INT) AS STRING)) END, , , CASE WHEN (Y = 0) THEN ELSE concat(Y, , CAST(CAST(Y AS INT) AS STRING)) END), , , CASE WHEN (Z = 0) THEN ELSE concat(Z, , CAST(CAST(Z AS INT) AS STRING)) END)
scala> df.withColumn("D",newcol).withColumn("D",regexp_replace(regexp_replace('D,", ,",","),", $", "")).drop("X","Y","Z").show(false)
+---+---------+
|A |D |
+---+---------+
|1E5|X 1 |
|1U2|X 2, Y 5 |
|5G6|X 3, Z 10|
+---+---------+
scala>
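One thing that may be worth checking with this approach: the two `regexp_replace` calls appear to clean up doubled and trailing separators only, so a row whose first column is zero (for example X = 0, Y = 5) would probably keep a leading ", ". The sample data never hits that case. The plain Scala sketch below models the intended per-row rule without Spark, so expected strings for such rows can be checked by hand; `describeRow` is a hypothetical helper name, not something from the answer.

```scala
// Plain Scala model of the per-row rule (no Spark required).
// describeRow is a hypothetical helper introduced only for illustration.
def describeRow(values: Seq[(String, Double)]): String =
  values
    .collect { case (name, v) if v != 0 => s"$name ${v.toInt}" } // keep non-zero columns
    .mkString(", ")                                               // separator only between kept items

// A row where the first column is zero: the expected result has no leading separator.
val sample = Seq("X" -> 0.0, "Y" -> 5.0, "Z" -> 0.0)
println(describeRow(sample)) // prints "Y 5"
```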
Comments:
I am getting this error:
error: value withColumn is not a member of org.apache.spark.sql.Column
newcol.withColumn("D",newcol).withColumn("D",regexp_replace(regexp_replace('Inventory_Status,", ,",","),", $", "")).drop("X","Y","Z").show(false)
Which Spark version are you using?
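Judging only from the error text quoted in the comment, the chain there starts with `newcol.withColumn(...)`. `newcol` is a `Column`, and `withColumn` is a method of the DataFrame, which would explain the compile error regardless of the Spark version. A minimal sketch of the call starting from `df`, with a local session assumed only so the snippet stands on its own:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit, regexp_replace, when}

// Assumption: a local session for illustration. getOrCreate() returns the
// existing session when one is already running (spark-shell, notebooks).
val spark = SparkSession.builder().master("local[1]").appName("withcolumn-sketch").getOrCreate()
import spark.implicits._

val df = Seq(("1E5", 1.0, 0.0, 0.0), ("1U2", 2.0, 5.0, 0.0), ("5G6", 3.0, 0.0, 10.0))
  .toDF("A", "X", "Y", "Z")

// newcol is a Column: an expression describing how to compute D, not a table.
val newcol = df.columns.drop(1)
  .map(x => when(col(x) === 0, lit("")).otherwise(concat(lit(x), lit(" "), col(x).cast("int").cast("string"))))
  .reduce((x, y) => concat(x, lit(", "), y))

// newcol.withColumn("D", newcol)  // does not compile: Column has no withColumn

// withColumn is defined on the DataFrame, so the chain starts from df.
val result = df
  .withColumn("D", newcol)
  .withColumn("D", regexp_replace(regexp_replace(col("D"), ", ,", ","), ", $", ""))
  .drop("X", "Y", "Z")

result.show(false)
```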