Changing a DataFrame with the respective column names (conditionally) in Spark

Posted 2023-04-15


【Posted】: 2016-11-07 10:26:11 【Question】:

I have a DataFrame named products that looks like this:

Credit | Savings | Premium
1        0         1
0        1         1
1        1         0

All column values are strings.

I want to convert it to:

Credit | Savings | Premium
Credit   0         Premium
0        Savings   Premium
Credit   Savings   0

How can I do this in Spark?

I am using Spark 1.6.2 in Zeppelin.

【Comments】:

As @RamPrasad pointed out, I tried val udf1 = udf((presence: String) => if (presence == "1") "Credit" else "0"). It works! Now I am trying to create a single UDF for all columns by passing an additional argument to the UDF, like this:

val udf1 = udf((presence: String, product: String) => if (presence == "1") product else "0")
df.withColumn("Credit", udf1(sanderProdSmall("Credit"), "Credit"))

But I get the error: found: String("ind_cco_fin_ult2") required: org.apache.spark.sql.Column

【Answer 1】:

I assume Credit, Savings, and Premium are string columns.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._ // for `when` and `lit`

val df: DataFrame = .....

// Replace the string "1" with the column name, one column at a time
df.na.replace("Credit", Map("1" -> "Credit"))
  .na.replace("Savings", Map("1" -> "Savings"))
  .na.replace("Premium", Map("1" -> "Premium"))

Alternatively, you can do it like this...

df.withColumn("Credit", udf1(df("Credit")))
  .withColumn("Savings", udf2(df("Savings")))
  .withColumn("Premium", udf3(df("Premium")))

where udf1, udf2, and udf3 are Spark UDFs that convert "1" to the corresponding column name...

Instead of UDFs, you can also use the when(cond, val).otherwise(val) syntax.

df.withColumn("Credit", when(df("Credit") === "1", lit("Credit")).otherwise("0"))
  .withColumn("Savings", when(df("Savings") === "1", lit("Savings")).otherwise("0"))
  .withColumn("Premium", when(df("Premium") === "1", lit("Premium")).otherwise("0"))

That's it... good luck :-)

【Discussion】:

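Hey @RamPrasad, thanks a lot for pointing me to UDFs. I tried val udf1 = udf((presence: String) => if (presence == "1") "Credit" else "0") and it works. Now I am trying to create a single UDF for all columns by passing an additional argument to the UDF, like this:

val udf1 = udf((presence: String, product: String) => if (presence == "1") product else "0")

But when I try to call this UDF by running df.withColumn("Credit", udf1(sanderProdSmall("Credit"), "Credit")), I get the error: found: String("ind_cco_fin_ult2") required: org.apache.spark.sql.Column

Yes, udf1, udf2, and udf3 are just examples. You can create a single UDF. Wrap any string you pass in lit, for example lit("Credit"), lit("Savings"), lit("Premium"). lit turns a string into a Column, which is the type the UDF call expects.

Awesome! It works! Thank you, Ram @RamPrasad!!

I also updated my answer with the otherwise case. Please check!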
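To make the fix from this discussion concrete, here is a minimal sketch of a single two-argument UDF applied to all columns, with the column name passed through lit so that both arguments are Columns. The names UdfLabels, labelUdf, and labelAll are assumptions made for this illustration and do not come from the thread.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, udf}

// Hypothetical helper object, not part of the original thread.
object UdfLabels {
  // One UDF for all columns: returns the product name when the
  // presence flag is the string "1", and "0" otherwise.
  val labelUdf = udf((presence: String, product: String) =>
    if (presence == "1") product else "0")

  // A UDF call only accepts Columns, so the plain column-name string
  // is wrapped in lit(...) before being passed as the second argument.
  def labelAll(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, labelUdf(col(name), lit(name)))
    }
}
```

Passing the bare string "Credit" instead of lit("Credit") is what triggers the type mismatch quoted above, because the compiler finds a String where a Column is required.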
