How to copy the column names of null-valued fields from one DataFrame into another

【Posted】: 2019-12-26 06:09:08 【Question】:

I am using spark-sql-2.4.1v with Java 1.8.

I have data like the following:

// Assumes a SparkSession named spark is in scope (as in spark-shell); needed for toDF and $"..."
import spark.implicits._

val companyDf = Seq(
  (101,"2018-12-31","700.0","300.0","200.0","400.0","500.0","600.0","900.0","800.0","100.0","1100.0"),
  (102,"2018-12-31","700.0","300.0","200.0","400.0","500.0","600.0","900.0","800.0","100.0",null),
  (103,"2018-12-31",null,"300.0","200.0","400.0","500.0","600.0","900.0","800.0","100.0","1100.0"),
  (104,"2018-12-31",null,"300.0","200.0","400.0","500.0","600.0","900.0","800.0","100.0",null)
).toDF("id","create_date","col_imp_1","col_imp_2","col_imp_3","col_imp_4","col_imp_5","col_imp_6","col_imp_7","col_imp_8","col_imp_9","col_imp_10")

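I need to check certain mandatory columns, such as "col_imp_*". If any of them is null or empty, I need to capture the names of those fields in another DataFrame so they can be stored in a table like the one below.

The result should be: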

-------------------------------
id   | null_field_col          |
-------------------------------
102   | col_imp_10             |
-------------------------------
103   | col_imp_1              |
-------------------------------
104   | col_imp_1,col_imp_10   |
-------------------------------

How can this be done? I think I can use a "when" clause, but how do I collect the results into another DataFrame?

【Answer 1】:

You can do it like this:

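import org.apache.spark.sql.functions.{col, concat_ws, lit, when}

// Names of the mandatory columns to inspect
val colsToCheck = companyDf.columns.filter(_.startsWith("col_imp"))

// For each row, join the names of the checked columns that are null
companyDf
  .select($"id", concat_ws(",", colsToCheck.map(c => when(col(c).isNull, lit(c))): _*).as("null_field_col"))
  .show()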

This gives:

+---+--------------------+
| id|      null_field_col|
+---+--------------------+
|101|                    |
|102|          col_imp_10|
|103|           col_imp_1|
|104|col_imp_1,col_imp_10|
+---+--------------------+
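This works because of two behaviours of the Spark SQL functions involved: when(...) without otherwise(...) evaluates to null when its condition is false, and concat_ws ignores null arguments, so only the flagged column names end up in the joined string. The question also mentions empty values, which an isNull check alone does not catch. Below is a minimal sketch of one way to flag blank strings too, assuming the mandatory columns are strings. The object name NullOrBlankColumns, the helper flagMissing, and the reduced three-row sample are hypothetical and only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, lit, trim, when}

object NullOrBlankColumns {
  // Hypothetical helper: for each row, lists the columns starting with `prefix`
  // whose value is null or blank (empty after trimming).
  def flagMissing(df: DataFrame, prefix: String): DataFrame = {
    val colsToCheck = df.columns.filter(_.startsWith(prefix))
    // when(...) without otherwise(...) is null if the condition is false,
    // and concat_ws skips null arguments.
    val flagged = colsToCheck.map { c =>
      when(col(c).isNull || trim(col(c)) === "", lit(c))
    }
    df.select(col("id"), concat_ws(",", flagged: _*).as("null_field_col"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("null-or-blank").getOrCreate()
    import spark.implicits._

    // Reduced sample with two mandatory columns; id 103 has an empty string.
    val df = Seq(
      (101, "700.0", "1100.0"),
      (102, "700.0", null),
      (103, "", "1100.0")
    ).toDF("id", "col_imp_1", "col_imp_10")

    flagMissing(df, "col_imp").show()
    spark.stop()
  }
}
```

With this variant, id 103 would be reported for col_imp_1 even though its value is an empty string rather than null.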

Appending .where($"null_field_col" =!= "") after the select drops the first row (id 101), which has no null columns.
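To get the result into a separate DataFrame, as the question asks, the select and the filter can be combined and the result assigned to a value instead of being shown directly. The following is a sketch under assumptions rather than a definitive implementation: the object name NullFieldReport, the helper nullFieldReport, and the value name nullFieldsDf are hypothetical, and a local SparkSession is created only so the example runs on its own.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, lit, when}

object NullFieldReport {
  // Hypothetical helper: returns one row per id that has at least one null
  // value among the columns whose names start with `prefix`.
  def nullFieldReport(df: DataFrame, prefix: String): DataFrame = {
    val colsToCheck = df.columns.filter(_.startsWith(prefix))
    df.select(
        col("id"),
        concat_ws(",", colsToCheck.map(c => when(col(c).isNull, lit(c))): _*).as("null_field_col"))
      // Rows without any null mandatory column produce an empty string; drop them.
      .where(col("null_field_col") =!= "")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("null-field-report").getOrCreate()
    import spark.implicits._

    // Sample data from the question.
    val companyDf = Seq(
      (101,"2018-12-31","700.0","300.0","200.0","400.0","500.0","600.0","900.0","800.0","100.0","1100.0"),
      (102,"2018-12-31","700.0","300.0","200.0","400.0","500.0","600.0","900.0","800.0","100.0",null),
      (103,"2018-12-31",null,"300.0","200.0","400.0","500.0","600.0","900.0","800.0","100.0","1100.0"),
      (104,"2018-12-31",null,"300.0","200.0","400.0","500.0","600.0","900.0","800.0","100.0",null)
    ).toDF("id","create_date","col_imp_1","col_imp_2","col_imp_3","col_imp_4","col_imp_5",
           "col_imp_6","col_imp_7","col_imp_8","col_imp_9","col_imp_10")

    // The new DataFrame holds only ids 102, 103 and 104.
    val nullFieldsDf = nullFieldReport(companyDf, "col_imp")
    nullFieldsDf.show(false)

    spark.stop()
  }
}
```

Because nullFieldsDf is an ordinary DataFrame, it can then be written out with the usual DataFrame writer for whatever target table is in use.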
