How to get the final element of a hierarchical array in Scala and apply aggregate functions on it?

Posted: 2022-01-22 16:53:02

Question:

I have a hierarchical (nested) array in a DataFrame:

customerId  accounts
IND0002 [["IND0002","ACC0155","323"],["IND0002","ACC0262","60"]]
IND0003 [["IND0003","ACC0235","631"],["IND0003","ACC0486","400"],["IND0003","ACC0540","53"]]
IND0004 [["IND0004","ACC0116","965"]]

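I need to extract the last element of each innermost list in the array. For example, from the first row I should get 323, 60, and from the second row 631, 400, 53.

I tried using the explode function, but it only extracts the first level of elements (one inner list per row):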

customerId  accounts    col
IND0002 [["IND0002","ACC0155","323"],["IND0002","ACC0262","60"]]    ["IND0002","ACC0155","323"]
IND0002 [["IND0002","ACC0155","323"],["IND0002","ACC0262","60"]]    ["IND0002","ACC0262","60"]
IND0003 [["IND0003","ACC0235","631"],["IND0003","ACC0486","400"],["IND0003","ACC0540","53"]]    ["IND0003","ACC0235","631"]
IND0003 [["IND0003","ACC0235","631"],["IND0003","ACC0486","400"],["IND0003","ACC0540","53"]]    ["IND0003","ACC0486","400"]
IND0003 [["IND0003","ACC0235","631"],["IND0003","ACC0486","400"],["IND0003","ACC0540","53"]]    ["IND0003","ACC0540","53"]
IND0004 [["IND0004","ACC0116","965"]]   ["IND0004","ACC0116","965"]

val newDF1 = CustomerAccountOutput.withColumn("accounts", $"accounts"(size($"accounts")).minus(1))

CustomerAccountOutput.select($"customerID",explode($"accounts"))

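Comments:

It seems that just using .map and some array operations would be fine. Is there any concern with that?

Answer 1:

Use the transform function with a lambda function. For each sub-array, you can get the last element using element_at with index -1: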

val newDF1 = CustomerAccountOutput.withColumn(
  "new_col", 
  expr("transform(accounts, x -> element_at(x, -1))")
)

newDF1.show(false)

//+----------+--------------------+--------------+
//|customerId|            accounts|       new_col|
//+----------+--------------------+--------------+
//|   IND0002|[[IND0002, ACC015...|     [323, 60]|
//|   IND0003|[[IND0003, ACC023...|[631, 400, 53]|
//|   IND0004|[[IND0004, ACC011...|         [965]|
//+----------+--------------------+--------------+
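The question title also asks about applying an aggregate function to the extracted values. One possible way, sketched below, is to cast each last element to an integer and fold the resulting array with Spark SQL's aggregate higher-order function (available, like transform, from Spark 2.4). The object name LastElementAggregation, the helper names sampleData and addTotal, the total column, and the local SparkSession are assumptions made for illustration; the sample rows are rebuilt from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical wrapper object; not part of the original answer.
object LastElementAggregation {

  // Rebuilds the sample rows from the question as (customerId, accounts).
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("IND0002", Seq(Seq("IND0002", "ACC0155", "323"), Seq("IND0002", "ACC0262", "60"))),
      ("IND0003", Seq(Seq("IND0003", "ACC0235", "631"), Seq("IND0003", "ACC0486", "400"),
                      Seq("IND0003", "ACC0540", "53"))),
      ("IND0004", Seq(Seq("IND0004", "ACC0116", "965")))
    ).toDF("customerId", "accounts")
  }

  // Extracts the last element of each sub-array as an int, then sums them per row.
  def addTotal(df: DataFrame): DataFrame =
    df.withColumn("new_col", expr("transform(accounts, x -> cast(element_at(x, -1) as int))"))
      .withColumn("total", expr("aggregate(new_col, 0, (acc, v) -> acc + v)"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only to make the sketch runnable.
    val spark = SparkSession.builder().master("local[*]").appName("last-element-agg").getOrCreate()
    // total is 383 for IND0002, 1084 for IND0003 and 965 for IND0004.
    addTotal(sampleData(spark)).select("customerId", "new_col", "total").show(false)
    spark.stop()
  }
}
```

If only the largest or smallest value per row is needed, array_max(new_col) or array_min(new_col) can replace the aggregate expression.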

Answer 2:

I got a satisfactory result using slice and withColumn.
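This answer does not include its code, so the following is only a guess at what a slice-based version might look like, not the answerer's actual solution. It assumes that slice is applied to each sub-array with a negative start index (counting from the end) and that flatten then merges the resulting one-element arrays. The object name SliceLastElement, the helper names, and the local SparkSession are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical wrapper object; a sketch of one possible slice-based approach.
object SliceLastElement {

  // Rebuilds the first two sample rows from the question.
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("IND0002", Seq(Seq("IND0002", "ACC0155", "323"), Seq("IND0002", "ACC0262", "60"))),
      ("IND0003", Seq(Seq("IND0003", "ACC0235", "631"), Seq("IND0003", "ACC0486", "400"),
                      Seq("IND0003", "ACC0540", "53")))
    ).toDF("customerId", "accounts")
  }

  // slice(x, -1, 1) keeps one element starting at the last position of each sub-array;
  // flatten turns the array of one-element arrays into a flat array of strings.
  def addLastElements(df: DataFrame): DataFrame =
    df.withColumn("new_col", expr("flatten(transform(accounts, x -> slice(x, -1, 1)))"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only to make the sketch runnable.
    val spark = SparkSession.builder().master("local[*]").appName("slice-last").getOrCreate()
    addLastElements(sampleData(spark)).select("customerId", "new_col").show(false)
    spark.stop()
  }
}
```

Compared with element_at(x, -1), slice returns an array rather than a single value, which is why the extra flatten step is needed here.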
