How to get the final element of a hierarchical array in Scala and apply aggregate functions on it?
Posted: 2022-01-22 16:53:02
Question:
I have a hierarchical (nested) array column in a DataFrame:
customerId accounts
IND0002 [["IND0002","ACC0155","323"],["IND0002","ACC0262","60"]]
IND0003 [["IND0003","ACC0235","631"],["IND0003","ACC0486","400"],["IND0003","ACC0540","53"]]
IND0004 [["IND0004","ACC0116","965"]]
I need to extract the last element of each inner list in the array.
For example, from the first row I should get 323, 60, and from the second row 631, 400, 53.
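Before reaching for Spark functions, the per-row rule can be pinned down with plain Scala collections. This is a minimal sketch, assuming one row's `accounts` value is modelled as a `Seq[Seq[String]]`; the object `LastOfEach` and the method `lastOfEach` are hypothetical names, not part of Spark:

```scala
// Hypothetical model of the rule: one row's `accounts` value as nested sequences
object LastOfEach {
  // Keep the last element of every inner sequence.
  // Assumes no inner sequence is empty (`.last` throws on an empty Seq).
  def lastOfEach(accounts: Seq[Seq[String]]): Seq[String] =
    accounts.map(_.last)

  // Sample rows copied from the question
  val ind0002: Seq[Seq[String]] =
    Seq(Seq("IND0002", "ACC0155", "323"), Seq("IND0002", "ACC0262", "60"))
  val ind0003: Seq[Seq[String]] = Seq(
    Seq("IND0003", "ACC0235", "631"),
    Seq("IND0003", "ACC0486", "400"),
    Seq("IND0003", "ACC0540", "53")
  )
}
```

For example, `LastOfEach.lastOfEach(LastOfEach.ind0002)` gives `Seq("323", "60")`, matching the expected output for the first row.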
I tried the explode function, but it only unnests the outer array, giving one row per inner list:
customerId accounts col
IND0002 [["IND0002","ACC0155","323"],["IND0002","ACC0262","60"]] ["IND0002","ACC0155","323"]
IND0002 [["IND0002","ACC0155","323"],["IND0002","ACC0262","60"]] ["IND0002","ACC0262","60"]
IND0003 [["IND0003","ACC0235","631"],["IND0003","ACC0486","400"],["IND0003","ACC0540","53"]] ["IND0003","ACC0235","631"]
IND0003 [["IND0003","ACC0235","631"],["IND0003","ACC0486","400"],["IND0003","ACC0540","53"]] ["IND0003","ACC0486","400"]
IND0003 [["IND0003","ACC0235","631"],["IND0003","ACC0486","400"],["IND0003","ACC0540","53"]] ["IND0003","ACC0540","53"]
IND0004 [["IND0004","ACC0116","965"]] ["IND0004","ACC0116","965"]
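// Attempt 1: index into the outer array by its size
val newDF1 = CustomerAccountOutput.withColumn("accounts", $"accounts"(size($"accounts")).minus(1))
// Attempt 2: explode the outer array into one row per inner array
CustomerAccountOutput.select($"customerID",explode($"accounts"))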
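The explode route can also be finished rather than abandoned: after exploding, take the last element of each inner array, then regroup by customer. In Spark this would roughly be `element_at(col("col"), -1)` on the exploded column followed by `groupBy` with `collect_list` (note that `collect_list` does not guarantee element order after a shuffle). The sketch below only models those three steps on plain Scala collections; the tuple representation of rows and the name `ExplodeRoute` are assumptions for illustration:

```scala
// Hypothetical model of explode -> take last -> regroup, using (customerId, accounts) tuples
object ExplodeRoute {
  // Sample rows copied from the question
  val rows: Seq[(String, Seq[Seq[String]])] = Seq(
    "IND0002" -> Seq(Seq("IND0002", "ACC0155", "323"), Seq("IND0002", "ACC0262", "60")),
    "IND0003" -> Seq(
      Seq("IND0003", "ACC0235", "631"),
      Seq("IND0003", "ACC0486", "400"),
      Seq("IND0003", "ACC0540", "53")
    ),
    "IND0004" -> Seq(Seq("IND0004", "ACC0116", "965"))
  )

  def lastPerCustomer(rows: Seq[(String, Seq[Seq[String]])]): Map[String, Seq[String]] = {
    // Step 1, like explode: one (customerId, innerArray) pair per inner array
    val exploded = rows.flatMap { case (id, accounts) => accounts.map(inner => (id, inner)) }
    // Step 2: keep only the last element of each inner array (assumes none is empty)
    val lastValues = exploded.map { case (id, inner) => (id, inner.last) }
    // Step 3, like groupBy + collect_list: regroup the values per customer
    lastValues.groupBy(_._1).map { case (id, pairs) => id -> pairs.map(_._2) }
  }
}
```

Unlike Spark's `collect_list`, Scala's `groupBy` keeps the original order of elements within each group, so the local model is deterministic.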
Comments:
It seems that just using .map and some array operations would work. Is there any concern with that?
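The `.map` idea from the comment can be sketched too. A typed `Dataset` of a case class can be mapped with an ordinary Scala lambda (given `import spark.implicits._` for the encoder); one commonly cited concern is that such a lambda is opaque to Spark's optimizer and requires rows to be deserialized into JVM objects, whereas built-in functions like `transform` stay inside Spark SQL. The lambda itself is plain Scala, so here it runs on a local `Seq`; the case classes below are hypothetical stand-ins for the DataFrame's schema:

```scala
// Hypothetical case classes standing in for the DataFrame schema
object MapRoute {
  final case class CustomerAccounts(customerId: String, accounts: Seq[Seq[String]])
  final case class CustomerLastValues(customerId: String, lastValues: Seq[String])

  // The same function could be passed to Dataset.map; here it runs on a local Seq.
  // lastOption skips empty inner arrays instead of throwing.
  def toLastValues(row: CustomerAccounts): CustomerLastValues =
    CustomerLastValues(row.customerId, row.accounts.flatMap(_.lastOption))

  // Sample rows copied from the question
  val sample: Seq[CustomerAccounts] = Seq(
    CustomerAccounts("IND0002", Seq(Seq("IND0002", "ACC0155", "323"), Seq("IND0002", "ACC0262", "60"))),
    CustomerAccounts("IND0004", Seq(Seq("IND0004", "ACC0116", "965")))
  )
}
```

Usage: `MapRoute.sample.map(MapRoute.toLastValues)` produces one `CustomerLastValues` per customer, for example `lastValues = Seq("323", "60")` for IND0002.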
Answer 1:
Use the transform function with a lambda: for each sub-array, element_at with index -1 returns its last element:
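import org.apache.spark.sql.functions.expr

// For each inner array x, element_at(x, -1) returns its last element
val newDF1 = CustomerAccountOutput.withColumn(
  "new_col",
  expr("transform(accounts, x -> element_at(x, -1))")
)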
newDF1.show(false)
//+----------+--------------------+--------------+
//|customerId| accounts| new_col|
//+----------+--------------------+--------------+
//| IND0002|[[IND0002, ACC015...| [323, 60]|
//| IND0003|[[IND0003, ACC023...|[631, 400, 53]|
//| IND0004|[[IND0004, ACC011...| [965]|
//+----------+--------------------+--------------+
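The title also asks about aggregating. The extracted values are strings, so they need a cast before any arithmetic. Within Spark SQL, one option (assuming Spark 2.4+, where these higher-order functions exist) is to cast inside `transform` and fold with `aggregate`, e.g. `expr("aggregate(transform(accounts, x -> cast(element_at(x, -1) as int)), 0, (acc, v) -> acc + v)")`; note that `element_at` uses 1-based indices, with negative indices counting from the end. The plain-Scala sketch below models a per-row sum and max, assuming every value parses as an `Int`; `AggregateLast` is a hypothetical name:

```scala
// Hypothetical per-row aggregation over the last element of each inner array
object AggregateLast {
  // Parse the last element of each inner array; assumes every value is a valid Int
  private def lastInts(accounts: Seq[Seq[String]]): Seq[Int] =
    accounts.map(_.last.toInt)

  // Sum of the last elements (0 for a row with no inner arrays)
  def sumOfLast(accounts: Seq[Seq[String]]): Int = lastInts(accounts).sum

  // reduceOption returns None for a row with no inner arrays
  def maxOfLast(accounts: Seq[Seq[String]]): Option[Int] =
    lastInts(accounts).reduceOption(_ max _)
}
```

With the question's data, the IND0003 row sums to 631 + 400 + 53 = 1084 and its maximum is 631.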
Answer 2:
Got the desired result using slice and withColumn.
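Answer 2 does not include its code. One plausible reconstruction, offered as an assumption rather than the answerer's actual code: Spark SQL's `slice(x, start, length)` accepts a negative `start` counting from the end, so `slice(x, -1, 1)` yields a one-element array holding the last element, and `withColumn("new_col", expr("flatten(transform(accounts, x -> slice(x, -1, 1)))"))` would flatten those pieces into the same result as Answer 1. On plain Scala collections the analogue of `slice(x, -1, 1)` is `takeRight(1)`, and the flattening step is `flatMap`; `SliceRoute` is a hypothetical name:

```scala
// Hypothetical plain-Scala analogue of flatten(transform(accounts, x -> slice(x, -1, 1)))
object SliceRoute {
  def lastViaSlice(accounts: Seq[Seq[String]]): Seq[String] =
    // takeRight(1) plays the role of slice(x, -1, 1): a slice of at most one element.
    // flatMap concatenates those slices, like flatten over the transformed array.
    accounts.flatMap(_.takeRight(1))
}
```

A design difference from `.last`: `takeRight(1)` on an empty inner sequence returns an empty slice, so that inner array simply contributes nothing instead of throwing.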