Column name with the min and max values in a dataset in R
Posted: 2021-08-02 03:47:51
Question: I have this dataset:
Year January February March April May June July August
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2018 45 51 63 61 79 85 88 85
2 2017 51 60 65 69 75 82 86 84
3 2016 47 55 61 68 72 84 87 85
... with 20 more rows
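I want to get the month corresponding to the minimum and maximum value in each row, as well as the difference between the maximum and the minimum. Here is my code for the minimum and maximum: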
x <- colnames(data)[apply(data[,c(2:9)],1,which.max)]
y <- colnames(data)[apply(data[,c(2:9)],1,which.min)]
data$MaxMonth <- x
data$MinMonth <- y
However, it gives me Year as the output of which.min for some rows.
Year January February March April May June July August MaxMonth MinMonth Diff
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2018 45 51 63 61 79 85 88 85 July January 43
2 2017 51 60 65 69 75 82 86 84 July Year 35
3 2016 47 55 61 68 72 84 87 85 July Year 40
... with 20 more rows
Comments:
You should also use [,c(2:9)] on colnames.
Answer 1:
There is no need to call apply three times. You can do this:
nms <- names(df)[-1]
n <- seq(nrow(df))
maxMonth = max.col(df[-1])
minMonth = max.col(-df[-1])
diff <- df[-1][cbind(n, maxMonth)] - df[-1][cbind(n, minMonth)]
cbind(df, maxMonth = nms[maxMonth], minMonth = nms[minMonth], diff)
Year January February March April May June July August maxMonth minMonth diff
1 2018 45 51 63 61 79 85 88 85 July January 43
2 2017 51 60 65 69 75 82 86 84 July January 35
3 2016 47 55 61 68 72 84 87 85 July January 40
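One caveat with this approach: max.col breaks ties at random by default (ties.method = "random"), so a row in which two months share the maximum or minimum can give different results between runs. The following is a minimal sketch, assuming a three-row copy of the sample data (the data frame df is rebuilt here purely for illustration), that passes ties.method = "first" so the result is deterministic.

```r
# Illustrative copy of the first three rows of the sample data (an assumption).
df <- data.frame(
  Year = c("2018", "2017", "2016"),
  January = c(45, 51, 47), February = c(51, 60, 55),
  March = c(63, 65, 61), April = c(61, 69, 68),
  May = c(79, 75, 72), June = c(85, 82, 84),
  July = c(88, 86, 87), August = c(85, 84, 85),
  stringsAsFactors = FALSE
)

m <- as.matrix(df[-1])      # numeric month columns only
rows <- seq_len(nrow(m))

# ties.method = "first" picks the earliest month on ties instead of a random one.
max_idx <- max.col(m, ties.method = "first")
min_idx <- max.col(-m, ties.method = "first")

result <- cbind(
  df,
  maxMonth = colnames(m)[max_idx],
  minMonth = colnames(m)[min_idx],
  diff = m[cbind(rows, max_idx)] - m[cbind(rows, min_idx)]
)
print(result)
```

Indexing the matrix with cbind(rows, max_idx) picks one cell per row, which is what lets the difference be computed without a second pass over the data.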
Answer 2: I think the comment on your post points out the problem. You should write:
x <- colnames(data)[2:9][apply(data[,c(2:9)],1,which.max)]
y <- colnames(data)[2:9][apply(data[,c(2:9)],1,which.min)]
data$MaxMonth <- x
data$MinMonth <- y
Does that work better?
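The reason "Year" can appear is an index offset: which.min and which.max return positions within the eight selected month columns, whereas colnames(data) also counts Year as position 1. Below is a minimal sketch of the mismatch, assuming a three-row copy of the sample data (rebuilt here for illustration only).

```r
# Illustrative copy of the first three rows of the sample data (an assumption).
data <- data.frame(
  Year = c("2018", "2017", "2016"),
  January = c(45, 51, 47), February = c(51, 60, 55),
  March = c(63, 65, 61), April = c(61, 69, 68),
  May = c(79, 75, 72), June = c(85, 82, 84),
  July = c(88, 86, 87), August = c(85, 84, 85),
  stringsAsFactors = FALSE
)

months <- data[, 2:9]

# Positions are relative to `months` (1 = January, ..., 8 = August).
idx_min <- unname(apply(months, 1, which.min))
idx_max <- unname(apply(months, 1, which.max))

# Looking the positions up in the full column names is off by one,
# because position 1 there is "Year".
wrong_min <- colnames(data)[idx_min]
wrong_max <- colnames(data)[idx_max]

# Looking them up in the same subset that produced them is consistent.
right_min <- colnames(months)[idx_min]
right_max <- colnames(months)[idx_max]

print(data.frame(wrong_min, right_min, wrong_max, right_max))
```

Taking the names from the same subset that produced the positions also avoids repeating the 2:9 range in two places.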
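Comments:
OK, thanks. I noticed that it was my mistake.
Answer 3: We can reshape to long format with pivot_longer, group by 'Year', get the column name corresponding to the max/min of 'value' (with which.max/which.min), and then join the result back to the original data.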
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = -1) %>%
group_by(Year) %>%
summarise(maxMonth = name[which.max(value)],
minMonth = name[which.min(value)]) %>%
left_join(df, .)
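Comments:
Thanks. This is the clearest one when I have a large dataset. If I continue the code above inside summarise, I have: summarise(maxMonth = name[which.max(value)], minMonth = name[which.min(value)], maxValue = max(value), minValue = min(value), Diff = maxValue - minValue). It lists the differences under Diff. How can I extract the year with the largest difference between the maximum and minimum, which in this case is 2018 with the value 42?
@JakeParker You can use %>% slice_max(n = 1, order_by = Diff) to return the row, or, if you only want the year, %>% summarise(year = Year[which.max(Diff)]) %>% pull(year)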
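The suggestion in the comments can be put together as follows. This is a minimal sketch, assuming a three-row copy of the sample data and assuming that dplyr and tidyr are installed; the names per_year, widest and widest_year are illustrative and not from the original thread.

```r
library(dplyr)
library(tidyr)

# Illustrative copy of the first three rows of the sample data (an assumption).
df <- data.frame(
  Year = c("2018", "2017", "2016"),
  January = c(45, 51, 47), February = c(51, 60, 55),
  March = c(63, 65, 61), April = c(61, 69, 68),
  May = c(79, 75, 72), June = c(85, 82, 84),
  July = c(88, 86, 87), August = c(85, 84, 85),
  stringsAsFactors = FALSE
)

# One row per year with the extreme months and the max-min difference.
per_year <- df %>%
  pivot_longer(cols = -Year, names_to = "name", values_to = "value") %>%
  group_by(Year) %>%
  summarise(
    maxMonth = name[which.max(value)],
    minMonth = name[which.min(value)],
    Diff = max(value) - min(value)
  )

# Whole row for the year with the largest difference.
widest <- per_year %>% slice_max(Diff, n = 1)

# Only the year, as a plain vector.
widest_year <- per_year %>%
  summarise(Year = Year[which.max(Diff)]) %>%
  pull(Year)

print(widest)
print(widest_year)
```

On this three-row sample the largest difference is 43, for 2018. slice_max keeps tied rows by default, whereas which.max returns only the first match, so the two variants can differ when several years share the largest difference.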
Answer 4:
library(tidyverse)
df %>%
mutate(max_month = pmap(across(January:August), ~ names(c(...)[which.max(c(...))])),
min_month = pmap(across(January:August), ~ names(c(...)[which.min(c(...))]))
) %>%
unnest(cols = c(max_month, min_month)) %>%
rowwise() %>%
mutate(Diff = max(c_across(January:August)) - min(c_across(January:August)))
Output:
Year January February March April May June July August max_month min_month Diff
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
1 2018 45 51 63 61 79 85 88 85 July January 43
2 2017 51 60 65 69 75 82 86 84 July January 35
3 2016 47 55 61 68 72 84 87 85 July January 40