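How can I correct the mutate and filter errors in my R function
【Posted】2021-06-26 01:20:10
【Question】I have a function that takes a data frame and two other variables (horse and race_date) as input. horse and race_date are used to filter the data frame passed to the function, and a summarise step then computes the required output. When I test the function on its own, outside a pipe, everything works. When I call it from mutate() inside a pipe, I get the following error message: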
Error: Problem with `mutate()` input `split_Lt`.
x Problem with `filter()` input `..1`.
x Input `..1` must be of size 1, not size 18.
i Input `..1` is `Horse == horse & NewSplit == "LT Races" & race_date < date`.
i The error occurred in group 2: split = "A BIT OF BOTH_var106_Track: CD".
i Input `split_Lt` is `getsplit_LT(splits, horse, race_date)`.
i The error occurred in group 2: split = "A BIT OF BOTH_var106_Track: CD".
The function is as follows:
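library(dplyr)  # provides %>%, filter(), group_by(), summarise_if(), mutate(), select()

getsplit_LT <- function(df, horse, date) {
  # Keep this horse's "LT Races" rows dated before `date`, then sum the numeric columns per split
  kpi <- df %>%
    filter(Horse == horse & NewSplit == "LT Races" & race_date < date) %>%
    group_by(split) %>%
    summarise_if(is.numeric, sum) %>%
    mutate(TopAvgB = (E + 3.439) / (R + 3.439 + 25.69)) %>%
    select(TopAvgB)
  # Return 0 when no rows matched, otherwise the TopAvgB column as a vector
  x <- if (is.data.frame(kpi) && nrow(kpi) == 0) 0 else kpi[[1]]
  return(x)
}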
This is the code I am trying to run:
df <- df %>%
mutate(split_Lt = getsplit_LT(splits, horse, race_date))
Here is the input data (dput() output for df, followed by the truncated dput() output for splits):
structure(list(horse = c("A BIT OF BOTH", "A BIT OF BOTH", "A BIT OF BOTH",
"A BIT OF BOTH", "A BIT OF BOTH", "A BIT OF BOTH", "A BIT OF BOTH",
"A BIT OF BOTH", "A BIT OF BOTH", "A BIT OF BOTH", "A BIT OF BOTH",
"A BIT OF BOTH", "A BIT OF BOTH", "A BIT OF BOTH", "A BIT OF BOTH",
"A BIT OF BOTH", "A BIT OF BOTH", "A BIT OF BOTH"), race_date = structure(c(17802,
17906, 17941, 17969, 18006, 18062, 18091, 18183, 18183, 18226,
18244, 18286, 18454, 18502, 18546, 18581, 18601, 18664), class = "Date")), row.names = c(NA,
-18L), groups = structure(list(horse = "A BIT OF BOTH", .rows = structure(list(
1:18), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), row.names = 1L, class = c("tbl_df", "tbl", "data.frame"
), .drop = TRUE), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
))
structure(list(split = c("A BIT OF BOTH_var102B_LifeTime: Life",
"A BIT OF BOTH_var102B_LifeTime: Life", "A BIT OF BOTH_var102B_LifeTime: Life",
"A BIT OF BOTH_var102B_LifeTime: Life", "A BIT OF BOTH_var102B_LifeTime: Life",
"A BIT OF BOTH_var102B_LifeTime: Life", "A BIT OF BOTH_var102B_LifeTime: Life",
"A BIT OF BOTH_var102B_LifeTime: Life", "A BIT OF BOTH_var102B_LifeTime: Life",
"A BIT OF BOTH_var102B_LifeTime: Life", "A BIT OF BOTH_var102B_LifeTime: Life",
"A BIT OF BOTH_var102B_LifeTime: Life", "A BIT OF BOTH_var102B_LifeTime: Life",
"A BIT OF BOTH_var102B_LifeTime: Life", "A BIT OF BOTH_var102B_LifeTime: Life",
"A BIT OF BOTH_var102B_LifeTime: Life", "A BIT OF BOTH_var102B_LifeTime: Life",
"A BIT OF BOTH_var102B_LifeTime: Life", "A BIT OF BOTH_var106_Track: CD",
"A BIT OF BOTH_var106_Track: CT", "A BIT OF BOTH_var106_Track: DE",
"A BIT OF BOTH_var106_Track: FG", "A BIT OF BOTH_var106_Track: GP",
"A BIT OF BOTH_var106_Track: GP", "A BIT OF BOTH_var106_Track: GP",
"A BIT OF BOTH_var106_Track: GP", "A BIT OF BOTH_var106_Track: GP",
"A BIT OF BOTH_var106_Track: GP", "A BIT OF BOTH_var106_Track: GP",
"A BIT OF BOTH_var106_Track: GP", "A BIT OF BOTH_var106_Track: KE",
"A BIT OF BOTH_var106_Track: MT", "A BIT OF BOTH_var106_Track: MT",
"A BIT OF BOTH_var106_Track: OT", "A BIT OF BOTH_var106_Track: PX",
"A BIT OF BOTH_var106_Track: PX", "A BIT OF BOTH_var107_Surface: Dirt",
"A BIT OF BOTH_var107_Surface: Dirt", "A BIT OF BOTH_var107_Surface: Dirt",
"A BIT OF BOTH_var107_Surface: Dirt", "A BIT OF BOTH_var107_Surface: Dirt",
"A BIT OF BOTH_var107_Surface: Dirt", "A BIT OF BOTH_var107_Surface: Dirt",
"A BIT OF BOTH_var107_Surface: Dirt", "A BIT OF BOTH_var107_Surface: Dirt",
"A BIT OF BOTH_var107_Surface: Dirt", "A BIT OF BOTH_var107_Surface: Dirt",
"A BIT OF BOTH_var107_Surface: Dirt", "A BIT OF BOTH_var107_Surface: Dirt",
"A BIT OF BOTH_var107_Surface: Synth", "A BIT OF BOTH_var107_Surface: Turf",
"A BIT OF BOTH_var107_Surface: Turf", "A BIT OF BOTH_var107_Surface: Turf"
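【Comments】:
Did you mean to pass the lowercase horse column of df into your function together with the splits data frame? That seems odd. Could you use dput() to post just a few rows of sample input in your question, so that it can be copied and pasted, and show the desired result for those rows? A small example would be clearer and keeps the question self-contained.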
@GregorThomas Yes, I did mean to pass the lowercase horse. I will try dput(); this is the first I have heard of it. Thanks.
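For example, dput(df[1:10, ]) gives a copy/pasteable version of the first 10 rows of df, including all structure and class information. It is the preferred way to post R sample data.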
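To illustrate the dput() suggestion, here is a minimal sketch using a small made-up data frame (sample_df is a hypothetical stand-in, not the asker's data). It prints the reproducible code and then checks that evaluating that code rebuilds the object.

```r
# Hypothetical sample data standing in for the question's `df`
sample_df <- data.frame(
  horse = c("A BIT OF BOTH", "A BIT OF BOTH"),
  race_date = as.Date(c("2020-01-10", "2020-02-14")),
  stringsAsFactors = FALSE
)

# dput() prints R code that rebuilds the object, class information included
dput(sample_df)

# Capturing that text and evaluating it recreates the data frame
rebuilt <- eval(parse(text = capture.output(dput(sample_df))))
identical(rebuilt, sample_df)
```

Because the printed text is plain R code, anyone can paste it into a session and get an object with the same columns and classes.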
By the way, the data.frame in your GitHub repository contains non-standard Unicode whitespace. A BIT OF BOTH has \u00a0 instead of ` `.
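A non-breaking space looks like an ordinary space but does not compare equal to one, so matching on the horse name can silently fail. The sketch below uses a hypothetical value (raw_name) to show the mismatch and one way to normalise it with base R.

```r
# Hypothetical value as it might arrive from the CSV: the gaps are U+00A0
raw_name <- "A\u00a0BIT\u00a0OF\u00a0BOTH"

raw_name == "A BIT OF BOTH"    # FALSE: the strings differ despite looking alike

# Replace every non-breaking space with an ordinary space
clean_name <- gsub("\u00a0", " ", raw_name, fixed = TRUE)
clean_name == "A BIT OF BOTH"  # TRUE
```

The same gsub() call could be applied to a whole column before filtering or joining on it.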
@IanCampbell Thanks, I have now provided the dput() data.
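【Answer 1】:
One approach is to use purrr::pmap, which applies a function to each row of a data.frame. In the formula shorthand, .x and .y refer to the first and second columns, here horse and race_date.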
library(tidyverse)
pmap(df, ~ getsplit_LT(splits, horse = .x, date = .y))
[[1]]
[1] 0.2156712
[[2]]
[1] 0
[[3]]
[1] 0.1070373
[[4]]
[1] 0.1339914
[[5]]
[1] 0.1593659
...
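A likely reading of the original error is that a plain mutate() call hands the function the whole horse and race_date columns (18 values in the question) rather than one value per row, so the filter() condition inside the function has the wrong length. The sketch below shows that behaviour with hypothetical toy data; df_toy and n_received are illustrative names, not part of the question.

```r
library(dplyr)

# Hypothetical toy data standing in for the question's `df`
df_toy <- tibble(
  horse = c("A", "A", "B"),
  race_date = as.Date(c("2020-01-10", "2020-02-14", "2020-03-14"))
)

# Hypothetical helper: reports how many values it was handed
n_received <- function(horse) length(horse)

# Plain mutate(): one call, which receives the whole column
whole_column <- df_toy %>% mutate(n = n_received(horse))

# rowwise(): one call per row, each receiving a single value
per_row <- df_toy %>%
  rowwise() %>%
  mutate(n = n_received(horse)) %>%
  ungroup()

whole_column$n  # 3 3 3
per_row$n       # 1 1 1
```

Row-by-row evaluation, whether through pmap or rowwise(), gives the function the single horse and date it was written for.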
Or, to attach the result to the original data.frame:
bind_cols(df,kpi = pmap_dbl(df, ~ getsplit_LT(splits, horse = .x, date = .y)))
# A tibble: 18 x 3
horse race_date kpi
<chr> <date> <dbl>
1 A BIT OF BOTH 2020-09-28 0.216
2 A BIT OF BOTH 2020-01-10 0
3 A BIT OF BOTH 2020-02-14 0.107
4 A BIT OF BOTH 2020-03-14 0.134
5 A BIT OF BOTH 2020-04-20 0.159
6 A BIT OF BOTH 2020-06-15 0.183
7 A BIT OF BOTH 2020-07-14 0.227
...
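pmap also passes a data frame's columns by name, so the arguments can be spelled out instead of relying on the .x and .y positions. The following is a minimal sketch under assumptions: splits_toy and df_toy are hypothetical toy data whose column names mirror those used in the question's function, and the function body is the question's code with braces and else restored.

```r
library(dplyr)
library(purrr)

# The question's function, with braces and `else` restored
getsplit_LT <- function(df, horse, date) {
  kpi <- df %>%
    filter(Horse == horse & NewSplit == "LT Races" & race_date < date) %>%
    group_by(split) %>%
    summarise_if(is.numeric, sum) %>%
    mutate(TopAvgB = (E + 3.439) / (R + 3.439 + 25.69)) %>%
    select(TopAvgB)
  if (nrow(kpi) == 0) 0 else kpi[[1]]
}

# Hypothetical toy data; only the column names mirror the question
splits_toy <- tibble(
  Horse = "A", NewSplit = "LT Races", split = "A_life",
  race_date = as.Date(c("2020-01-10", "2020-02-14", "2020-03-14")),
  E = c(1, 2, 3), R = c(1, 1, 1)
)
df_toy <- tibble(
  horse = c("A", "A"),
  race_date = as.Date(c("2020-01-10", "2020-03-14"))
)

# Columns are matched to the arguments by name, one row per call
kpi <- pmap_dbl(
  df_toy,
  function(horse, race_date) getsplit_LT(splits_toy, horse, race_date)
)

bind_cols(df_toy, kpi = kpi)
```

The first toy row has no earlier races, so it takes the function's zero branch; the second row is computed from the two earlier rows of splits_toy.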
Data:
splits <- read_csv("https://raw.githubusercontent.com/Handicappr/Rstudio_test_project/main/splits.csv")
df <- read_csv("https://raw.githubusercontent.com/Handicappr/Rstudio_test_project/main/df.csv")
splits %>% mutate(race_date = as.Date(race_date,"%m/%d/%y")) -> splits
df %>% mutate(race_date = as.Date(race_date,"%m/%d/%y")) -> df