function not accepting column call


Posted: 2017-06-12 02:19:40

Question:

I have built a function to which I want to pass a data frame and a column in that data frame. For example:

testdf <- structure(list(date = c("2016-04-04", "2016-04-04", "2016-04-04", 
"2016-04-04", "2016-04-04", "2016-04-04"), sensorheight = c(1L, 
16L, 1L, 16L, 1L, 16L), farm = c("McDonald", "McDonald", "McDonald", 
"McDonald", "McDonald", "McDonald"), location = c("4", "4", "5", 
"5", "Outside", "Outside"), Temp = c(122.8875, 117.225, 102.0375, 
98.3625, 88.5125, 94.7)), .Names = c("date", "sensorheight", 
"farm", "location", "Temp"), row.names = c(NA, 6L), class = "data.frame")

> testdf
        date sensorheight     farm location     Temp
1 2016-04-04            1 McDonald        4 122.8875
2 2016-04-04           16 McDonald        4 117.2250
3 2016-04-04            1 McDonald        5 102.0375
4 2016-04-04           16 McDonald        5  98.3625
5 2016-04-04            1 McDonald  Outside  88.5125
6 2016-04-04           16 McDonald  Outside  94.7000

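The function subtracts some values from others based on the values in a different column. It used to work, accepting both the data frame and the column as inputs, but since I updated R it no longer does.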

library(dplyr)

DailyInOutDiff <- function (df, variable) {
  DailyInOutDiff04 <- df %>%
    filter(location %in% c(4, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="4"] - variable[location=='Outside'], 
              location = "4")  %>%
    select(1, 2, 3, 5, 4)

  DailyInOutDiff05 <- df %>%
    filter(location %in% c(5, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="5"] - variable[location=='Outside'], 
              location = "5")  %>%
    select(1, 2, 3, 5, 4)

  temp.list <- list(DailyInOutDiff04, DailyInOutDiff05)
  final.df = bind_rows(temp.list)
  return(final.df)
}


test <- DailyInOutDiff(testdf, "Temp")
test <- DailyInOutDiff(testdf, quote(Temp))

These calls produce the following error messages, respectively:

  Error in summarise_impl(.data, dots) : 
  Evaluation error: non-numeric argument to binary operator. 

  Error in summarise_impl(.data, dots) : 
  Evaluation error: object of type 'symbol' is not subsettable. 

I would like to know what these error messages mean and how to fix them.

I tried the solutions in Pass a data.frame column name to a function, but none of them worked for me.

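If I remove the column as an input, the error does not occur, but I need that input because I apply the function to several columns in a large data frame.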

My desired output:

        date sensorheight     farm location     Temp
1 2016-04-04            1 McDonald        4  34.3750
2 2016-04-04           16 McDonald        4  22.5250
3 2016-04-04            1 McDonald        5  13.5250
4 2016-04-04           16 McDonald        5   3.6625

Comments on the question:

Suggested dupe: use dynamic column names in dplyr. See also the package vignette Programming with dplyr.
Can you provide the output you are hoping to get?
@beigel See edit.

Answer 1:

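I could not reproduce the second error, but I could reproduce the first. It seems that summarise cannot use Temp because it treats it as a character object. In other words, you are passing the column name, not the column. If you run the code inside your function line by line and use df$variable instead of variable, you will see that it works.

That said, the fix is quite simple. I just added the line variable <- as.name(variable) to your function, which now looks like this: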
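Both error messages can be reproduced in plain R, without dplyr, which shows what they mean. The sketch below assumes that inside one group summarise() sees location as c("4", "Outside") and finds `variable` in the function's environment; the names `msg1`, `msg2` and `diff_val` are illustrative only.

```r
# Assumed stand-in for one group's location values inside summarise()
location <- c("4", "Outside")

# Call 1: variable holds the string "Temp", so indexing it returns strings,
# and subtracting one string from another is not allowed
variable <- "Temp"
msg1 <- tryCatch(variable[location == "4"] - variable[location == "Outside"],
                 error = conditionMessage)
print(msg1)  # "non-numeric argument to binary operator"

# Call 2: variable holds the symbol Temp, which cannot be indexed with [
variable <- quote(Temp)
msg2 <- tryCatch(variable[location == "4"], error = conditionMessage)
print(msg2)  # "object of type 'symbol' is not subsettable"

# Looking the column up by name in the data gives the numbers instead
df <- data.frame(location = c("4", "Outside"), Temp = c(122.8875, 88.5125),
                 stringsAsFactors = FALSE)
vals <- df[["Temp"]]
diff_val <- vals[df$location == "4"] - vals[df$location == "Outside"]
print(diff_val)
```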

DailyInOutDiff <- function (df, variable) {
  variable <- as.name(variable)
  DailyInOutDiff04 <- df %>%
    filter(location %in% c(4, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="4"] - variable[location=='Outside'], 
              location = "4")  %>%
    select(1, 2, 3, 5, 4)

  DailyInOutDiff05 <- df %>%
    filter(location %in% c(5, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else variable[location=="5"] - variable[location=='Outside'], 
              location = "5")  %>%
    select(1, 2, 3, 5, 4)

  temp.list <- list(DailyInOutDiff04, DailyInOutDiff05)
  final.df = bind_rows(temp.list)
  return(final.df)
}

The output is:

> test <- DailyInOutDiff(testdf, "Temp")
> test
Source: local data frame [4 x 5]
Groups: date, sensorheight [2]

        date sensorheight     farm location    Diff
       <chr>        <int>    <chr>    <chr>   <dbl>
1 2016-04-04            1 McDonald        4 34.3750
2 2016-04-04           16 McDonald        4 22.5250
3 2016-04-04            1 McDonald        5 13.5250
4 2016-04-04           16 McDonald        5  3.6625


Answer 2:

If you are using the latest dplyr (0.7), you can use the .data pronoun to refer to a column by a string holding its name. Your function would become:

DailyInOutDiff <- function (df, variable) {
  DailyInOutDiff04 <- df %>%
    filter(location %in% c(4, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else .data[[variable]][location=="4"] - .data[[variable]][location=='Outside'], 
              location = "4")  %>%
    select(1, 2, 3, 5, 4)

  DailyInOutDiff05 <- df %>%
    filter(location %in% c(5, 'Outside')) %>% 
    group_by(date, sensorheight, farm) %>%
    arrange(sensorheight, farm, location) %>%
    summarise(Diff = if(n()==1) NA else .data[[variable]][location=="5"] - .data[[variable]][location=='Outside'], 
              location = "5")  %>%
    select(1, 2, 3, 5, 4)

  temp.list <- list(DailyInOutDiff04, DailyInOutDiff05)
  final.df = bind_rows(temp.list)
  return(final.df)
}

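Changing variable[...] to .data[[variable]][...] means the function now selects the column named by the string in variable, instead of trying to index the string itself. Running this function on the provided data returns: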
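The same "look the column up by its name" idea also works in base R via `[[`, which may help readers on a dplyr version without the .data pronoun. This is a swapped-in technique (a left join with merge()), not the answerer's code; the function name `in_out_diff` and its `inside` argument are hypothetical, and it assumes each inside reading has at most one matching Outside reading per date, sensorheight and farm.

```r
# Hypothetical base-R counterpart of DailyInOutDiff for one inside location
in_out_diff <- function(df, variable, inside) {
  keys <- c("date", "sensorheight", "farm")
  # variable is a string, so it can be used directly to pick columns
  ins  <- df[df$location == inside,    c(keys, variable)]
  outs <- df[df$location == "Outside", c(keys, variable)]
  # Left join on the grouping keys; a missing Outside reading gives NA
  m <- merge(ins, outs, by = keys, all.x = TRUE, suffixes = c(".in", ".out"))
  data.frame(m[keys], location = inside,
             Diff = m[[paste0(variable, ".in")]] - m[[paste0(variable, ".out")]],
             stringsAsFactors = FALSE)
}

# The question's example data, rebuilt compactly
testdf <- data.frame(
  date = rep("2016-04-04", 6),
  sensorheight = rep(c(1L, 16L), 3),
  farm = "McDonald",
  location = rep(c("4", "5", "Outside"), each = 2),
  Temp = c(122.8875, 117.225, 102.0375, 98.3625, 88.5125, 94.7),
  stringsAsFactors = FALSE
)

res <- rbind(in_out_diff(testdf, "Temp", "4"),
             in_out_diff(testdf, "Temp", "5"))
print(res)
```

Because the column is passed as a string, the same function can be called with any other measurement column name without changing its body.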

DailyInOutDiff(testdf, "Temp")
#> # A tibble: 4 x 5
#> # Groups:   date, sensorheight [2]
#>         date sensorheight     farm location    Diff
#>        <chr>        <int>    <chr>    <chr>   <dbl>
#> 1 2016-04-04            1 McDonald        4 34.3750
#> 2 2016-04-04           16 McDonald        4 22.5250
#> 3 2016-04-04            1 McDonald        5 13.5250
#> 4 2016-04-04           16 McDonald        5  3.6625


Answer 3:

The following calls the function DailyInOutDiff, assigning testdf to df and "Temp" to variable:

   test <- DailyInOutDiff(testdf, "Temp")
   test <- DailyInOutDiff(testdf, quote(Temp))

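Based on what you are trying to do, you want to pass a data frame and a column from that data frame. Currently you pass only the column name, which is a string, not the column itself. You have to change it to: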

      test <- DailyInOutDiff(testdf, testdf["Temp"])
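The difference between passing a name and passing a column can be checked with base R indexing. A point worth keeping in mind with this approach: single brackets return a one-column data frame, not a vector, so code inside the function would then have to treat `variable` as a data frame. The small data frame below is an illustrative subset of the question's data.

```r
testdf <- data.frame(location = c("4", "Outside"), Temp = c(122.8875, 88.5125),
                     stringsAsFactors = FALSE)

name_only  <- "Temp"            # what the original call passes: just a string
one_col_df <- testdf["Temp"]    # single brackets: a one-column data.frame
col_vec    <- testdf[["Temp"]]  # double brackets: the numeric vector itself

print(class(name_only))   # "character"
print(class(one_col_df))  # "data.frame"
print(class(col_vec))     # "numeric"

# Only the vector can be indexed directly by a logical condition on location
inside_temp <- col_vec[testdf$location == "4"]
print(inside_temp)
```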

Second, you are passing the Temp column and trying to filter the variable data frame by location in the following code:

summarise(Diff = if(n()==1) NA else variable[location=="4"] - variable[location=='Outside'], location = "4")

It should be

    variable[variable$location=="4",] 

if your call is

    test <- DailyInOutDiff(testdf, testdf["Temp"]) 

   variable[variable$Temp=="4",] 

if your call is

    test <- DailyInOutDiff(testdf, testdf["Temp"]) 
