Testing a vector of values and summing over a table to create new columns in R

I have a table like this:

df <- read.table(text = 
                "  Day      city    gender     week
                 'day1'    'city1'   'M'       'one'
                 'day2'    'city2'   'M'       'two'
                 'day1'    'city3'   'F'       'two'
                 'day2'    'city4'   'F'       'two'", 
                 header = TRUE, stringsAsFactors = FALSE) 

I am computing a summary table like this:

library(data.table)  # provides setDT(), .N and the [i, j, by] syntax

daily_table <- setDT(df)[, .(Daily_Freq = .N,
                             men = sum(gender == 'M'),
                             women = sum(gender == 'F'),
                             city1 = sum(city == 'city1'),
                             city2 = sum(city == 'city2'),
                             city3 = sum(city == 'city3'),
                             city4 = sum(city == 'city4'),
                             city5 = sum(city == 'city5')),
                         by = .(week, Day)]

This produces the following table:

   week  Day Daily_Freq men women city1 city2 city3 city4 city5
    one day1          1   1     0     1     0     0     0     0
    two day2          2   1     1     0     1     0     1     0
    two day1          1   0     1     0     0     1     0     0

But since I have many cities, I would like to use a vector of names instead of writing each one out:

cities <- c("city1","city2","city3","city4","city5")

Note that my vector contains 5 cities. Even if one of them has zero occurrences, I want it to appear in the final table. How can I do this?

Answer

To make sure R shows city5 even though there are no observations with that value, add it as a factor level:

setDT(df)

df[, city :=  factor(city,
                     levels = c("city1","city2","city3","city4","city5"))]
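As a small illustration of why the factor conversion helps, here is a sketch on a toy vector (the names `x`, `f`, `counts` and the three-city `cities` vector are made up for this example and are not the question's data). It shows that a level with no observations is still counted, and that subsetting a factor keeps all of its levels:

```r
# Toy data, hypothetical and separate from the question's df
x <- c("city1", "city2", "city1")
cities <- c("city1", "city2", "city3")

# As plain character data, table() only reports observed values
table(x)

# As a factor with explicit levels, the unobserved level is kept with a count of 0
f <- factor(x, levels = cities)
counts <- table(f)
counts

# Subsetting a factor keeps every level, so levels() still lists all cities
levels(f[f == "city1"])
```

Because a subset of a factor carries the full set of levels, calling `levels(city)` on a per-group slice of the column should still return every city.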

To avoid writing out a test for each level of city, you can iterate over the levels of city like this:

daily_table <- df[, c(.(Daily_Freq = .N,
                        men = sum(gender == 'M'),
                        women = sum(gender == 'F')),
                      lapply(setNames(levels(city), levels(city)),
                             function(x) sum(city == x))),
                  by = .(week,Day)]
daily_table
##    week  Day Daily_Freq men women city1 city2 city3 city4 city5
## 1:  one day1          1   1     0     1     0     0     0     0
## 2:  two day2          2   1     1     0     1     0     1     0
## 3:  two day1          1   0     1     0     0     1     0     0
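Since the question already defines a `cities` vector, a possible variation is to iterate over that vector directly rather than over factor levels. The following is only a sketch under that assumption, not the answer's method. It rebuilds the question's data with `data.table()` so that it runs on its own, and leaves `city` as a character column:

```r
library(data.table)

# Same rows as the question's df, rebuilt here so the example is self-contained
df <- data.table(
  Day    = c("day1", "day2", "day1", "day2"),
  city   = c("city1", "city2", "city3", "city4"),
  gender = c("M", "M", "F", "F"),
  week   = c("one", "two", "two", "two")
)
cities <- c("city1", "city2", "city3", "city4", "city5")

# Fixed summaries first, then one count column per entry of `cities`;
# setNames() makes lapply() return a list named after the cities
daily_table <- df[, c(
  list(Daily_Freq = .N,
       men   = sum(gender == "M"),
       women = sum(gender == "F")),
  lapply(setNames(cities, cities), function(x) sum(city == x))
), by = .(week, Day)]

daily_table
```

With this approach the set of output columns is driven by `cities`, so a city with no rows still gets a column of zeros, and `city` itself does not need to be converted to a factor.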
