dplyr transmute returning fewer rows than the original data frame

【Title】dplyr transmute returning fewer rows than the original data frame 【Posted】2015-04-23 09:59:32 【Question】:

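I need to get a 4-row summary of a grouped data set (basically a square around the set of data points in each subset of the data frame).

A function: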

myfun <- function(F1, F2) {
  # F1 and F2 are not used in this example body: it always returns the
  # same fixed 4-row data frame with columns f2 and f1.
  out <- structure(list(f2 = c(1097.81431421448, 2331.43870452636, 2154.84583430979, 
1210.68973077198), f1 = c(411.462078942253, 334.070858898298, 
834.761924536241, 782.569047430496)), .Names = c("f2", "f1"), row.names = c(NA, 
4L), class = "data.frame")
  return(out)
}

A data set:

pb2 <-
structure(list(Type = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("c", 
"m", "w"), class = "factor"), Sex = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("f", "m"), class = "factor"), Speaker = c("1", 
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1"), Vowel = structure(c(8L, 8L, 7L, 
7L, 5L, 5L, 2L, 2L, 3L, 3L, 1L, 1L, 4L, 4L, 9L, 9L, 10L, 10L, 
6L, 6L), .Label = c("aa", "ae", "ah", "ao", "eh", "er", "ih", 
"iy", "uh", "uw"), class = "factor"), IPA = structure(c(9L, 9L, 
7L, 7L, 4L, 4L, 1L, 1L, 8L, 8L, 2L, 2L, 3L, 3L, 6L, 6L, 10L, 
10L, 5L, 5L), .Label = c("\\ae", "\\as", "\\ct", "\\ef", "\\er\\hr", 
"\\hs", "\\ic", "\\vt", "i", "u"), class = "factor"), F0 = c(160L, 
186L, 203L, 192L, 161L, 155L, 140L, 180L, 144L, 148L, 148L, 170L, 
161L, 158L, 163L, 190L, 160L, 157L, 177L, 164L), F1 = c(240L, 
280L, 390L, 310L, 490L, 570L, 560L, 630L, 590L, 620L, 740L, 800L, 
600L, 660L, 440L, 400L, 240L, 270L, 370L, 460L), F2 = c(2280L, 
2400L, 2030L, 1980L, 1870L, 1700L, 1820L, 1700L, 1250L, 1300L, 
1070L, 1060L, 970L, 980L, 1120L, 1070L, 1040L, 930L, 1520L, 1330L
), F3 = c(2850L, 2790L, 2640L, 2550L, 2420L, 2600L, 2660L, 2550L, 
2620L, 2530L, 2490L, 2640L, 2280L, 2220L, 2210L, 2280L, 2150L, 
2280L, 1670L, 1590L)), .Names = c("Type", "Sex", "Speaker", "Vowel", 
"IPA", "F0", "F1", "F2", "F3"), row.names = c(NA, 20L), class = "data.frame")

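Summarizing with dplyr (the output below has 1,520 rows, so it was presumably produced from the full data set pb, of which pb2 above is a 20-row sample):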

library(dplyr)

> pb %>% group_by(Type,Sex) %>% transmute(F1=myfun(F1,F2)["f1"]) 
Source: local data frame [1,520 x 3]
Groups: Type, Sex

   Type Sex       F1
1     m   m <dbl[4]>
2     m   m <dbl[4]>
3     m   m <dbl[4]>
4     m   m <dbl[4]>
5     m   m <dbl[4]>
6     m   m <dbl[4]>
7     m   m <dbl[4]>
8     m   m <dbl[4]>
9     m   m <dbl[4]>
10    m   m <dbl[4]>

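The function returns a data frame column, but the values are not appended together the way I expected. How do I get these values stacked on top of each other?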
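
A likely reason for the <dbl[4]> cells is that single-bracket indexing, as in myfun(F1,F2)["f1"], keeps the data frame wrapper instead of extracting a plain numeric vector, so the four values travel together as one list element. The base R sketch below illustrates the difference; corners is a hypothetical stand-in for the value returned by myfun() and is not part of the original post.

```r
# Hypothetical stand-in for the data frame returned by myfun().
corners <- data.frame(f2 = c(1, 2, 3, 4), f1 = c(10, 20, 30, 40))

one_col_df <- corners["f1"]    # single brackets: still a data frame (a list)
plain_vec  <- corners[["f1"]]  # double brackets: the numeric vector itself

class(one_col_df)  # "data.frame"
class(plain_vec)   # "numeric"
length(plain_vec)  # 4
```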

【Comments】:

【Answer 1】:

You're almost there. Just unnest what you have:

 library(tidyr)
 pb2 %>% 
   group_by(Type,Sex) %>% 
   transmute(F1=myfun(F1,F2)["f1"]) %>% 
   unnest(F1)

Output:

# Source: local data frame [80 x 3]
# 
#    Type Sex       F1
# 1     m   m 411.4621
# 2     m   m 334.0709
# 3     m   m 834.7619
# 4     m   m 782.5690
# 5     m   m 411.4621
# 6     m   m 334.0709
# 7     m   m 834.7619
# 8     m   m 782.5690
# 9     m   m 411.4621
# 10    m   m 334.0709
# ..  ... ...      ...
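
Note that the 80 rows above are the 4 values repeated for each of the 20 rows in pb2. If the goal is instead a single 4-row summary per group, one possible alternative is dplyr's older do() verb, which row-binds the data frame returned for each group. The sketch below is only an illustration under assumptions: bbox() and toy are hypothetical names and data invented for this example, not the asker's actual function or data set.

```r
library(dplyr)

# Hypothetical stand-in for myfun(): the four corners of the bounding box
# around the points (F2, F1), returned as a 4-row data frame.
bbox <- function(F1, F2) {
  data.frame(f2 = c(min(F2), max(F2), max(F2), min(F2)),
             f1 = c(min(F1), min(F1), max(F1), max(F1)))
}

# Small made-up data set with two groups.
toy <- data.frame(
  Type = c("m", "m", "m", "w", "w", "w"),
  F1   = c(240, 390, 490, 310, 560, 630),
  F2   = c(2280, 2030, 1870, 2790, 2550, 2350)
)

res <- toy %>%
  group_by(Type) %>%
  do(bbox(.$F1, .$F2)) %>%  # `.` is the current group's rows
  ungroup()

nrow(res)  # 8: four rows for each of the two groups
```

Here each group contributes exactly four rows, and the grouping column Type is kept alongside f2 and f1.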

【Discussion】:
