Split a data frame into separate data frames and compute segment transitions for each in R

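I have a data frame with 4 columns.

I want to split the data frame into separate data frames by the column age_group and, for each one, compute the transitions from segment_2018 to segment_2020. The result should therefore be several tables of the form table(df$segment_2018, df$segment_2020), one per age group (how many depends on the number of distinct values in age_group). Any ideas?

Sample data: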

structure(list(cust_id = c(5689748L, 1256987L, 8596263L, 4152659L, 
4589521L, 0125698L, 2896359L, 2045975L, 3759826L, 4625831L, 1875964L, 
6132852L, 8365472L, 1287465L, 9765287L, 9357452L, 8725691L, 4051697L, 
5783105L, 6040870L), segment_2018 = c("256", "258", "259", "2061", 
"2061", "2061", "7", "256", "259", "1029", "256", "258", "256", 
"67", "12", "258", "4115", "4115", "13", "1029"), age_group = c("58_59", 
"70_71", "62_63", "56_57", "62_63", "0", "46_47", "52_53", "52_53", 
"52_53", "56_57", "50_51", "0", "52_53", "50_51", "62_63", "62_63", 
"70_71", "44_45", "50_51"), segment_2020 = c("256", "258", "256", 
"2061", "17", "0", "7", "17", "133", "528", "256", "258", "0", 
"67", "12", "258", "133", "4114", "12", "1029")), row.names = c(NA, 
20L), class = "data.frame")


   cust_id segment_2018 age_group segment_2020
1  5689748          256     58_59          256
2  1256987          258     70_71          258
3  8596263          259     62_63          256
4  4152659         2061     56_57         2061
5  4589521         2061     62_63           17
6   125698         2061         0            0
7  2896359            7     46_47            7
8  2045975          256     52_53           17
9  3759826          259     52_53          133
10 4625831         1029     52_53          528
11 1875964          256     56_57          256
12 6132852          258     50_51          258
13 8365472          256         0            0
14 1287465           67     52_53           67
15 9765287           12     50_51           12
16 9357452          258     62_63          258
17 8725691         4115     62_63          133
18 4051697         4115     70_71         4114
19 5783105           13     44_45           12
20 6040870         1029     50_51         1029

Expected output:

structure(c(1859L, 3661L, 214L, 106L, 107L, 209L, 341L, 1770L, 
16343L, 106881L, 5078L, 317L, 593L, 8237L, 1106L, 271L, 402L, 
285L, 422L, 428L, 115L, 365L, 40507L, 11700L, 132L, 50L, 815L, 
375L, 189L, 998L, 14207L, 3171L, 882L, 307L, 948L, 7774L, 1985L, 
1414L, 2025L, 750L, 929L, 947L, 21L, 810L, 905L, 14358L, 4L, 
0L, 97L, 115L, 21L, 547L, 12926L, 2285L, 154L, 24L, 1120L, 1851L, 
346L, 215L, 122L, 79L, 98L, 310L, 1L, 72L, 502L, 251L, 10264L, 
1837L, 85L, 33L, 14L, 17L, 240L, 185L, 74L, 21L, 48L, 401L, 225L, 
111L, 115L, 23L, 57L, 77L, 94L, 187L, 313L, 150L, 206L, 5228L, 
78L, 35L, 13L, 2L, 143L, 120L, 66L, 18L, 23L, 269L, 136L, 64L, 
106L, 19L, 48L, 66L, 1057L, 121L, 1531L, 563L, 51L, 33L, 2922L, 
266L, 86L, 24L, 305L, 74L, 513L, 311L, 85L, 875L, 1068L, 291L, 
315L, 48L, 1116L, 902L, 15L, 197L, 497L, 418L, 66L, 28L, 439L, 
1517L, 35L, 26L, 491L, 233L, 170L, 92L, 238L, 597L, 325L, 122L, 
339L, 117L, 120L, 1209L, 32L, 91L, 236L, 739L, 4L, 0L, 43L, 26L, 
5345L, 1443L, 182L, 171L, 432L, 190L, 69L, 823L, 202L, 7L, 138L, 
72L, 23L, 72L, 0L, 15L, 44L, 274L, 3L, 1L, 3L, 4L, 68L, 4170L, 
141L, 575L, 185L, 31L, 30L, 122L, 1L, 5L, 4L, 2L, 4L, 8L, 0L, 
11L, 1891L, 6236L, 75L, 31L, 126L, 192L, 12L, 429L, 44940L, 11113L, 
544L, 93L, 704L, 4536L, 414L, 529L, 175L, 88L, 266L, 385L, 26L, 
476L, 1882L, 2654L, 84L, 48L, 78L, 186L, 171L, 1112L, 15439L, 
64342L, 1394L, 174L, 531L, 5187L, 608L, 178L, 313L, 193L, 256L, 
383L, 22L, 211L, 182L, 83L, 44L, 18L, 215L, 78L, 51L, 70L, 139L, 
117L, 16367L, 912L, 85L, 182L, 71L, 104L, 327L, 99L, 214L, 233L, 
15L, 142L, 136L, 49L, 16L, 10L, 194L, 63L, 65L, 49L, 63L, 35L, 
2214L, 3989L, 35L, 124L, 38L, 6L, 166L, 39L, 43L, 128L, 13L, 
49L, 159L, 2751L, 1L, 2L, 27L, 63L, 1L, 37L, 1371L, 444L, 85L, 
13L, 1098L, 308L, 123L, 52L, 84L, 60L, 27L, 270L, 0L, 17L, 3610L, 
10976L, 80L, 32L, 417L, 383L, 915L, 2046L, 29728L, 7587L, 1804L, 
468L, 818L, 72508L, 7699L, 729L, 1357L, 735L, 669L, 960L, 17L, 
448L, 1746L, 9166L, 38L, 13L, 526L, 232L, 250L, 212L, 4648L, 
1099L, 433L, 129L, 859L, 16061L, 9197L, 471L, 1658L, 594L, 431L, 
722L, 10L, 241L, 1062L, 864L, 87L, 4L, 177L, 61L, 2L, 7L, 473L, 
177L, 105L, 2L, 129L, 810L, 487L, 3680L, 253L, 92L, 338L, 183L, 
6L, 417L, 4791L, 3960L, 44L, 28L, 240L, 279L, 304L, 99L, 1559L, 
545L, 947L, 332L, 1396L, 4115L, 4226L, 533L, 3921L, 624L, 222L, 
1234L, 14L, 235L, 763L, 1480L, 5L, 1L, 46L, 84L, 123L, 41L, 628L, 
165L, 124L, 46L, 601L, 1012L, 813L, 102L, 253L, 561L, 51L, 320L, 
1L, 44L, 591L, 227L, 17L, 5L, 584L, 74L, 15L, 7L, 241L, 84L, 
163L, 47L, 18L, 497L, 288L, 305L, 44L, 15L, 3920L, 146L, 5L, 
109L, 1613L, 1577L, 61L, 32L, 1657L, 883L, 108L, 44L, 1195L, 
465L, 493L, 219L, 951L, 1555L, 1275L, 296L, 1704L, 460L, 368L, 
2584L, 25L, 199L, 254L, 67L, 232L, 276L, 176L, 82L, 6L, 5L, 95L, 
110L, 73L, 32L, 30L, 170L, 126L, 54L, 98L, 13L, 53L, 144L, 4957L, 
147L, 354L, 198L, 2424L, 98L, 53L, 26L, 6L, 14L, 168L, 133L, 
53L, 6L, 37L, 323L, 127L, 427L, 81L, 25L, 44L, 51L, 37L, 12899L
), .Dim = 22:23, .Dimnames = structure(list(c("1029", "1031", 
"12", "13", "133", "17", "2056", "2060", "2061", "256", "258", 
"259", "265", "4114", "4115", "5", "528", "529", "65", "67", 
"7", "9"), c("0", "1029", "1031", "12", "13", "133", "17", "2056", 
"2060", "2061", "256", "258", "259", "265", "4114", "4115", "5", 
"528", "529", "65", "67", "7", "9")), .Names = c("", "")), class = "table")
Answer

We can split the data and call table on each group. This can be done with split and lapply.

temp <- lapply(split(df, df$age_group), function(x) 
                     table(x$segment_2018, x$segment_2020))
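To see what this returns, here is a minimal sketch on a small subset of the sample data (only the rows for age groups 52_53 and 62_63, with cust_id dropped for brevity). The subset and the lookups at the end are illustrative choices, not part of the original answer.

```r
# Subset of the sample data: two age groups, four customers each
df <- data.frame(
  segment_2018 = c("256", "259", "1029", "67", "259", "2061", "258", "4115"),
  age_group    = c("52_53", "52_53", "52_53", "52_53",
                   "62_63", "62_63", "62_63", "62_63"),
  segment_2020 = c("17", "133", "528", "67", "256", "17", "258", "133"),
  stringsAsFactors = FALSE
)

# One contingency table per age group:
# rows are segment_2018, columns are segment_2020
temp <- lapply(split(df, df$age_group), function(x)
  table(x$segment_2018, x$segment_2020))

names(temp)                    # the age groups become the list names
temp[["52_53"]]                # transition counts for age group 52_53
temp[["52_53"]]["256", "17"]   # customers in 52_53 who moved from 256 to 17
```

Each table only contains the segments that actually occur in its age group, so the tables can differ in size from one group to the next.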

Or use by:

temp <- by(df, df$age_group, function(x) table(x$segment_2018, x$segment_2020))

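This returns a list of tables. It is usually better to keep them in a list, since a list is easier to manage and does not clutter the global environment, but if you want them as separate objects you can use list2env.

# The names of temp start with digits, so prefix them with "table_" to get valid object names.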
names(temp) <- paste0('table_', names(temp))
list2env(temp, .GlobalEnv)
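If separate objects are wanted but the global environment should stay clean, one option is to pass a fresh environment to list2env instead of .GlobalEnv. The sketch below uses two tiny hand-made tables in place of the real result, and the name tables_env is a hypothetical choice.

```r
# Stand-in for the list of per-age-group tables (illustrative values only)
temp <- list(
  "52_53" = table(c("256", "259"), c("17", "133")),
  "62_63" = table(c("258", "4115"), c("258", "133"))
)

# Prefix the names so they are valid object names
names(temp) <- paste0("table_", names(temp))

# Copy the tables into a new environment rather than the global one
tables_env <- list2env(temp, envir = new.env())

ls(tables_env)           # names of the objects created
tables_env$table_52_53   # access one table by name
```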
Another answer

We can also do it this way:

temp <- lapply(split(df, df$age_group), function(x) 
                 table(x[c('segment_2018', 'segment_2020')]))
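With character columns, each per-group table lists only the segments present in that group. If every table should have the same rows and columns (an assumption here, since the question does not say so), one possible approach is to convert both columns to factors with shared levels before splitting. The data below is a small subset of the sample, and lv18 and lv20 are hypothetical helper names.

```r
# Subset of the sample data: two age groups, four customers each
df <- data.frame(
  segment_2018 = c("256", "259", "1029", "67", "259", "2061", "258", "4115"),
  age_group    = c("52_53", "52_53", "52_53", "52_53",
                   "62_63", "62_63", "62_63", "62_63"),
  segment_2020 = c("17", "133", "528", "67", "256", "17", "258", "133"),
  stringsAsFactors = FALSE
)

# Assumption: every table should list all segments seen anywhere in the data
lv18 <- sort(unique(df$segment_2018))
lv20 <- sort(unique(df$segment_2020))
df$segment_2018 <- factor(df$segment_2018, levels = lv18)
df$segment_2020 <- factor(df$segment_2020, levels = lv20)

# table() keeps unused factor levels, so all tables share one layout
temp <- lapply(split(df, df$age_group), function(x)
  table(x[c("segment_2018", "segment_2020")]))

sapply(temp, dim)   # same dimensions for every age group
```

Transitions that never occur in a group then show up as zero counts instead of being left out of the table.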
