将多个值列重塑为宽格式

Posted

技术标签:

【中文标题】将多个值列重塑为宽格式【英文标题】:Reshape multiple value columns to wide format 【发布时间】:2012-07-21 10:21:00 【问题描述】:

我有以下数据框,我想使用 cast 创建一个“数据透视表”,其中包含两个值(值和百分比)的列。 这是数据框:

expensesByMonth <- structure(list(month = c("2012-02-01", "2012-02-01", "2012-02-01", 
"2012-02-01", "2012-02-01", "2012-02-01", "2012-02-01", "2012-02-01", 
"2012-02-01", "2012-02-01", "2012-02-01", "2012-02-01", "2012-03-01", 
"2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", 
"2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", 
"2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", "2012-04-01", 
"2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", 
"2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", 
"2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", 
"2012-04-01", "2012-04-01", "2012-05-01", "2012-05-01", "2012-05-01", 
"2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", 
"2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", 
"2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", 
"2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", 
"2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", 
"2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", 
"2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", 
"2012-07-01", "2012-07-01", "2012-07-01", "2012-07-01", "2012-07-01", 
"2012-07-01", "2012-07-01", "2012-07-01", "2012-07-01", "2012-07-01", 
"2012-07-01", "2012-07-01", "2012-07-01"), 
expense_type = c("Adjustment", "Bank Service Charge", "Cable", "Clubbing", "Dining", "Education", 
"Gifts", "Groceries", "Lunch", "Personal Care", "Rent", "Transportation", 
"Adjustment", "Bank Service Charge", "Cable", "Clubbing", "Dining", 
"Gifts", "Groceries", "Lunch", "Medical Expenses", "Miscellaneous", 
"Personal Care", "Phone", "Recreation", "Rent", "Transportation", 
"Adjustment", "Bank Service Charge", "Clothes", "Clubbing", "Computer", 
"Dining", "Gifts", "Groceries", "Lunch", "Maintenance", "Medical Expenses", 
"Miscellaneous", "Personal Care", "Phone", "Recreation", "Rent", 
"Transportation", "Travel", "Bank Service Charge", "Cable", "Clothes", 
"Clubbing", "Computer", "Dining", "Electric", "Gifts", "Groceries", 
"Lunch", "Maintenance", "Medical Expenses", "Miscellaneous", 
"Personal Care", "Phone", "Recreation", "Rent", "Transportation", 
"Adjustment", "Bank Service Charge", "Cable", "Charity", "Clothes", 
"Computer", "Dining", "Education", "Electric", "Gifts", "Groceries", 
"Lunch", "Maintenance", "Medical Expenses", "Miscellaneous", 
"Personal Care", "Phone", "Recreation", "Rent", "Transportation", 
"Computer", "Gifts", "Groceries", "Lunch", "Maintenance", "Medical Expenses", 
"Miscellaneous", "Personal Care", "Phone", "Recreation", "Rent", 
"Repair and Maintenance", "Transportation"), 
value = c(442.37, 200, 21.33, 75, 22.5, 1800, 10, 233.33, 154.75, 30, 545, 32.5, 
2, 200, 36.33, 206.55, 74.5, 89, 372.68, 383.75, 144.19, 508.11, 
30, 38.4, 81.75, 1746.7, 35, 16.37, 200, 806.9, 324.81, 756, 
80.5, 100, 398.37, 326.25, 151, 29.95, 101, 90, 38.45, 61, 743.75, 
129, 228.53, 200, 39.05, 237, 40, 283.83, 141.32, 32.88, 30, 
424.4, 412, 142.75, 86.55, 1051.5, 30, 38.9, 51.5, 749.7, 35, 
10, 200, 16, 32.59, 149.81, 100, 80, 60, 31.91, 55, 397.25, 486.4, 
115.6, 47.08, 1000, 120, 41.11, 256, 761.6, 55, 10.54, 10, 342.11, 
291, 76.5, 66.8, 1008, 30, 41.11, 316, 765, 65, 62), 
percent = c(0.124025030980324, 0.0560729845967511, 0.00598018380724351, 0.0210273692237817, 
0.0063082107671345, 0.50465686137076, 0.00280364922983756, 0.0654175474797997, 
0.0433864718317362, 0.00841094768951267, 0.152798883026147, 0.00911185999697206, 
0.000506462461002391, 0.0506462461002391, 0.00919989060410842, 
0.0523049106600219, 0.018865726672339, 0.0225375795146064, 0.0943742149831854, 
0.0971774847048337, 0.0365134111259673, 0.128669320529962, 0.00759693691503586, 
0.0097240792512459, 0.0207016530934727, 0.442318990316438, 0.00886309306754183, 
0.00357276925628781, 0.0436502047194601, 0.176106750940662, 0.0708901149746392, 
0.164997773839559, 0.0175692073995827, 0.0218251023597301, 0.0869446602704567, 
0.0712043964486193, 0.0329559045631924, 0.00653661815673915, 
0.0220433533833274, 0.0196425921237571, 0.00839175185731621, 
0.0133133124394353, 0.162324198800492, 0.0281543820440518, 0.0498769064226911, 
0.0496724104530621, 0.00969853814096037, 0.0588618063868785, 
0.00993448209061241, 0.070492601294463, 0.0350985252261336, 0.0081661442784834, 
0.00745086156795931, 0.105404854981398, 0.102325165533308, 0.035453682960873, 
0.0214957356235626, 0.261152697956974, 0.00745086156795931, 0.00966128383312057, 
0.0127906456916635, 0.186197030583303, 0.00869267182928586, 0.00249044292527426, 
0.0498088585054852, 0.00398470868043882, 0.00811635349346881, 
0.0373093254635337, 0.0249044292527426, 0.0199235434021941, 0.0149426575516456, 
0.00794700337455016, 0.0136974360890084, 0.09893284520652, 0.12113514388534, 
0.0287895202161704, 0.0117250052921912, 0.249044292527426, 0.0298853151032911, 
0.0102382108658025, 0.0637553388870211, 0.189672133188888, 0.0136974360890084, 
0.00341757293956667, 0.0032424790697976, 0.110928451456846, 0.0943561409311103, 
0.0248049648839517, 0.021659760186248, 0.326841890235599, 0.00972743720939281, 
0.013329831455938, 0.102462338605604, 0.248049648839517, 0.0210761139536844, 
0.0201033702327451)), 
.Names = c("month", "expense_type", "value", "percent"), 
row.names = c(NA, -96L), 
class = "data.frame"
)

这是我想要创建的(当然,使用不同的标题名称,例如:[month]_value、[month]_percent):

expenses   value     percent value.1   percent.1 value.2   percent.2 value.3   percent.3 value.4   percent.4 value.5   percent.5
1              Adjustment  442.37 0.124025031    2.00 0.000506462   16.37 0.003572769    0.00 0.000000000   10.00 0.002490443    0.00 0.000000000
2     Bank Service Charge  200.00 0.056072985  200.00 0.050646246  200.00 0.043650205  200.00 0.049672410  200.00 0.049808859    0.00 0.000000000
3                   Cable   21.33 0.005980184   36.33 0.009199891    0.00 0.000000000   39.05 0.009698538   16.00 0.003984709    0.00 0.000000000
4                 Charity    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000   32.59 0.008116353    0.00 0.000000000
5                 Clothes    0.00 0.000000000    0.00 0.000000000  806.90 0.176106751  237.00 0.058861806  149.81 0.037309325    0.00 0.000000000
6                Clubbing   75.00 0.021027369  206.55 0.052304911  324.81 0.070890115   40.00 0.009934482    0.00 0.000000000    0.00 0.000000000
7                Computer    0.00 0.000000000    0.00 0.000000000  756.00 0.164997774  283.83 0.070492601  100.00 0.024904429   10.54 0.003417573
8                  Dining   22.50 0.006308211   74.50 0.018865727   80.50 0.017569207  141.32 0.035098525   80.00 0.019923543    0.00 0.000000000
9               Education 1800.00 0.504656861    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000   60.00 0.014942658    0.00 0.000000000
10               Electric    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000   32.88 0.008166144   31.91 0.007947003    0.00 0.000000000
11                  Gifts   10.00 0.002803649   89.00 0.022537580  100.00 0.021825102   30.00 0.007450862   55.00 0.013697436   10.00 0.003242479
12              Groceries  233.33 0.065417547  372.68 0.094374215  398.37 0.086944660  424.40 0.105404855  397.25 0.098932845  342.11 0.110928451
13                  Lunch  154.75 0.043386472  383.75 0.097177485  326.25 0.071204396  412.00 0.102325166  486.40 0.121135144  291.00 0.094356141
14            Maintenance    0.00 0.000000000    0.00 0.000000000  151.00 0.032955905  142.75 0.035453683  115.60 0.028789520   76.50 0.024804965
15       Medical Expenses    0.00 0.000000000  144.19 0.036513411   29.95 0.006536618   86.55 0.021495736   47.08 0.011725005   66.80 0.021659760
16          Miscellaneous    0.00 0.000000000  508.11 0.128669321  101.00 0.022043353 1051.50 0.261152698 1000.00 0.249044293 1008.00 0.326841890
17          Personal Care   30.00 0.008410948   30.00 0.007596937   90.00 0.019642592   30.00 0.007450862  120.00 0.029885315   30.00 0.009727437
18                  Phone    0.00 0.000000000   38.40 0.009724079   38.45 0.008391752   38.90 0.009661284   41.11 0.010238211   41.11 0.013329831
19             Recreation    0.00 0.000000000   81.75 0.020701653   61.00 0.013313312   51.50 0.012790646  256.00 0.063755339  316.00 0.102462339
20                   Rent  545.00 0.152798883 1746.70 0.442318990  743.75 0.162324199  749.70 0.186197031  761.60 0.189672133  765.00 0.248049649
21 Repair and Maintenance    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000   65.00 0.021076114
22         Transportation   32.50 0.009111860   35.00 0.008863093  129.00 0.028154382   35.00 0.008692672   55.00 0.013697436   62.00 0.020103370
23                 Travel    0.00 0.000000000    0.00 0.000000000  228.53 0.049876906    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000

在对单个值列使用强制转换时,我还遇到了以下错误:它没有考虑“值”参数。因此,即使我指定 value = "percent",它仍然会显示 "value" 列中的值。

cast(expensesByMonth, expense_type ~ month, fun.aggregate = sum, value = "percent")

【问题讨论】:

下面的答案很有帮助;谢谢。不过,我想知道:这项任务可以通过 tidyr 完成吗? 【参考方案1】:

您最好的选择是将数据重新整形为长格式,使用melt,然后使用dcast

library(reshape2)

meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2)
dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum)

输出的前几行:

             expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent
1              Adjustment           442.37        0.124025031             2.00       0.0005064625
2     Bank Service Charge           200.00        0.056072985           200.00       0.0506462461
3                   Cable            21.33        0.005980184            36.33       0.0091998906
4                 Charity             0.00        0.000000000             0.00       0.0000000000

【讨论】:

谢谢,这就是我想要的。另外,你知道为什么演员表没有按预期工作吗? cast(expensesByMonth,expense_type ~ 月, fun.aggregate = sum, value = "percent") cast 是(现已失效的)包reshape 中的一个函数。它已在reshape2 中替换为dcastacast。我不再安装旧版本。 这个问题/答案是纯艺术。谢谢。 如果我每个月有多个相同费用类型的条目,我该怎么做?我不想添加之前的值。【参考方案2】:

data.table 可以转换为多个value.var 变量。这是非常直接(且有效)的。

因此:

library(data.table) # v1.9.5+
dcast(setDT(expensesByMonth), expense_type ~ month, value.var = c("value", "percent"))

【讨论】:

【参考方案3】:

现在可以使用 tidyr 1.0.0 中引入的新功能 pivot_wider() 将具有多个值/度量列的长格式重塑为宽格式。

这比spread()spread() 优于之前的 tidyr 策略spread(),因为不再删除属性(例如,日期仍然是日期,字符串仍然是字符串)。

pivot_wider()(对应方:pivot_longer())的工作方式类似于spread()。 但是,它提供了额外的功能,例如多值列。 为此,参数values_from——指示取自哪一列的值——可能采用多个列名。

NAs 可以使用参数values_fill 填充。

library("tidyr")
library("magrittr")

pivot_wider(expensesByMonth, 
            id_cols = expense_type,
            names_from = month,
            values_from = c(value, percent))
#> # A tibble: 23 x 13
#>    expense_type `value_2012-02-~ `value_2012-03-~ `value_2012-04-~
#>    <chr>                   <dbl>            <dbl>            <dbl>
#>  1 Adjustment              442.               2               16.4
#>  2 Bank Servic~            200              200              200  
#>  3 Cable                    21.3             36.3             NA  
#>  4 Clubbing                 75              207.             325. 
#>  5 Dining                   22.5             74.5             80.5
#>  6 Education              1800               NA               NA  
#>  7 Gifts                    10               89              100  
#>  8 Groceries               233.             373.             398. 
#>  9 Lunch                   155.             384.             326. 
#> 10 Personal Ca~             30               30               90  
#> # ... with 13 more rows, and 9 more variables: `value_2012-05-01` <dbl>,
#> #   `value_2012-06-01` <dbl>, `value_2012-07-01` <dbl>,
#> #   `percent_2012-02-01` <dbl>, `percent_2012-03-01` <dbl>,
#> #   `percent_2012-04-01` <dbl>, `percent_2012-05-01` <dbl>,
#> #   `percent_2012-06-01` <dbl>, `percent_2012-07-01` <dbl>

或者,可以使用提供更好控制的pivot规范来完成重塑(参见下面的链接):

# see also ?build_wider_spec
spec <- expensesByMonth %>%
    expand(month, .value = c("percent", "value")) %>%
    dplyr::mutate(.name = paste(.$month, .$.value, sep = "_"))
pivot_wider_spec(expensesByMonth, spec = spec)

由reprex package (v0.2.1) 于 2019 年 3 月 26 日创建

另见:https://tidyr.tidyverse.org/dev/articles/pivot.html

【讨论】:

【参考方案4】:

由于这个问题经常被访问,我认为它也应该得到一个完整的基本 R 答案。来自基本 R 的 reshape-function 非常通用,也可以轻松应用于此问题:

expenses <- reshape(expensesByMonth, idvar = 'expense_type', direction = 'wide',
                    timevar = 'month', sep = '_')

带有NA-values 的单元格可以用0 替换为:

expenses[is.na(expenses)] <- 0

给出(由expense_type 排序以便更容易与所需输出进行比较):

> expenses[order(expenses$expense_type),]
             expense_type value_2012-02-01 percent_2012-02-01 value_2012-03-01 percent_2012-03-01 value_2012-04-01 percent_2012-04-01 value_2012-05-01 percent_2012-05-01 value_2012-06-01 percent_2012-06-01 value_2012-07-01 percent_2012-07-01
1              Adjustment           442.37        0.124025031             2.00       0.0005064625            16.37        0.003572769             0.00        0.000000000            10.00        0.002490443             0.00        0.000000000
2     Bank Service Charge           200.00        0.056072985           200.00       0.0506462461           200.00        0.043650205           200.00        0.049672410           200.00        0.049808859             0.00        0.000000000
3                   Cable            21.33        0.005980184            36.33       0.0091998906             0.00        0.000000000            39.05        0.009698538            16.00        0.003984709             0.00        0.000000000
67                Charity             0.00        0.000000000             0.00       0.0000000000             0.00        0.000000000             0.00        0.000000000            32.59        0.008116353             0.00        0.000000000
30                Clothes             0.00        0.000000000             0.00       0.0000000000           806.90        0.176106751           237.00        0.058861806           149.81        0.037309325             0.00        0.000000000
4                Clubbing            75.00        0.021027369           206.55       0.0523049107           324.81        0.070890115            40.00        0.009934482             0.00        0.000000000             0.00        0.000000000
32               Computer             0.00        0.000000000             0.00       0.0000000000           756.00        0.164997774           283.83        0.070492601           100.00        0.024904429            10.54        0.003417573
5                  Dining            22.50        0.006308211            74.50       0.0188657267            80.50        0.017569207           141.32        0.035098525            80.00        0.019923543             0.00        0.000000000
6               Education          1800.00        0.504656861             0.00       0.0000000000             0.00        0.000000000             0.00        0.000000000            60.00        0.014942658             0.00        0.000000000
52               Electric             0.00        0.000000000             0.00       0.0000000000             0.00        0.000000000            32.88        0.008166144            31.91        0.007947003             0.00        0.000000000
7                   Gifts            10.00        0.002803649            89.00       0.0225375795           100.00        0.021825102            30.00        0.007450862            55.00        0.013697436            10.00        0.003242479
8               Groceries           233.33        0.065417547           372.68       0.0943742150           398.37        0.086944660           424.40        0.105404855           397.25        0.098932845           342.11        0.110928451
9                   Lunch           154.75        0.043386472           383.75       0.0971774847           326.25        0.071204396           412.00        0.102325166           486.40        0.121135144           291.00        0.094356141
37            Maintenance             0.00        0.000000000             0.00       0.0000000000           151.00        0.032955905           142.75        0.035453683           115.60        0.028789520            76.50        0.024804965
21       Medical Expenses             0.00        0.000000000           144.19       0.0365134111            29.95        0.006536618            86.55        0.021495736            47.08        0.011725005            66.80        0.021659760
22          Miscellaneous             0.00        0.000000000           508.11       0.1286693205           101.00        0.022043353          1051.50        0.261152698          1000.00        0.249044293          1008.00        0.326841890
10          Personal Care            30.00        0.008410948            30.00       0.0075969369            90.00        0.019642592            30.00        0.007450862           120.00        0.029885315            30.00        0.009727437
24                  Phone             0.00        0.000000000            38.40       0.0097240793            38.45        0.008391752            38.90        0.009661284            41.11        0.010238211            41.11        0.013329831
25             Recreation             0.00        0.000000000            81.75       0.0207016531            61.00        0.013313312            51.50        0.012790646           256.00        0.063755339           316.00        0.102462339
11                   Rent           545.00        0.152798883          1746.70       0.4423189903           743.75        0.162324199           749.70        0.186197031           761.60        0.189672133           765.00        0.248049649
95 Repair and Maintenance             0.00        0.000000000             0.00       0.0000000000             0.00        0.000000000             0.00        0.000000000             0.00        0.000000000            65.00        0.021076114
12         Transportation            32.50        0.009111860            35.00       0.0088630931           129.00        0.028154382            35.00        0.008692672            55.00        0.013697436            62.00        0.020103370
45                 Travel             0.00        0.000000000             0.00       0.0000000000           228.53        0.049876906             0.00        0.000000000             0.00        0.000000000             0.00        0.000000000

您也可以使用tidyverse

library(dplyr)
library(tidyr)

expensesByMonth %>% 
  gather(k, v, 3:4) %>% 
  unite(km, k, month) %>% 
  spread(km, v, fill = 0)

【讨论】:

【参考方案5】:

为此,我更喜欢 tables 包中的 tabulate 函数。它需要因素,但无论如何这对于您拥有的数据类型来说是一个好主意。

library(tables)
expensesByMonth$month= as.factor(expensesByMonth$month)
expensesByMonth$expense_type= as.factor(expensesByMonth$expense_type)
tabular(expense_type~(month)*(value+percent)*(sum),data=expensesByMonth)
# Optional formatting
tabular(expense_type~month*
   ((Format(digits=1))*value+(Format(digits=3))*percent)*sum,
   data=expensesByMonth)

部分输出:

                       value      percent  value      percent  value      percent 
expense_type           sum        sum      sum        sum      sum        sum     
Adjustment              442       0.124025    2       0.000506   16       0.003573
Bank Service Charge     200       0.056073  200       0.050646  200       0.043650
Cable                    21       0.005980   36       0.009200    0       0.000000

【讨论】:

有些 () 是多余的,但有时我会保留这些以防我想添加更多术语。 我以前从未使用过这门课。如果这不是太多要求,您能否提供一个如何从中提取数据的示例? 这不是单行的。该软件包的作者 Duncan Murdoch 编写了一个与 tables 软件包一起提供的漂亮插图。请参阅包文档/指南和小插图概述。 所以没有简单的方法来恢复过程以防万一您需要(假设)?感谢您的回答。 不,那是 reshape (base) 的域和 reshape 包中的函数。表格是纯显示。

以上是关于将多个值列重塑为宽格式的主要内容,如果未能解决你的问题,请参考以下文章

如何将多个变量的重复测量值传播为宽格式?

TSQL Pivot 4值列

以特定方式转换为宽格式

Mysql,重塑数据从长/高到宽

将时间序列数据从宽格式重塑为高格式(用于绘图)[重复]

数据规整:聚合合并和重塑