R: reshape2 long to wide replacing real values with integers between 1 and 3

Posted: 2015-07-06 14:20:59

Question:

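Please bear with me, as this is my first post. I am trying to convert time series data from long format to wide format, but reshape2 (and reshape) is not producing the output I want. I am trying to use cast or dcast to get my data into the following format: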

id State contract.type Q1.2011 Q2.2011 ... Q2.2014

The source data frame is named Med and has the following format:

    > dput(head(Med))
structure(list(State = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("AK", 
"AL", "AR", "AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", 
"IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME", "MI", 
"MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH", "NJ", "NM", "NV", 
"NY", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", 
"VA", "VT", "WA", "WI", "WV", "WY"), class = "factor"), Rebate.Category = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("FFS", "MCO"), class = "factor"), 
    Qtr.Yr = structure(1:6, .Label = c("Q1.2011", "Q2.2011", 
    "Q3.2011", "Q4.2011", "Q1.2012", "Q2.2012", "Q3.2012", "Q4.2012", 
    "Q1.2013", "Q2.2013", "Q3.2013", "Q4.2013", "Q1.2014", "Q2.2014", 
    "Q3.2014", "Q4.2014", "Q1.2015", "Q2.2015"), class = c("ordered", 
    "factor")), NDC = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("30", 
    "64"), class = "factor"), Medicaid.Units = structure(c(290L, 
    306L, 320L, 228L, 162L, 320L), .Label = c("0.00", "4,010.00", 
    "4,076.00", "4,080.00", "4,084.00", "4,081.00", "4,089.00", 
    "4,091.00", "4,446.00", "4,440.00", "4,100.00", "4,104.00", 
    "4,151.00", "4,160.00", "4,161.00", "4,410.00", "4,414.00", 
    "4,418.00", "4,444.00", "4,451.00", "4,480.00", "4,488.00", 
    "4,440.00", "4,488.00", "4,500.00", "4,510.00", "4,558.00", 
    "4,560.00", "4,571.00", "4,600.00", "4,604.00", "4,610.00", 
    "4,678.00", "4,680.00", "4,740.00", "4,770.00", "4,800.00", 
    "4,850.00", "4,860.00", "4,910.00", "4,946.00", "4,960.00", 
    "4,971.00", "40,014.00", "40,440.00", "40,484.00", "40,166.00", 
    "40,180.00", "40,480.00", "40,500.00", "40,618.00", "40,740.00", 
    "40,770.00", "40,817.00", "404,460.00", "409,010.00", "44,406.00", 
    "44,440.00", "44,460.00", "44,510.00", "44,560.00", "44,580.00", 
    "44,700.00", "44,760.00", "44,841.00", "44,880.00", "44,940.00", 
    "44,948.00", "41,080.00", "41,400.00", "41,556.00", "41,600.00", 
    "41,780.00", "41,900.00", "41,960.00", "410.00", "414.00", 
    "44,010.00", "44,440.00", "44,751.00", "44,860.00", "44,880.00", 
    "44,040.00", "44,180.00", "44,880.00", "44,891.00", "45,000.00", 
    "45,484.00", "45,740.00", "45.50", "450.00", "451.00", "46,080.00", 
    "46,101.00", "46,160.00", "46,441.00", "46,560.00", "47,580.00", 
    "48,046.00", "48,060.00", "48,178.00", "48,846.00", "48,949.44", 
    "480.00", "49,440.00", "1,018.00", "1,046.00", "1,040.00", 
    "1,080.00", "1,400.00", "1,441.00", "1,460.00", "1,110.00", 
    "1,180.00", "1,446.00", "1,440.00", "1,491.00", "1,400.00", 
    "1,401.00", "1,460.00", "1,490.00", "1,491.00", "1,541.00", 
    "1,510.00", "1,511.00", "1,571.00", "1,640.00", "1,648.00", 
    "1,700.00", "1,741.00", "1,741.00", "1,760.00", "1,810.00", 
    "1,841.00", "1,846.00", "1,880.00", "1,896.00", "1,941.00", 
    "1,940.00", "1,941.00", "1,960.00", "1.00", "10,760.00", 
    "14,400.00", "14,600.00", "14,660.00", "144.11", "141,440.00", 
    "146.00", "11,680.00", "14,151.00", "14,510.00", "14,700.00", 
    "14,810.00", "14,891.00", "14.00", "140.00", "141.00", "168.68", 
    "17,460.00", "17,468.00", "170.00", "19,440.00", "4,000.00", 
    "4,004.00", "4,060.00", "4,071.00", "4,410.00", "4,450.00", 
    "4,480.00", "4,496.00", "4,188.00", "4,400.00", "4,441.00", 
    "4,410.00", "4,480.00", "4,600.00", "4,710.00", "4,711.00", 
    "4,840.00", "4,900.00", "4,941.00", "4,960.00", "4,964.00", 
    "40,741.00", "40,904.00", "40.00", "400.00", "404.00", "441.00", 
    "41,446.00", "44,000.00", "44,060.00", "44,746.00", "44,860.00", 
    "44,980.00", "45,400.00", "460.00", "47,160.00", "47,740.00", 
    "494.49", "4,010.00", "4,041.00", "4,080.00", "4,441.00", 
    "4,160.00", "4,410.00", "4,440.00", "4,610.00", "4,718.00", 
    "4,740.00", "4,768.00", "4,800.00", "4,910.00", "4,964.00", 
    "410.00", "415.44", "44,480.00", "44,740.00", "446.00", "444.00", 
    "45,418.00", "45,688.00", "46,418.00", "47,141.00", "47,180.00", 
    "48,410.00", "480.00", "484.00", "49,946.00", "5,040.00", 
    "5,400.00", "5,414.00", "5,460.00", "5,110.00", "5,180.00", 
    "5,440.00", "5,444.00", "5,441.00", "5,640.00", "5,760.00", 
    "5,794.00", "5,810.00", "5,946.00", "5,970.00", "50,810.00", 
    "51,480.00", "510.00", "540.00", "55,540.00", "56,110.00", 
    "57,661.00", "580.07", "588.91", "6,000.00", "6,060.00", 
    "6,410.00", "6,480.00", "6,140.00", "6,180.00", "6,400.00", 
    "6,480.00", "6,600.00", "6,646.00", "6,690.00", "6,710.00", 
    "6,900.00", "6.00", "60.00", "600.00", "64,044.00", "614.00", 
    "64,100.00", "660.00", "67,481.00", "690.00", "7,100.00", 
    "7,410.00", "7,451.00", "7,480.00", "7,500.00", "7,680.00", 
    "7,740.00", "7,760.00", "7,800.00", "7,860.00", "7,980.00", 
    "70,086.00", "70,680.00", "710.00", "74,800.00", "74,911.00", 
    "748.48", "751.00", "780.00", "784.00", "8,040.00", "8,460.00", 
    "8,510.00", "8,541.00", "8,584.00", "8,640.00", "8,740.00", 
    "8,880.00", "8,940.00", "840.00", "845.00", "9,000.00", "9,046.00", 
    "9,140.00", "9,400.00", "9,410.00", "9,480.00", "9,600.00", 
    "9,646.00", "9,660.00", "9,710.00", "90,140.00", "90.00", 
    "900.00", "905.00", "91,814.00", "960.00", "984.00", "996.77", 
    "0.50", "4,019.00", "4,044.00", "4,440.00", "4,480.00", "4,144.00", 
    "4,180.00", "4,400.00", "4,414.00", "4,468.00", "4,504.00", 
    "4,546.00", "4,551.00", "4,648.00", "4,710.00", "4,780.00", 
    "4,840.00", "4,980.00", "4.78", "40,080.00", "40,440.00", 
    "40,680.00", "40,851.00", "40,941.00", "40,914.00", "40,976.00", 
    "40,998.00", "40.00", "407,460.00", "44,044.00", "44,490.00", 
    "44,116.00", "44,176.00", "44,640.00", "44,810.00", "41,000.00", 
    "41,480.00", "41,140.00", "41,410.00", "41,480.00", "41,481.00", 
    "41,606.00", "41,841.00", "41,880.00", "41,908.00", "44,440.00", 
    "44,468.00", "44,680.00", "44,110.00", "44,446.00", "44,440.00", 
    "46,410.00", "46,680.00", "460.46", "465.00", "47,640.00", 
    "47,700.00", "47,810.00", "48,600.00", "48,900.00", "49,680.00", 
    "491.00", "495.00", "499.58", "1,048.00", "1,081.00", "1,410.00", 
    "1,456.00", "1,474.00", "1,476.00", "1,100.00", "1,114.00", 
    "1,184.00", "1,186.00", "1,456.00", "1,441.00", "1,484.00", 
    "1,580.00", "1,844.00", "1,941.00", "1.66", "10,150.00", 
    "10,181.00", "14,144.00", "11,440.00", "14,110.00", "14,404.00", 
    "144.09", "15,764.00", "154.45", "155.00", "16,580.00", "18,500.00", 
    "19,460.00", "19,510.00", "4,486.00", "4,108.00", "4,140.00", 
    "4,168.00", "4,408.00", "4,460.00", "4,464.00", "4,471.00", 
    "4,540.00", "4,660.00", "4,781.00", "40,061.00", "40,484.00", 
    "405.00", "44,447.00", "44,900.00", "441.00", "44,544.00", 
    "45,468.00", "454.87", "461.00", "47,080.00", "471.44", "484.00", 
    "496.78", "4,161.00", "4,480.00", "4,541.00", "4,560.00", 
    "4,571.00", "40,160.00", "40.00", "41,144.00", "414.00", 
    "418.00", "44,100.00", "44,400.00", "44,504.00", "46,500.00", 
    "46,860.00", "46,980.00", "47,445.00", "47,641.00", "47,880.00", 
    "470.00", "48,900.00", "484.00", "488.00", "49,740.00", "5,046.00", 
    "5,400.00", "5,484.00", "5,804.00", "5,880.00", "5,948.00", 
    "504.00", "51,065.00", "55,570.00", "55,680.00", "56,510.00", 
    "564.00", "57,468.00", "57,180.00", "584.00", "59,510.00", 
    "6,164.00", "6,460.00", "6,410.00", "6,660.00", "6,756.00", 
    "6,941.00", "6,948.00", "656.00", "680.00", "696.00", "7,080.00", 
    "7,441.00", "7,440.00", "7,146.00", "7,160.00", "7,456.00", 
    "7,560.00", "7,998.00", "744.00", "748.00", "8,400.00", "8,180.00", 
    "8,446.00", "8,446.00", "8,514.00", "8,580.00", "8,581.00", 
    "8,674.00", "8,700.00", "8,760.00", "8,810.00", "84,140.00", 
    "84,900.00", "840.00", "84.08", "87,410.00", "880.00", "884.00", 
    "89,700.00", "9,060.00", "9,064.00", "9,444.00", "9,664.00", 
    "9,780.00", "9,840.00", "9,900.00", "964.00", "0.89", "0.96", 
    "4,090.00", "4,451.00", "4,470.00", "4,106.00", "4,140.00", 
    "4,484.00", "4,418.00", "4,444.00", "4,444.00", "4,588.00", 
    "4,681.00", "4,718.00", "4,856.00", "4,891.00", "4,944.00", 
    "4.50", "40,068.00", "40,070.00", "40,488.00", "40,149.00", 
    "40,160.00", "40,410.00", "40,496.00", "40,596.00", "40,768.00", 
    "40,860.00", "40,916.00", "44,087.00", "44,400.00", "44,460.00", 
    "44,596.00", "44,646.00", "44,858.00", "44,881.00", "441.00", 
    "41,444.00", "41,840.00", "44,440.00", "44,100.00", "44,500.00", 
    "446.00", "44,496.00", "44,960.00", "444.00", "45,448.00", 
    "45,140.00", "45,456.00", "45,780.00", "45.00", "46,010.00", 
    "46,444.00", "46,941.00", "47,040.00", "47,466.00", "47,110.00", 
    "48,000.00", "48,049.00", "48,640.00", "48,645.00", "49,411.00", 
    "1,041.00", "1,041.00", "1,044.00", "1,094.00", "1,404.00", 
    "1,480.00", "1,171.00", "1,181.00", "1,188.00", "1,404.00", 
    "1,441.00", "1,546.00", "1,670.00", "1,818.00", "10.00", 
    "14,960.00", "11,010.00", "11,480.00", "11,488.00", "11,910.00", 
    "14,181.00", "14,490.00", "14,644.00", "15,100.00", "16,180.00", 
    "17,446.00", "18,946.00", "180.00", "19,460.00", "19,688.00", 
    "4,418.00", "4,481.00", "4,780.00", "4,946.00", "4.00", "41,040.00", 
    "41,400.00", "4,440.00", "4,100.00", "4,500.00", "4,580.00", 
    "4,680.00", "4,840.00", "4,946.00", "4,960.00", "4,980.00", 
    "4,981.00", "45,400.00", "476.00", "484.00", "486.00", "496.00", 
    "5,470.00", "5,510.00", "5,580.00", "5,688.00", "5,700.00", 
    "5,764.00", "5,940.00", "5,980.00", "54,491.00", "54,611.00", 
    "548.00", "551.00", "56.00", "6,078.00", "6,090.00", "6,446.00", 
    "6,464.00", "6,461.00", "6,474.60", "6,616.00", "610.00", 
    "611.00", "68,040.00", "7,010.00", "7,044.00", "7,058.00", 
    "7,084.00", "7,441.00", "7,440.00", "7,470.00", "7,646.00", 
    "7,676.00", "7,685.00", "7,696.58", "7,871.00", "714.00", 
    "718.00", "746.00", "75,540.00", "8,154.00", "8,761.00", 
    "8,866.00", "8.00", "9,040.00", "9,480.00", "9,414.47", "9,460.00", 
    "9,541.00", "9,540.00", "9,788.00", "9,810.00", "944.00", 
    "944.00", "4,064.00", "4,140.00", "4,450.00", "4,476.00", 
    "4,496.00", "4,591.00", "4,786.00", "4,941.00", "4,951.00", 
    "4.01", "40,010.00", "40,018.00", "40,196.00", "40,560.00", 
    "40,610.00", "40,764.00", "40,800.00", "40,980.00", "44,040.00", 
    "44,441.00", "44,591.00", "41,541.00", "44,441.00", "44,414.00", 
    "44,700.00", "44,950.00", "44,400.00", "44,704.00", "440.00", 
    "45,491.00", "45,450.00", "45,510.00", "45,788.00", "46,480.00", 
    "46,508.00", "46,571.00", "46,614.00", "46,860.00", "47,884.00", 
    "48,418.00", "49,500.00", "1,101.00", "1,451.00", "1,551.00", 
    "1,584.00", "1,676.00", "1,708.00", "1,984.00", "14,480.00", 
    "14,710.00", "14,400.00", "14,460.00", "15,500.00", "160.00", 
    "17,540.00", "17,847.00", "18,480.00", "4,068.00", "4,490.00", 
    "4,494.00", "4,496.00", "4,648.00", "4,704.00", "4,748.00", 
    "4,760.00", "4,804.00", "4,840.60", "4,867.76", "4,904.19", 
    "4,947.47", "41,880.00", "446.00", "46,691.00", "48,811.00", 
    "4,050.00", "4,409.81", "4,108.00", "4,500.45", "4,860.00", 
    "5,408.00", "5,441.45", "5,586.00", "5,944.00", "5,991.00", 
    "5.10", "6,061.00", "6,541.00", "6,771.00", "6,776.00", "6,780.00", 
    "641.00", "7,086.00", "7,444.00", "7,456.00", "7,541.00", 
    "7,610.00", "7,614.00", "7,644.00", "7,910.00", "7.76", "716.00", 
    "740.00", "8,188.00", "8,696.00", "8,740.00", "8,784.00", 
    "8,850.00", "860.00", "88.00", "896.00", "9,410.00", "9,450.00", 
    "9,141.00", "9,168.00", "9,404.00", "9,471.00", "9,661.00", 
    "9,964.00"), class = "factor"), id = c(1, 1, 1, 1, 1, 1)), .Names = c("State", 
"Rebate.Category", "Qtr.Yr", "NDC", "Medicaid.Units", "id"), row.names = c("2185", 
"2184", "2182", "2180", "1503", "1501"), class = "data.frame")

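id is just a row number. I split the data by NDC into two subsets, Med1 and Med2, and then used the code below for the conversion. If I cast with id included on the left-hand side of the formula, I get some correct numbers. For example, AK has 120 in Q4 2014. However, instead of getting one row per State/Rebate.Category pair with the data laid out by quarter, each row has a single good number and NA for the remaining time points. If I cast without id on the LHS, every cell in the entire sheet is filled with an integer between 0 and 5.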

 TMed1<-dcast(Med1,id+Rebate.Category+State~Qtr.Yr,value.var="Medicaid.Units",drop=FALSE)

The output is:

head(TMed1)
  id Rebate.Category State Q1.2011 Q2.2011 Q3.2011 Q4.2011 Q1.2012 Q2.2012
  1  1      FFS       AK    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
  2  1      FFS       AL    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
  3  1      FFS       AR    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
  4  1      FFS       AZ    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
  5  1      FFS       CA    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
  6  1      FFS       CO    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
Q3.2012 Q4.2012 Q1.2013 Q2.2013 Q3.2013 Q4.2013 Q1.2014 Q2.2014 Q3.2014
1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
  Q4.2014 Q1.2015 Q2.2015
1  120.00    <NA>    <NA>
2    <NA>    <NA>    <NA>
3    <NA>    <NA>    <NA>
4    <NA>    <NA>    <NA>
5    <NA>    <NA>    <NA>
6    <NA>    <NA>    <NA>

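I also used the plyr id function to create a unique id for each combination of State and Rebate.Category, but I ended up with low integers again. Does anyone know how to get the Medicaid.Units values laid out by quarter for each unique id combination?

Edit: replaced the original sample with a dput sample, as recommended.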

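Comments:

You are more likely to get meaningful help if you (1) post a small example data set that illustrates your data, and (2) show an example of the output you expect.

You should use the command dput(head(Med)) to post your data in a format that others can use.

The first thing I would do is convert the "factor" variables to "character" vectors.

@topsig, thank you very much for the recommendation. I have edited the post.

Answer 1: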

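You need to do two things:

    Convert Medicaid.Units to numeric. It is currently a factor with 817 levels.

    Set an aggregation function for the cases where there are multiple entries per state/quarter/category/id.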

    Med$Medicaid.Units = as.numeric(as.character(Med$Medicaid.Units))
    TMed1
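The conversion step can be tried in isolation. The following is a minimal sketch in base R on a made-up three-value factor (the vector `units` is hypothetical, not the poster's data). It assumes the values carry thousands separators, as the labels in the dput output do, and shows why the commas have to be removed before `as.numeric` can parse the text.

```r
# Hypothetical toy factor, formatted like the Medicaid.Units labels in the question
units <- factor(c("4,010.00", "120.00", "0.50"))

# as.numeric() on a factor returns the internal level codes (small integers),
# not the numbers printed in the labels
codes <- as.numeric(units)

# Converting through character keeps the text, but a string containing a comma
# cannot be parsed as a number, so it becomes NA (R also emits a warning)
with_na <- suppressWarnings(as.numeric(as.character(units)))

# Removing the commas first gives the intended numeric values
clean <- as.numeric(gsub(",", "", as.character(units), fixed = TRUE))

print(codes)
print(with_na)
print(clean)
```

Applied to the real data, the same `gsub` call would wrap `as.character(Med$Medicaid.Units)`; whether other non-numeric characters occur in that column is not known from the post.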

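Comments:

I double-checked your suggestion, but I had no success. I had also tried coercing to numeric and integer before, without success. With sum I get a lot of 0s in the right places instead of data, but at least the data is no longer being coerced into random integers. My id is assigned by Med

Answer 2: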

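I found the answer! It was a combination of the suggestions from user1362215 and BondedDust, plus some tinkering. fun.aggregate needs to be set to sum, and the CSV that I exported from Excel and was reading in caused Medicaid.Units to be read in as a factor when it should have been read in as an integer. I went back to Excel and re-exported the column as a numeric field (without commas). It was then read in as an integer and worked correctly with user1362215's code. Previously, his/her code produced NA wherever a number contained a comma; otherwise the cells were correct and in the right place. Removing the commas and using fun.aggregate=sum solved the problem.

Thank you all!!! (I would upvote where appropriate if my account were old enough to have that privilege.)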
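The effect of fun.aggregate can be reproduced on a small example. This is only a sketch on invented data: the data frame `long` and its values are hypothetical, and id is left off the left-hand side of the formula on the assumption that, as the question says, it is just a row number. When a cell receives more than one value and no aggregation function is given, reshape2's dcast falls back to length, so the cells hold counts rather than the data.

```r
library(reshape2)

# Hypothetical long-format data: AK has two entries for Q1.2011
long <- data.frame(
  State           = c("AK", "AK", "AK", "AL"),
  Rebate.Category = c("FFS", "FFS", "FFS", "FFS"),
  Qtr.Yr          = c("Q1.2011", "Q1.2011", "Q2.2011", "Q1.2011"),
  Medicaid.Units  = c(100, 20, 50, 7),
  stringsAsFactors = FALSE
)

# No fun.aggregate: dcast defaults to length(), so each cell is a count
# of how many rows fell into it (the message about this is suppressed here)
counts <- suppressMessages(
  dcast(long, Rebate.Category + State ~ Qtr.Yr, value.var = "Medicaid.Units")
)

# fun.aggregate = sum: each cell is the total of the values that fell into it
totals <- dcast(long, Rebate.Category + State ~ Qtr.Yr,
                value.var = "Medicaid.Units", fun.aggregate = sum)

print(counts)
print(totals)
```

In this toy case the AK cell for Q1.2011 is 2 under the default and 120 under sum, which matches the pattern of small integers described in the question.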
