R: summarise_all(funs(sum)) returns 0 after spread, even with NAs removed
I have run into a problem where summing with
spread(FUND, PTD_BALANCE, fill = 0) %>%
summarise_all(funs(sum))
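incorrectly returns 0 for every value in some columns. This happens even when I allow NAs in the spread and remove them in the summarise. The spread produces 25 variables from the original 4 columns. Here is what I have tried, without success: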
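library(dplyr)
library(tidyr)

# Read the six identifier columns as factors and the balance as double
Budget_FY11_FY18 <- read.csv("FY_8yr_Adopted_Fund_Clean.csv",
                             colClasses = c(rep("factor", 6), "double"))

MBudget_Mvar <- Budget_FY11_FY18 %>%
  select(BUDGET_NAME, PERIOD_NAME, FUND, PTD_BALANCE) %>%
  unite("FY_Month", BUDGET_NAME, PERIOD_NAME, remove = TRUE) %>%
  group_by(FY_Month) %>%
  mutate(i = row_number()) %>%            # row id so spread() sees unique keys
  spread(FUND, PTD_BALANCE, fill = 0) %>% # one column per fund
  summarise_all(funs(sum))                # note: this also sums the helper column i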
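The NA-tolerant variant mentioned in the question (allow NAs in the spread, drop them when summing) can be sketched on a small made-up data set. The `toy` data frame and its values are hypothetical, and `summarise_all(sum, na.rm = TRUE)` is used here in place of `funs()` only because it is accepted by both older and newer dplyr releases. Treat this as a minimal sketch for checking results by hand, not as the fix for the original data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the budget file
toy <- data.frame(
  FY_Month    = c("FY11_July", "FY11_July", "FY11_July", "FY11_August"),
  FUND        = c("General Fund", "General Fund", "Sewer Fund", "General Fund"),
  PTD_BALANCE = c(-100, -50, 30, -20),
  stringsAsFactors = FALSE
)

wide <- toy %>%
  group_by(FY_Month) %>%
  mutate(i = row_number()) %>%              # helper id so spread() sees unique rows
  spread(FUND, PTD_BALANCE, fill = NA) %>%  # keep missing fund/month cells as NA
  select(-i) %>%                            # drop the helper so it is not summed
  summarise_all(sum, na.rm = TRUE)          # NAs are ignored when summing

# Independent cross-check: sum directly per month and fund, without reshaping
direct <- toy %>%
  group_by(FY_Month, FUND) %>%
  summarise(total = sum(PTD_BALANCE))
```

In this toy case a fund with no rows in a given month ends up as 0 in `wide`, because the sum of an all-NA column with `na.rm = TRUE` is 0. Comparing `wide` against `direct` is one way to tell such genuine zeros from an actual error.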
The dput of head(Budget_FY11_FY18) is (with some labels removed):
dput(head(Budget_FY11_FY18))
structure(list(BUDGET_NAME = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = c("FY11 ADOPTED", "FY12 ADOPTED", "FY13 ADOPTED",
"FY14 ADOPTED", "FY15 ADOPTED", "FY16 ADOPTED", "FY17 ADOPTED",
"FY18 ADOPTED"), class = "factor"), PERIOD_NUM = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("1", "10", "11", "12", "2", "3",
"4", "5", "6", "7", "8", "9"), class = "factor"), FUND = structure(c(6L,
6L, 6L, 6L, 6L, 6L), .Label = c(), class = "factor"),
SERVICE_CENTER = structure(c(223L, 223L, 223L, 223L, 223L,
223L), .Label = c(), class = "factor"), ACCOUNT = structure(c(3L,
5L, 359L, 202L, 203L, 371L), .Label = c(), class = "factor"),
PERIOD_NAME = structure(c(6L, 6L, 6L, 6L, 6L, 6L), .Label = c("April",
"August", "December", "February", "January", "July", "June",
"March", "May", "November", "October", "September"), class = "factor"),
PTD_BALANCE = c(-21895250, -650000, -435042, -4300000, -322908,
-513417)), .Names = c("BUDGET_NAME", "PERIOD_NUM", "FUND",
"SERVICE_CENTER", "ACCOUNT", "PERIOD_NAME", "PTD_BALANCE"), row.names = c(NA,
6L), class = "data.frame")
I have also tried reading the non-numeric columns in as character, which gives the following dput:
> dput(head(Budget_FY11_FY18))
structure(list(BUDGET_NAME = c("FY11 ADOPTED", "FY11 ADOPTED",
"FY11 ADOPTED", "FY11 ADOPTED", "FY11 ADOPTED", "FY11 ADOPTED"
), PERIOD_NUM = c("1", "1", "1", "1", "1", "1"), FUND = c("General Fund",
"General Fund", "General Fund", "General Fund", "General Fund",
"General Fund"), SERVICE_CENTER = c("Unallocated", "Unallocated",
"Unallocated", "Unallocated", "Unallocated", "Unallocated"),
ACCOUNT = c("Ad Valorem Tax - Current", "Ad Valorem Tax Prior",
"PILOT's", "In Lieu Of Taxes-Utils", "In Lieu Of Taxes-Sewer",
"Property Taxes Interest & Penalty"), PERIOD_NAME = c("July",
"July", "July", "July", "July", "July"), PTD_BALANCE = c(-21895250,
-650000, -435042, -4300000, -322908, -513417)), .Names = c("BUDGET_NAME",
"PERIOD_NUM", "FUND", "SERVICE_CENTER", "ACCOUNT", "PERIOD_NAME",
"PTD_BALANCE"), row.names = c(NA, 6L), class = "data.frame")
I currently have the following packages loaded:
[1] gmp_0.5-13.1 xts_0.10-1 MTS_0.33 zoo_1.8-1
[5] tseries_0.10-42 forecast_8.2 gridExtra_2.3 magrittr_1.5
[9] readr_1.1.1 ggplot2_2.2.1 bindrcpp_0.2 data.table_1.10.4-3
[13] stringr_1.2.0 tidyr_0.7.2 dplyr_0.7.4
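I have tried various ways of isolating the problem.
Additional background: I am trying to spread and sum a data set of ~420k observations to prepare it for analysis as a multivariate time series. The data are of class numeric and range from 54 million to -2 million. The sign changes because the data set represents a budget.
Any help would be much appreciated!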
I initially thought the problem was similar to the ones described in the previously answered questions here and here.
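Although akrun and Tung correctly pointed out the Error in as.character.factor(x) : malformed factor discrepancy, it turned out on closer inspection of my raw data that the original code was in fact returning the correct values, although the code can apparently cause problems elsewhere.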
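For my purposes, the flaw in the approach is one of linear algebra and model selection, and it occurs downstream. The matrix produced by the operations described in the question is exactly singular.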
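As a rough illustration of what "exactly singular" means for a wide matrix like this, the numerical rank can be compared with the number of columns. The matrix `m` below is hypothetical (its third column is the sum of the first two) and is not derived from the budget data.

```r
# Hypothetical matrix: column c equals a + b, so the columns are linearly dependent
m <- cbind(a = c(1, 2, 3, 4),
           b = c(2, 0, 1, 5),
           c = c(3, 2, 4, 9))

rank_m <- qr(m)$rank                   # numerical rank from a QR decomposition
is_rank_deficient <- rank_m < ncol(m)  # TRUE when some column is redundant

# Columns that are zero everywhere also reduce the rank; none exist in this toy matrix
all_zero_cols <- colnames(m)[colSums(m != 0) == 0]
```

A rank below the column count means the cross-product matrix cannot be inverted, which is the kind of downstream failure a model fit would run into.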
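I believe the assumption that the problem stems from the reshaping and summarising is incorrect.
Further discussion would probably be better placed on Cross Validated, or reframed as a question about how the malformed factor arises.
If they are judged to be of no value to the community, this question and its answers/"answers" should be deleted.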
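One way a malformed factor error can arise, sketched under the assumption that it came from re-running the dput output with its labels emptied (as in the FUND column shown in the question): the integer codes survive, but there are no levels left to map them to. The object `x` below is a hypothetical reconstruction, not the original data.

```r
# Hypothetical reconstruction: integer codes with class "factor" but no levels
x <- structure(c(6L, 6L, 6L), .Label = c(), class = "factor")

# Converting to character needs the levels, so this is expected to fail
msg <- tryCatch(as.character(x), error = function(e) conditionMessage(e))

has_levels <- !is.null(levels(x))  # FALSE: the labels were stripped
```

If this is what happened, the error would reflect the edited dput rather than the data as read from the CSV file.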