Error "contrasts can be applied only to factors with 2 or more levels" when running a (mixed model) regression with factors with 2 or more levels
Posted: 2022-01-19 11:18:36
Question: I am trying to run a simple linear regression model that includes an outcome (continuous_outcome) and two dummy variables for smoking (current_vs_neversmoking and former_vs_neversmoking). I previously had these two variables combined as a single factor with three levels, but that compared one level against the other two (i.e., current vs. non-current), and I want to specifically compare current vs. never and former vs. never.
When I try to run the model, I get the error
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels
My data and code are as follows:
mydata <- structure(list(pat_id = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3,
3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 9, 9, 10,
10, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14, 14, 15, 15,
16, 16, 17, 17, 17, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22,
22, 22, 22, 22, 23, 23, 24, 24, 24, 24, 24, 25, 25, 26, 26, 26,
26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 31, 31, 31, 32, 32,
33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 35, 36, 36, 36, 36,
36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41, 42, 42, 43, 43, 44,
44, 44, 44, 45, 45, 46, 46, 47, 47, 47, 47, 47, 48, 48, 48, 48,
48, 49, 49, 50, 50), continuous_outcome = c(0.270481901933073,
0.306562240871999, 0.586489601087521, 0.663162791491994, 0.696568742621393,
0.573238528525012, -1.50517834064486, -1.14239124190004, 0.602167001833233,
0.942169278018825, 0.957507525424839, 0.942401042208738, 1.10173901173947,
-1.23467796994225, -0.0205580225283486, -0.231308201295527, -0.244470432048288,
-0.256490437743765, 0.493465625373049, 0.406426360030117, 0.439098160535839,
0.466158747996811, 0.637429149477194, 0.0219441253328183, 0.102660112718747,
0.264537705164256, 0.110814584186878, 0.49920541931488, 1.81235625865717,
1.82870935879674, 0.652695891088804, 0.69291517381055, -0.414081564221917,
-0.147536404237028, 1.21903849053896, 1.06257819295167, 1.10222362013134,
1.13246743635661, -0.670943276171988, -0.29653504137582, 0.0590836540990421,
0.282795470829998, -3.03315551333956, -1.88568994249489, -1.65312212848836,
-1.13355891646777, -2.20351671143641, -1.45344735861464, -1.25516950174665,
-0.743390964862038, -0.4629610158192, 0.606862844948187, 0.639058684113426,
0.609702655264534, 0.633960970096869, 0.548906526787276, 0.108205702176247,
0.124050755621246, -0.881940114877928, -1.12908469428316, -1.48617053617301,
-1.45848671123536, 0.0944288383151997, 0.279125369127663, 0.489885538084724,
0.486578831616853, 0.394325240405338, 0.460090367906543, 0.937968466599025,
-1.20642488217955, -0.981185479943044, 0.570576924035185, 0.532219882463515,
0.620627645616656, 0.631553233135331, 0.874526189757774, -0.194145530051932,
-0.0979606735363465, 0.565800797611727, 0.509862625778819, 0.5741604159953,
0.519945775026426, 0.387595824059598, 0.395925960524675, -1.74473193173614,
-0.848779543387106, 1.41774732048115, 1.51159850388708, 0.462882007460068,
0.483950525664105, -0.366500414469296, -0.0920163339687414, -0.166351980885457,
-0.0860682256869157, -0.219608109715091, 0.195934077939654, 0.356018784590499,
0.484056029455595, 0.57498034210306, 0.572359796530477, 0.599809068756398,
0.542583937381158, 0.698337291640914, 0.740921504459827, 0.45772616988788,
0.405098691997856, 0.485871287409578, 0.442621726153633, 0.29123670436699,
0.0303617893266618, 0.00448603635822562, -0.0619887479801569,
0.003984369355659, -0.140521412371098, -0.971697227999586, -1.20190205773194,
-1.53965813080136, -1.30849790890586, 1.58558160520627, 1.61870389553583,
-5.84164915563387, -5.84164915563387, 0.777919475931911, 0.972720285314287,
0.477725719575478, 0.461105062597019, 0.616300922435037, 0.528825235299615,
0.752152176797313, 0.915416601798041, 0.906483121528581, 0.868345778494055,
-2.885534489146, -1.64736196365156, -0.768874512446897, -0.66979572486731,
0.73917509257953, 0.883831498985817, 0.884240158759821, 0.916187794016791,
1.38773159469184, -0.00127946509641595, 0.302272238178157, 0.340088450861561,
0.295163832020064, 0.94639364965826, 0.839369926698037, 0.913777832307086,
0.767222595331384, 0.898887351534535), current_vs_neversmoking = structure(c(NA,
NA, NA, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 2L, NA, NA, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA,
2L, 2L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA,
2L, 2L, 2L, 2L, 2L, 1L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, 2L, NA, NA, 1L, 1L, NA,
NA, NA, NA, 1L, 1L, NA, NA, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, NA, NA, NA, NA), .Label = c("Never smoker", "Current smoker"
), class = "factor"), former_vs_neversmoking = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, NA, 2L, 2L, 1L, 1L, 1L,
1L, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
NA, NA, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
NA, NA, NA, NA, NA, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, 2L, 2L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L), .Label = c("Never smoker", "Former smoker"
), class = "factor")), row.names = c(NA, 150L), class = "data.frame")
summary(mydata)
     pat_id      continuous_outcome   current_vs_neversmoking   former_vs_neversmoking
 Min.   : 1.00   Min.   :-5.8416      Never smoker  :25         Never smoker :25
 1st Qu.:11.25   1st Qu.:-0.2132      Current smoker:28         Former smoker:97
 Median :24.00   Median : 0.4409      NA's          :97         NA's         :28
 Mean   :24.60   Mean   : 0.0737
 3rd Qu.:36.00   3rd Qu.: 0.6493
 Max.   :50.00   Max.   : 1.8287
model_1 <- lm(formula=continuous_outcome ~ current_vs_neversmoking + former_vs_neversmoking,
data=mydata,
na.action="na.omit")
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
Why does this error occur? Both categorical variables are coded as factors, and each has 2 levels...
Answer 1: Your factors are coded incorrectly.
> table(mydata$current_vs_neversmoking, mydata$former_vs_neversmoking)

                 Never smoker Former smoker
  Never smoker             25             0
  Current smoker            0             0
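This shows that the only rows with non-NA values in both variables are those where current_vs_neversmoking == 'Never smoker' and former_vs_neversmoking == 'Never smoker'. Note that when the model is estimated, every row containing an NA is dropped. After that, each factor has only one observed level left, which is what triggers the error.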
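The effect of dropping incomplete rows can be reproduced on a small toy data set. The sketch below is only an illustration: the data frame `toy`, its column names and its values are made up, and it assumes the same coding pattern as in the question, where each dummy factor is NA for the group covered by the other one.

```r
# Hypothetical toy data mimicking the question's coding pattern:
# each factor is NA for the smokers handled by the other factor.
toy <- data.frame(
  y = c(0.1, 0.4, 1.2, 1.5, 0.7, 0.9),
  current_vs_never = factor(
    c("Never smoker", "Never smoker", "Current smoker", "Current smoker", NA, NA),
    levels = c("Never smoker", "Current smoker")),
  former_vs_never = factor(
    c("Never smoker", "Never smoker", NA, NA, "Former smoker", "Former smoker"),
    levels = c("Never smoker", "Former smoker"))
)

# Rows that survive once every row with an NA is dropped.
kept <- toy[complete.cases(toy), ]

# Number of distinct levels actually observed per factor in the kept rows.
observed_levels <- sapply(kept[c("current_vs_never", "former_vs_never")],
                          function(f) nlevels(droplevels(f)))
print(observed_levels)

# Fitting with both dummies fails, because each factor is left with one level.
fit <- try(lm(y ~ current_vs_never + former_vs_never, data = toy), silent = TRUE)
print(inherits(fit, "try-error"))
```

Checking `complete.cases()` together with `droplevels()` before fitting is a quick way to see which factor has collapsed to a single level.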
I believe you want to include smoking as a single factor, recoded so that never smokers are the baseline.
mydata$smoker <- ifelse(is.na(mydata$current_vs_neversmoking), as.character(mydata$former_vs_neversmoking), as.character(mydata$current_vs_neversmoking))
mydata$smoker <- factor(mydata$smoker, levels=c("Never smoker", "Current smoker", "Former smoker"))
Now:
summary(model_1 <- lm(formula=continuous_outcome ~ smoker,
data=mydata,
na.action="na.omit"))
Call:
lm(formula = continuous_outcome ~ smoker, data = mydata, na.action = "na.omit")

Residuals:
    Min      1Q  Median      3Q     Max
-5.8724 -0.2720  0.2926  0.5526  1.7979

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           -0.4049     0.2161  -1.874 0.062948 .
smokerCurrent smoker   1.0545     0.2973   3.547 0.000523 ***
smokerFormer smoker    0.4356     0.2423   1.798 0.074257 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.08 on 147 degrees of freedom
Multiple R-squared:  0.08135,  Adjusted R-squared:  0.06885
F-statistic: 6.509 on 2 and 147 DF,  p-value: 0.001957
Current and former smokers are now each compared against the never-smoker baseline.
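One way to convince yourself of this interpretation is to compare the fitted coefficients with differences in group means. The following is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical), assuming R's default treatment contrasts for unordered factors.

```r
# Hypothetical toy data: three mutually exclusive smoking groups.
toy <- data.frame(
  y = c(1, 3, 4, 6, 2, 4),
  smoker = factor(
    c("Never smoker", "Never smoker", "Current smoker", "Current smoker",
      "Former smoker", "Former smoker"),
    levels = c("Never smoker", "Current smoker", "Former smoker"))
)

fit <- lm(y ~ smoker, data = toy)

# Group means computed directly from the data.
group_means <- tapply(toy$y, toy$smoker, mean)

# Under treatment coding, the intercept is the baseline (never-smoker) mean
# and each slope is that group's mean minus the baseline mean.
mean_diffs <- group_means[-1] - group_means["Never smoker"]
print(coef(fit))
print(mean_diffs)
```

The intercept matches the never-smoker mean, and the two slopes match the current-minus-never and former-minus-never differences.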
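Comments:
Thank you for your answer. However, in the linear model you propose, aren't current and former smoking being contrasted with non-current and non-former smoking (with never vs. non-never left out because it is the reference)? If what you say is true (i.e., both are compared against never), isn't that the same as what I did with the dummy variables? I guess I don't fully understand how R computes contrasts for linear models, because, as I said, I want to contrast current vs. never and former vs. never, not current vs. non-current and former vs. non-former.
You have three mutually exclusive categories: never, current and former. In my model above, the constant is the mean in the never group, and the coefficients for former and current are the differences between never and former and between never and current, respectively. This is what is called "dummy coding" in this tutorial: marissabarlaz.github.io/portfolio/contrastcoding. That is exactly what you asked for: "as I said, I want to contrast current vs. never and former vs. never."
Ah, I see, thank you for the link. That helps!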
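The dummy coding mentioned in the comments can be inspected directly. The sketch below uses a hypothetical three-level factor with the same level names as the answer and assumes R's default treatment contrasts. It prints the coding matrix and shows that `relevel()` changes the reference level without touching the data.

```r
# Hypothetical factor with never smokers as the first (reference) level.
smoker <- factor(
  c("Never smoker", "Current smoker", "Former smoker"),
  levels = c("Never smoker", "Current smoker", "Former smoker"))

# Treatment (dummy) coding: the reference level is coded as all zeros,
# so every other level is compared against it.
cm <- contrasts(smoker)
print(cm)

# relevel() picks a different reference level without recoding the values.
smoker_ref_former <- relevel(smoker, ref = "Former smoker")
print(levels(smoker_ref_former))
```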