Error "contrasts can be applied only to factors with 2 or more levels" when running a (mixed model) regression with factors with 2 or more levels

Posted: 2022-01-19 11:18:36

Question:

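I am trying to run a simple linear regression model that includes one outcome (continuous_outcome) and two dummy variables for smoking (current_vs_neversmoking and former_vs_neversmoking). I previously had these two variables combined into a single factor with three levels, but that compared one level against the other two (i.e. current vs. non-current). I want to compare specifically current vs. never and former vs. never.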

When I try to run the model, I get the error: Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels.

My data and code are as follows:

mydata <- structure(list(pat_id = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 
3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 9, 9, 10, 
10, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14, 14, 15, 15, 
16, 16, 17, 17, 17, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 
22, 22, 22, 22, 23, 23, 24, 24, 24, 24, 24, 25, 25, 26, 26, 26, 
26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 31, 31, 31, 32, 32, 
33, 33, 33, 33, 33, 34, 34, 35, 35, 35, 35, 35, 36, 36, 36, 36, 
36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41, 42, 42, 43, 43, 44, 
44, 44, 44, 45, 45, 46, 46, 47, 47, 47, 47, 47, 48, 48, 48, 48, 
48, 49, 49, 50, 50), continuous_outcome = c(0.270481901933073, 
0.306562240871999, 0.586489601087521, 0.663162791491994, 0.696568742621393, 
0.573238528525012, -1.50517834064486, -1.14239124190004, 0.602167001833233, 
0.942169278018825, 0.957507525424839, 0.942401042208738, 1.10173901173947, 
-1.23467796994225, -0.0205580225283486, -0.231308201295527, -0.244470432048288, 
-0.256490437743765, 0.493465625373049, 0.406426360030117, 0.439098160535839, 
0.466158747996811, 0.637429149477194, 0.0219441253328183, 0.102660112718747, 
0.264537705164256, 0.110814584186878, 0.49920541931488, 1.81235625865717, 
1.82870935879674, 0.652695891088804, 0.69291517381055, -0.414081564221917, 
-0.147536404237028, 1.21903849053896, 1.06257819295167, 1.10222362013134, 
1.13246743635661, -0.670943276171988, -0.29653504137582, 0.0590836540990421, 
0.282795470829998, -3.03315551333956, -1.88568994249489, -1.65312212848836, 
-1.13355891646777, -2.20351671143641, -1.45344735861464, -1.25516950174665, 
-0.743390964862038, -0.4629610158192, 0.606862844948187, 0.639058684113426, 
0.609702655264534, 0.633960970096869, 0.548906526787276, 0.108205702176247, 
0.124050755621246, -0.881940114877928, -1.12908469428316, -1.48617053617301, 
-1.45848671123536, 0.0944288383151997, 0.279125369127663, 0.489885538084724, 
0.486578831616853, 0.394325240405338, 0.460090367906543, 0.937968466599025, 
-1.20642488217955, -0.981185479943044, 0.570576924035185, 0.532219882463515, 
0.620627645616656, 0.631553233135331, 0.874526189757774, -0.194145530051932, 
-0.0979606735363465, 0.565800797611727, 0.509862625778819, 0.5741604159953, 
0.519945775026426, 0.387595824059598, 0.395925960524675, -1.74473193173614, 
-0.848779543387106, 1.41774732048115, 1.51159850388708, 0.462882007460068, 
0.483950525664105, -0.366500414469296, -0.0920163339687414, -0.166351980885457, 
-0.0860682256869157, -0.219608109715091, 0.195934077939654, 0.356018784590499, 
0.484056029455595, 0.57498034210306, 0.572359796530477, 0.599809068756398, 
0.542583937381158, 0.698337291640914, 0.740921504459827, 0.45772616988788, 
0.405098691997856, 0.485871287409578, 0.442621726153633, 0.29123670436699, 
0.0303617893266618, 0.00448603635822562, -0.0619887479801569, 
0.003984369355659, -0.140521412371098, -0.971697227999586, -1.20190205773194, 
-1.53965813080136, -1.30849790890586, 1.58558160520627, 1.61870389553583, 
-5.84164915563387, -5.84164915563387, 0.777919475931911, 0.972720285314287, 
0.477725719575478, 0.461105062597019, 0.616300922435037, 0.528825235299615, 
0.752152176797313, 0.915416601798041, 0.906483121528581, 0.868345778494055, 
-2.885534489146, -1.64736196365156, -0.768874512446897, -0.66979572486731, 
0.73917509257953, 0.883831498985817, 0.884240158759821, 0.916187794016791, 
1.38773159469184, -0.00127946509641595, 0.302272238178157, 0.340088450861561, 
0.295163832020064, 0.94639364965826, 0.839369926698037, 0.913777832307086, 
0.767222595331384, 0.898887351534535), current_vs_neversmoking = structure(c(NA, 
NA, NA, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 2L, NA, NA, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA, 
2L, 2L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, 2L, NA, NA, 1L, 1L, NA, 
NA, NA, NA, 1L, 1L, NA, NA, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
1L, NA, NA, NA, NA), .Label = c("Never smoker", "Current smoker"
), class = "factor"), former_vs_neversmoking = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, NA, NA, NA, NA, NA, 2L, 2L, 1L, 1L, 1L, 
1L, NA, NA, NA, NA, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
NA, NA, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
NA, NA, NA, NA, NA, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, 2L, 2L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 2L, 2L, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L), .Label = c("Never smoker", "Former smoker"
), class = "factor")), row.names = c(NA, 150L), class = "data.frame")

summary(mydata)
     pat_id      continuous_outcome   current_vs_neversmoking   former_vs_neversmoking
 Min.   : 1.00   Min.   :-5.8416    Never smoker  :25         Never smoker :25        
 1st Qu.:11.25   1st Qu.:-0.2132    Current smoker:28         Former smoker:97        
 Median :24.00   Median : 0.4409    NA's          :97         NA's         :28        
 Mean   :24.60   Mean   : 0.0737                                                      
 3rd Qu.:36.00   3rd Qu.: 0.6493                                                      
 Max.   :50.00   Max.   : 1.8287                                                      

model_1 <- lm(formula=continuous_outcome ~ current_vs_neversmoking + former_vs_neversmoking, 
              data=mydata, 
              na.action="na.omit")

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Why does this error occur? Both categorical variables are coded as factors and have 2 levels...

Answer 1:

Your factors are coded incorrectly.

> table(mydata$current_vs_neversmoking, mydata$former_vs_neversmoking)

                 Never smoker Former smoker
  Never smoker             25             0
  Current smoker            0             0

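This shows that the only rows with non-NA values in both variables are those where current_vs_neversmoking == 'Never smoker' and former_vs_neversmoking == 'Never smoker'. Note that every row containing an NA is dropped when the model is estimated, so each factor is left with a single observed level, which is what triggers the error.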
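The same diagnosis can be sketched on a small made-up data set. The data frame `toy` and its column names are assumptions for illustration only, not the asker's data; the idea is to keep only the complete rows, as `lm()` does with `na.omit`, and count the levels that are still observed in each factor.

```r
# Toy data mimicking the structure in the question (values are invented).
toy <- data.frame(
  y = c(0.1, 0.4, -0.2, 0.9, 1.1, 0.3),
  current_vs_never = factor(
    c("Never smoker", "Never smoker", "Current smoker", "Current smoker", NA, NA),
    levels = c("Never smoker", "Current smoker")
  ),
  former_vs_never = factor(
    c("Never smoker", "Never smoker", NA, NA, "Former smoker", "Former smoker"),
    levels = c("Never smoker", "Former smoker")
  )
)

# Rows without any NA: these are the only rows the model would use.
kept <- toy[complete.cases(toy), ]

# Number of levels still observed in each factor once NA rows are gone.
is_fac <- sapply(kept, is.factor)
observed_levels <- sapply(kept[is_fac], function(f) nlevels(droplevels(f)))
print(observed_levels)

# Fitting the model on the toy data fails for the same reason;
# tryCatch() captures the error message instead of stopping the script.
err <- tryCatch(
  lm(y ~ current_vs_never + former_vs_never, data = toy),
  error = function(e) conditionMessage(e)
)
print(err)
```

A factor that reports a single observed level here cannot be given contrasts, even though it was declared with two levels.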

I believe you want to include the smoking variable as a single factor, but recoded so that never smokers are the baseline.

mydata$smoker <- ifelse(is.na(mydata$current_vs_neversmoking), as.character(mydata$former_vs_neversmoking), as.character(mydata$current_vs_neversmoking))
mydata$smoker <- factor(mydata$smoker, levels=c("Never smoker", "Current smoker",  "Former smoker"))

Now:

summary(model_1 <- lm(formula=continuous_outcome ~ smoker, 
              data=mydata, 
              na.action="na.omit"))
Call:
lm(formula = continuous_outcome ~ smoker, data = mydata, na.action = "na.omit")

Residuals:
    Min      1Q  Median      3Q     Max
-5.8724 -0.2720  0.2926  0.5526  1.7979

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           -0.4049     0.2161  -1.874 0.062948 .
smokerCurrent smoker   1.0545     0.2973   3.547 0.000523 ***
smokerFormer smoker    0.4356     0.2423   1.798 0.074257 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.08 on 147 degrees of freedom
Multiple R-squared:  0.08135,   Adjusted R-squared:  0.06885
F-statistic: 6.509 on 2 and 147 DF,  p-value: 0.001957

Current and former smokers are now each compared against the never-smoker baseline.
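A quick way to check this interpretation is to compare the fitted coefficients with the raw group means. The sketch below uses a small invented data set (`toy` is an assumption, not the data from the question) and R's default treatment contrasts.

```r
# Invented data: two observations per smoking category.
toy <- data.frame(
  y = c(1, 3, 4, 6, 2, 2),
  smoker = factor(
    rep(c("Never smoker", "Current smoker", "Former smoker"), each = 2),
    levels = c("Never smoker", "Current smoker", "Former smoker")
  )
)

fit <- lm(y ~ smoker, data = toy)

# Mean outcome in each category, as a plain named vector.
group_means <- sapply(split(toy$y, toy$smoker), mean)

# Difference of each group mean from the never-smoker mean.
diff_from_never <- group_means - group_means[["Never smoker"]]

print(coef(fit))
print(group_means)
print(diff_from_never)
```

The intercept should match the never-smoker mean, and each remaining coefficient should match the corresponding entry of `diff_from_never`.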

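Comments:

Thanks for your answer. However, in the linear model you propose, aren't current and former smoking contrasted with non-current and non-former smoking (with never vs. non-never left out because it is the reference)? If what you say is true (i.e. both are compared with never), isn't that the same as what I did with the dummy variables? I guess I don't fully understand how R computes contrasts for linear models, because, as I said, I want to contrast current vs. never and former vs. never, not current vs. non-current and former vs. non-former.

You have three mutually exclusive categories: never, current and former. In my model above, the constant is the mean of the never group, and the coefficients for former and current are the differences between never and former and between never and current, respectively. This is what the following tutorial calls "dummy coding": marissabarlaz.github.io/portfolio/contrastcoding. It matches what you asked for: "I want to contrast current vs. never and former vs. never."

Ah, I see. Thanks for the link, that helps!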
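To see the dummy coding itself, one can inspect the contrast matrix R attaches to the factor and the design matrix it produces. This is a minimal sketch that assumes R's default contrast setting for unordered factors (`contr.treatment`); the vector `smoker` is a made-up example with one value per level.

```r
# One observation per category, with never smokers as the first (reference) level.
smoker <- factor(
  c("Never smoker", "Current smoker", "Former smoker"),
  levels = c("Never smoker", "Current smoker", "Former smoker")
)

# Contrast matrix: one row per level, one column per dummy variable.
cm <- contrasts(smoker)
print(cm)

# Design matrix: an intercept column plus one 0/1 column per non-reference level.
mm <- model.matrix(~ smoker)
print(mm)
```

The reference level has zeros in every dummy column, so each coefficient measures the shift from never smokers to the level named in its column.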
