Obtain F-ratio and p-value for the continuous predictor

【Title】: Obtain F-ratio and p-value for the continuous predictor 【Posted】: 2021-12-15 10:17:15 【Question】:

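I fitted a linear model with block as a fixed factor, plus 2 categorical predictors and 1 continuous predictor. I would like a Type III ANCOVA table in which all effects are adjusted for block.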

dput(rye)
structure(list(strain = c("S23", "S23", "S23", "S23", "S23", 
"S23", "S23", "S23", "NZ", "NZ", "NZ", "NZ", "NZ", "NZ", "NZ", 
"NZ", "X", "X", "X", "X", "X", "X", "X", "X", "Kent", "Kent", 
"Kent", "Kent", "Kent", "Kent", "Kent", "Kent"), manure = c("H", 
"H", "H", "H", "A", "A", "A", "A", "H", "H", "H", "H", "A", "A", 
"A", "A", "H", "H", "H", "H", "A", "A", "A", "A", "H", "H", "H", 
"H", "A", "A", "A", "A"), block = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 
2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
3, 4), yield = c(299, 318, 284, 279, 247, 202, 171, 183, 315, 
247, 289, 307, 257, 175, 188, 174, 403, 439, 355, 324, 222, 170, 
192, 176, 382, 353, 383, 310, 233, 246, 200, 143), moisture = c(65.4073415007189, 
37.0145280041042, 73.2225001374652, 39.9941837349335, 74.803410076096, 
42.8914147357587, 50.792780124357, 55.0153723560264, 47.217016572995, 
62.3885361519854, 53.7388755272386, 24.6856936491391, 34.8364200180523, 
37.9399805638271, 37.7866881025361, 58.1848457395229, 39.2165119122411, 
45.0354704343593, 55.1876133744328, 42.272547076364, 61.2191532302273, 
62.5368880571047, 36.1336423251218, 40.8096323034628, 23.8425007638943, 
55.7644071035274, 66.9264524519492, 49.8050708164737, 60.5314496784137, 
82.4221025517919, 52.8870034752968, 54.0634811725579)), row.names = c(NA, 
-32L), spec = structure(list(cols = list(strain = structure(list(), class = c("collector_character", 
"collector")), manure = structure(list(), class = c("collector_character", 
"collector")), block = structure(list(), class = c("collector_double", 
"collector")), yield = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

rye_lm <- lm(yield ~ block + strain*manure*moisture, data = rye)

The only way I have found to get F and p values for the continuous predictor and its interactions with the categorical variables is joint_tests(rye_lm), which gives

 model term             df1 df2 F.ratio p.value
 block                    1  15  20.144  0.0004
 strain                   3  15   3.742  0.0345
 manure                   1  15 144.076  <.0001
 moisture                 1  15   0.175  0.6820
 strain:manure            3  15   6.001  0.0068
 strain:moisture          3  15   1.554  0.2419
 manure:moisture          1  15   1.128  0.3050
 strain:manure:moisture   3  15   0.567  0.6452  

Following the suggestion in "why the results from the joint_tests function (emmeans package) do not show one of the interactions of the model?",

this code gives an ANOVA table for the categorical predictors only:

rye_emm <- emmeans(rye_lm, c("strain", "manure", "moisture"))
joint_tests(rye_emm)
 model term    df1 df2 F.ratio p.value
 strain          3  15   3.966  0.0289
 manure          1  15 162.312  <.0001
 strain:manure   3  15   6.178  0.0060

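How can I change my code to get F-ratios and p-values for the 2 categorical variables, the continuous predictor, and their interactions, without block? Many thanks!!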

【Comments】:

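Try doing this through emmeans for every predictor except block, using at to specify 2 different values of the continuous variable.

Thank you very much for the tip. This works: emmeans(rye_lm, ~ strain * manure | moisture, at = list(moisture = c(40, 55))). Just a clarifying question: why do we specify 2 levels of the continuous variable? I noticed that specifying 3 levels adds an extra column to the ANOVA table with the note d: df1 reduced due to linear dependence.

You need 2 values rather than 1 because the test is built from contrasts between levels, and that is what makes the effect quantifiable. Specifying 3 levels works, but it overdetermines the effect, which is why you get that message. Also, if you use more than 2 unequally spaced levels, the result will be biased.

【Answer 1】: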
rye_emm2 <-  emmeans(rye_lm , ~ strain * manure | moisture, at = list(moisture = c(40, 55)))

joint_tests(rye_emm2)
 model term             df1 df2 F.ratio p.value
 strain                   3  15   4.337  0.0217
 manure                   1  15 169.648  <.0001
 moisture                 1  15   0.175  0.6820
 strain:manure            3  15   5.847  0.0075
 strain:moisture          3  15   1.554  0.2419
 manure:moisture          1  15   1.128  0.3050
 strain:manure:moisture   3  15   0.567  0.6452
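
A note on the choice of c(40, 55), with a sketch. Because moisture enters the model linearly, averaging the predictions at two moisture values is the same as predicting at their midpoint. The strain, manure and strain:manure rows are therefore evaluated at that midpoint (47.5 here), which would explain why they differ from the other tables. The rows involving moisture are slope tests and should not depend on which two distinct values are used. The variant below centres the pair on the sample mean of moisture. That choice, and the names moisture_pair, rye_emm3 and jt, are assumptions for illustration and not part of the original answer. In principle the strain, manure and strain:manure rows should then agree with the categorical-only table in the question, which uses the mean moisture.

```r
library(emmeans)

# Same data as the dput() output in the question, rebuilt compactly.
rye <- data.frame(
  strain = rep(c("S23", "NZ", "X", "Kent"), each = 8),
  manure = rep(rep(c("H", "A"), each = 4), times = 4),
  block  = rep(1:4, times = 8),  # numeric, as stored in the question's data
  yield  = c(299, 318, 284, 279, 247, 202, 171, 183,
             315, 247, 289, 307, 257, 175, 188, 174,
             403, 439, 355, 324, 222, 170, 192, 176,
             382, 353, 383, 310, 233, 246, 200, 143),
  moisture = c(65.4073415007189, 37.0145280041042, 73.2225001374652,
               39.9941837349335, 74.803410076096, 42.8914147357587,
               50.792780124357, 55.0153723560264, 47.217016572995,
               62.3885361519854, 53.7388755272386, 24.6856936491391,
               34.8364200180523, 37.9399805638271, 37.7866881025361,
               58.1848457395229, 39.2165119122411, 45.0354704343593,
               55.1876133744328, 42.272547076364, 61.2191532302273,
               62.5368880571047, 36.1336423251218, 40.8096323034628,
               23.8425007638943, 55.7644071035274, 66.9264524519492,
               49.8050708164737, 60.5314496784137, 82.4221025517919,
               52.8870034752968, 54.0634811725579)
)

rye_lm <- lm(yield ~ block + strain * manure * moisture, data = rye)

# Assumption: two moisture values placed symmetrically around the sample
# mean, so that effects averaged over moisture sit at the mean moisture.
moisture_pair <- mean(rye$moisture) + c(-1, 1)

# Same call as in the answer, with only the `at` values changed.
rye_emm3 <- emmeans(rye_lm, ~ strain * manure | moisture,
                    at = list(moisture = moisture_pair))

jt <- joint_tests(rye_emm3)
print(jt)
```

Swapping moisture_pair for any other two distinct values (for example c(40, 55)) is a quick way to check which rows change and which stay put.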
