Obtain F-ratio and p-value for the continuous predictor
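【Title】: Obtain F-ratio and p-value for the continuous predictor 【Posted】: 2021-12-15 10:17:15 【Question】: I fitted a linear model with block as a fixed factor, plus 2 categorical predictors and 1 continuous predictor. I would like a Type III ANCOVA table in which all effects are adjusted for block.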
dput(rye)
structure(list(strain = c("S23", "S23", "S23", "S23", "S23",
"S23", "S23", "S23", "NZ", "NZ", "NZ", "NZ", "NZ", "NZ", "NZ",
"NZ", "X", "X", "X", "X", "X", "X", "X", "X", "Kent", "Kent",
"Kent", "Kent", "Kent", "Kent", "Kent", "Kent"), manure = c("H",
"H", "H", "H", "A", "A", "A", "A", "H", "H", "H", "H", "A", "A",
"A", "A", "H", "H", "H", "H", "A", "A", "A", "A", "H", "H", "H",
"H", "A", "A", "A", "A"), block = c(1, 2, 3, 4, 1, 2, 3, 4, 1,
2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2,
3, 4), yield = c(299, 318, 284, 279, 247, 202, 171, 183, 315,
247, 289, 307, 257, 175, 188, 174, 403, 439, 355, 324, 222, 170,
192, 176, 382, 353, 383, 310, 233, 246, 200, 143), moisture = c(65.4073415007189,
37.0145280041042, 73.2225001374652, 39.9941837349335, 74.803410076096,
42.8914147357587, 50.792780124357, 55.0153723560264, 47.217016572995,
62.3885361519854, 53.7388755272386, 24.6856936491391, 34.8364200180523,
37.9399805638271, 37.7866881025361, 58.1848457395229, 39.2165119122411,
45.0354704343593, 55.1876133744328, 42.272547076364, 61.2191532302273,
62.5368880571047, 36.1336423251218, 40.8096323034628, 23.8425007638943,
55.7644071035274, 66.9264524519492, 49.8050708164737, 60.5314496784137,
82.4221025517919, 52.8870034752968, 54.0634811725579)), row.names = c(NA,
-32L), class = c("tbl_df", "tbl", "data.frame"))
rye_lm <- lm(yield ~ block + strain*manure*moisture, data = rye)
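The question describes block as a fixed factor, but in the dput output block is stored as plain numbers, and the joint_tests table further down reports a single degree of freedom for it. That suggests lm is treating block as a linear covariate. The sketch below uses simulated data with the same layout as rye (the values and the names dat, block_f, fit_num, fit_fac are made up for illustration) to compare the two codings. Whether block should be a factor depends on the design, so treat this as a check rather than a correction.

```r
# Simulated stand-in with the same layout as rye (hypothetical values)
set.seed(42)
dat <- expand.grid(strain = c("S23", "NZ", "X", "Kent"),
                   manure = c("H", "A"),
                   block  = 1:4)
dat$moisture <- runif(nrow(dat), 25, 80)
dat$yield    <- 250 + 80 * (dat$manure == "H") + rnorm(nrow(dat), sd = 30)

# block left numeric: a single slope, as in the posted model
fit_num <- lm(yield ~ block + strain * manure * moisture, data = dat)

# block converted to a factor: three coefficients for four blocks
dat$block_f <- factor(dat$block)
fit_fac <- lm(yield ~ block_f + strain * manure * moisture, data = dat)

# The factor coding uses two more degrees of freedom than the numeric coding
df.residual(fit_num)
df.residual(fit_fac)
```

With the real data, the equivalent change would be to convert rye$block with factor() before fitting.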
The only way I have found to get F and p-values for the continuous predictor and its interactions with the categorical variables is joint_tests(rye_lm), which gives
model term             df1 df2 F.ratio p.value
block                    1  15  20.144  0.0004
strain                   3  15   3.742  0.0345
manure                   1  15 144.076  <.0001
moisture                 1  15   0.175  0.6820
strain:manure            3  15   6.001  0.0068
strain:moisture          3  15   1.554  0.2419
manure:moisture          1  15   1.128  0.3050
strain:manure:moisture   3  15   0.567  0.6452
Following the suggestion given in "why the results from the joint_tests function (emmeans package) do not show one of the interactions of the model?", the code below gives an ANOVA table for the categorical predictors only:
rye_emm <- emmeans(rye_lm, c("strain", "manure", "moisture"))
joint_tests(rye_emm)
model term    df1 df2 F.ratio p.value
strain          3  15   3.966  0.0289
manure          1  15 162.312  <.0001
strain:manure   3  15   6.178  0.0060
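How can I change my code so that I get the F-ratios and p-values for the 2 categorical variables, the continuous predictor, and their interactions, without block? Thank you very much!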
【Comments】:
Try doing this on an emmeans object built from every predictor except block, using at to specify 2 different values of the continuous variable.
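Thank you very much for the tip. This works: emmeans(rye_lm, ~ strain * manure | moisture, at = list(moisture = c(40, 55))). Just one clarifying question: why do we specify 2 levels of the continuous variable? I noticed that specifying 3 levels adds an extra column to the ANOVA table, with the note "d: df1 reduced due to linear dependence".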
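You need 2 values rather than 1 because the test is built from contrasts between levels, and with a single value there is nothing to contrast, so the effect cannot be quantified. Specifying 3 levels also works, but it over-determines a linear effect, which is what triggers that message. In addition, using more than 2 unequally spaced levels will bias it.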
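The point made in these comments can be tried out with a minimal sketch. It assumes the emmeans package is installed and uses simulated data with the same layout as rye; the values and the names dat, fit, jt_one, jt_two, jt_three are illustrative, not the poster's. It builds the joint tests with one, two, and three moisture values so the resulting tables can be compared.

```r
library(emmeans)

# Simulated stand-in with the same layout as rye (hypothetical values)
set.seed(1)
dat <- expand.grid(strain = c("S23", "NZ", "X", "Kent"),
                   manure = c("H", "A"),
                   block  = 1:4)
dat$moisture <- runif(nrow(dat), 25, 80)
dat$yield    <- 250 + 80 * (dat$manure == "H") - 10 * dat$block +
  rnorm(nrow(dat), sd = 30)

fit <- lm(yield ~ block + strain * manure * moisture, data = dat)

# One moisture value (emmeans reduces a covariate to its mean by default):
# there is nothing to contrast, so the moisture terms are absent
jt_one <- joint_tests(emmeans(fit, ~ strain * manure * moisture))

# Two moisture values: one contrast per slope, so the moisture terms appear
jt_two <- joint_tests(emmeans(fit, ~ strain * manure * moisture,
                              at = list(moisture = c(40, 55))))

# Three moisture values: more contrasts than a linear effect can support,
# so df1 is expected to be reduced, as described in the comment above
jt_three <- joint_tests(emmeans(fit, ~ strain * manure * moisture,
                                at = list(moisture = c(40, 50, 55))))

jt_one
jt_two
jt_three
```

Printing the three tables side by side shows which terms each grid can test. Block is left out of the emmeans specification in all three, so it never appears as a row.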
【Answer 1】:
rye_emm2 <- emmeans(rye_lm , ~ strain * manure | moisture, at = list(moisture = c(40, 55)))
joint_tests(rye_emm2)
model term             df1 df2 F.ratio p.value
strain                   3  15   4.337  0.0217
manure                   1  15 169.648  <.0001
moisture                 1  15   0.175  0.6820
strain:manure            3  15   5.847  0.0075
strain:moisture          3  15   1.554  0.2419
manure:moisture          1  15   1.128  0.3050
strain:manure:moisture   3  15   0.567  0.6452
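The three tables on this page report different F-ratios for strain (3.742, 3.966, 4.337), while the moisture rows agree. One plausible reading, offered here as an assumption rather than something stated in the thread, is that the model contains factor-by-moisture interactions, so the tests for strain and manure depend on the moisture values at which the grid is evaluated, whereas the slope tests do not. The sketch below uses simulated data and made-up names (dat, fit, get_F, jt_a, jt_b) to compare the answer's values c(40, 55) with a pair placed symmetrically around the sample mean of moisture.

```r
library(emmeans)

# Simulated stand-in with the same layout as rye (hypothetical values)
set.seed(2)
dat <- expand.grid(strain = c("S23", "NZ", "X", "Kent"),
                   manure = c("H", "A"),
                   block  = 1:4)
dat$moisture <- runif(nrow(dat), 25, 80)
dat$yield    <- 250 + 80 * (dat$manure == "H") - 10 * dat$block +
  rnorm(nrow(dat), sd = 30)

fit <- lm(yield ~ block + strain * manure * moisture, data = dat)

# Helper (illustrative): pull the F-ratio for one model term
get_F <- function(jt, term) {
  jt$F.ratio[trimws(jt[["model term"]]) == term]
}

# Grid A: the two values used in the answer
jt_a <- joint_tests(emmeans(fit, ~ strain * manure * moisture,
                            at = list(moisture = c(40, 55))))

# Grid B: two values placed symmetrically around the sample mean
ctr  <- mean(dat$moisture)
jt_b <- joint_tests(emmeans(fit, ~ strain * manure * moisture,
                            at = list(moisture = ctr + c(-1, 1))))

# The moisture test should not depend on which two values are chosen,
# while the strain test can change with the centre of the grid
c(A = get_F(jt_a, "moisture"), B = get_F(jt_b, "moisture"))
c(A = get_F(jt_a, "strain"),   B = get_F(jt_b, "strain"))
```

If the aim is to test the categorical effects at the average moisture, centring the two values on mean(rye$moisture) is one option. The values 40 and 55 test them at a moisture of 47.5 instead.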