Is it possible to sample from a conditional density in R given some conditional data?

Posted 2023-02-14


【Title】: Is it possible to sample from a conditional density in R given some conditional data? 【Posted】: 2015-12-26 06:55:02 【Question】:

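In R, I used the np package to compute the bandwidths for a conditional density. What I would like to do is, given some new vector of conditioning values, sample from the resulting distribution.

Current code: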

library('np')
# Generate some test data.
somedata = data.frame(replicate(10,runif(100, 0, 1)))
# Conditional variables.
X <- data.frame(somedata[, c('X1', 'X2', 'X3')])
# Dependent variables.
Y <- data.frame(somedata[, c('X4', 'X5', 'X6')])
# Warning, this can be slow (but shouldn't be too bad).
bwsome = npcdensbw(xdat=X, ydat=Y)
# TODO: Given some vector t of conditional data, how can I sample from the resulting distribution?

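I am quite new to R, so although I did read the package documentation, I could not work out whether what I have in mind makes sense or is even possible. If necessary, I am happy to use a different package.

【Question comments】:

I get: Error: could not find function "npcedensbw". When I look at the functions available in the np package, I do not see any by that name. When I rerun with npcdensbw and plot the result, I see 6 X variables. Now... what exactly is the question?

Indeed, I am working with multivariate data, for both the conditioning and the dependent variables. What I want to do is sample from the estimated distribution. Given some new vector of conditioning/independent variables, I want to sample from the distribution conditional on those values. In a simpler example, if x and y are both one-dimensional, I want to fix x so that there is a distribution over y, and then sample from that distribution. I want to do the same thing here. Is that clearer?

Just to make sure I understand the question correctly: how does your case differ from FAQ 2.49 in cran.r-project.org/web/packages/np/vignettes/np_faq.pdf?

So, if I understand correctly, you want to compute something like P(X4|X1), or something more complex like P(X5|X1,X2,X3), or even P(X1|X4). Is that correct?

【Answer 1】: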

This is example 2.49 from https://cran.r-project.org/web/packages/np/vignettes/np_faq.pdf, which gives the following solution for 2 variables:

###
library(np)
data(faithful)
n <- nrow(faithful)
x1 <- faithful$eruptions
x2 <- faithful$waiting
## First compute the bandwidth vector
bw <- npudensbw(~x1 + x2, ckertype = "gaussian")
plot(bw, view = "fixed", ylim = c(0, 3))
## Next generate draws from the kernel density (Gaussian)
n.boot <- 1000
i.boot <- sample(1:n, n.boot, replace = TRUE)
x1.boot <- rnorm(n.boot,x1[i.boot],bw$bw[1])
x2.boot <- rnorm(n.boot,x2[i.boot],bw$bw[2])
## Plot the density for the bootstrap sample using the original
## bandwidths
plot(npudens(~x1.boot+x2.boot,bws=bw$bw), view = "fixed")

Following the hint from @coffeejunky, here is a possible solution to your problem with 6 variables:

## Generate some test data.
somedata = data.frame(replicate(10, runif(100, 0, 1)))
## Conditional variables.
X <- data.frame(somedata[, c('X1', 'X2', 'X3')])
## Dependent variables.
Y <- data.frame(somedata[, c('X4', 'X5', 'X6')])
## First compute the bandwidth vector
n <- nrow(somedata)
bw <- npudensbw(~X$X1 + X$X2 + X$X3 + Y$X4 + Y$X5 + Y$X6, ckertype = "gaussian")
plot(bw, view = "fixed", ylim = c(0, 3))
## Next generate draws from the kernel density (Gaussian)
n.boot <- 1000
i.boot <- sample(1:n, n.boot, replace=TRUE)
x1.boot <- rnorm(n.boot, X$X1[i.boot], bw$bw[1])
x2.boot <- rnorm(n.boot, X$X2[i.boot], bw$bw[2])
x3.boot <- rnorm(n.boot, X$X3[i.boot], bw$bw[3])
x4.boot <- rnorm(n.boot, Y$X4[i.boot], bw$bw[4])
x5.boot <- rnorm(n.boot, Y$X5[i.boot], bw$bw[5])
x6.boot <- rnorm(n.boot, Y$X6[i.boot], bw$bw[6])
## Plot the density for the bootstrap sample using the original
## bandwidths
ob1 <- npudens(~x1.boot + x2.boot + x3.boot + x4.boot + x5.boot + x6.boot, bws = bw$bw)
plot(ob1, view = "fixed", ylim = c(0, 3))
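The six nearly identical rnorm calls above all follow one pattern: resample row indices, then add Gaussian noise whose standard deviation is that column's bandwidth. A compact sketch of the same idea for any number of columns might look like this. It assumes fixed bandwidths and Gaussian kernels, and smooth_boot is a hypothetical helper name, not a function from np.

```r
# Hypothetical helper (not part of np): draw n_boot rows from a
# Gaussian-kernel density estimate with one fixed bandwidth per column.
smooth_boot <- function(data, bw, n_boot = 1000) {
  data <- as.matrix(data)
  d <- ncol(data)
  stopifnot(is.numeric(data), length(bw) == d)
  # Step 1: pick observations uniformly at random, with replacement.
  idx <- sample(nrow(data), n_boot, replace = TRUE)
  # Step 2: add kernel noise; column j gets standard deviation bw[j].
  # The matrix is filled column by column, so sd is repeated per column.
  noise <- matrix(rnorm(n_boot * d, mean = 0, sd = rep(bw, each = n_boot)),
                  nrow = n_boot)
  out <- data[idx, , drop = FALSE] + noise
  rownames(out) <- NULL
  out
}

# Usage with placeholder bandwidths so the sketch runs without np.
# With np loaded, bw$bw from npudensbw() could be passed instead.
set.seed(1)
somedata <- data.frame(replicate(10, runif(100, 0, 1)))
draws <- smooth_boot(somedata[, paste0("X", 1:6)], bw = rep(0.2, 6))
dim(draws)
```

Note that, as in the code above, the draws come from the joint (unconditional) density of all six columns, and Gaussian noise can push values outside the [0, 1] range of the original data.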

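【Comments】:

This example samples from a kernel unconditional density estimate (using npudensbw), not from a kernel conditional density estimate (which would use npcdensbw). There may be a simple way to adapt this code to do that, but I do not see it well documented in the np help files, and I would value a definitive answer to this specific aspect of the question.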
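One possible way to adapt the resampling idea to the conditional case is sketched below; it is an illustration, not a method confirmed by the np documentation. It assumes fixed bandwidths, Gaussian kernels for every variable, and a conditional estimate of the usual ratio form f(x, y) / f(x) with the same x bandwidths in both terms. Under those assumptions the estimate of f(y | x = x0) is a mixture of Gaussians centred at the observed y rows, with mixture weights proportional to the x kernel evaluated at x0. So one can pick a row with those weights and then add noise with the y bandwidths. The name sample_conditional and the bandwidth values are hypothetical.

```r
# Hypothetical helper (not part of np): draw from a Gaussian-kernel
# estimate of f(y | x = x0).
#   X, Y   : numeric data frames or matrices, one row per observation
#   x0     : new conditioning point, one value per column of X
#   hx, hy : fixed bandwidths, one per column of X and of Y
sample_conditional <- function(X, Y, x0, hx, hy, n_draw = 1000) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  x0 <- as.numeric(unlist(x0))
  stopifnot(nrow(X) == nrow(Y), length(x0) == ncol(X),
            length(hx) == ncol(X), length(hy) == ncol(Y))
  # Product Gaussian kernel weight of each observation at x0, computed
  # on the log scale and shifted by the maximum to avoid underflow.
  z <- sweep(sweep(X, 2, x0, "-"), 2, hx, "/")
  logw <- -0.5 * rowSums(z^2)
  w <- exp(logw - max(logw))
  # Choose mixture components in proportion to their weights
  # (sample() normalises prob internally).
  idx <- sample(nrow(X), n_draw, replace = TRUE, prob = w)
  # Add Gaussian kernel noise; column j of Y uses bandwidth hy[j].
  noise <- matrix(rnorm(n_draw * ncol(Y), mean = 0,
                        sd = rep(hy, each = n_draw)),
                  nrow = n_draw)
  out <- Y[idx, , drop = FALSE] + noise
  rownames(out) <- NULL
  out
}

# Usage on the question's test data, with placeholder bandwidths so the
# sketch runs without np. With np, the values estimated by npcdensbw()
# would be used instead (to my understanding stored as bwsome$xbw and
# bwsome$ybw; check str(bwsome) to confirm).
set.seed(1)
somedata <- data.frame(replicate(10, runif(100, 0, 1)))
X <- somedata[, c("X1", "X2", "X3")]
Y <- somedata[, c("X4", "X5", "X6")]
t_new <- c(0.5, 0.5, 0.5)  # new conditioning vector
draws <- sample_conditional(X, Y, t_new, hx = rep(0.2, 3), hy = rep(0.2, 3))
head(draws)
```

Because the test data are independent uniforms, conditioning has little visible effect here; with dependent data, observations whose X values are close to t_new dominate the draws. If the bandwidths are not of the fixed type or the kernels are not Gaussian, the weighting and noise steps would need to change accordingly.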
