Kernel Density Estimation on an image

Posted 2023-03-12


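【Title】: Kernel Density Estimation on an image 【Posted】: 2020-04-18 11:12:45 【Question】:

I have a set of points [x1,y1][x2,y2]...[xn,yn]. I need to display them on a 2D image using kernel density estimation. How can I do this? I was following the code below, which is a bit confusing, and I am looking for a simple explanation.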

https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html

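import numpy as np
from sklearn.neighbors import KernelDensity

# height and width are the image dimensions (not given in the question)
img = np.zeros((height, width), np.uint8)
circles_xy = [[524, 290], [234, 180], [432, 30]]  # "..." in the original: further points elided

kde = KernelDensity(bandwidth=1.0, kernel='gaussian')
kde.fit(circles_xy)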

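【Question comments】:

【Answer 1】:

I would continue along the same path by drawing the contours of the PDF of the kernel density estimate. However, this might not give you the information you need, because the values of the PDF are not very informative. Instead, I would rather compute the minimum volume level set. For a given probability level, the minimum volume level set is the domain that contains that fraction of the distribution. If we describe this domain as the region where the PDF exceeds some value, then that PDF value is unknown at the start. It is found by inversion.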
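In symbols, the definition above can be written as follows. The notation is introduced here for illustration and does not come from the original answer: $p$ is the PDF of the random vector $X$, $\alpha \in (0, 1)$ is the probability level, and $p_\alpha$ is the unknown PDF value.

```latex
A^*_\alpha = \{\, x \in \mathbb{R}^d : p(x) \ge p_\alpha \,\},
\qquad \text{where } p_\alpha \text{ is chosen such that } \mathbb{P}\left(X \in A^*_\alpha\right) = \alpha .
```

Among all domains that carry probability $\alpha$, this one has the smallest volume, and solving the second equation for $p_\alpha$ is the inversion mentioned above.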

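Given a sample, the natural idea is to compute an approximate distribution by kernel smoothing, as you did. Then, for any distribution in OpenTURNS, the computeMinimumVolumeLevelSetWithThreshold method computes the required level set and the corresponding PDF value.

Let's see how it works in practice. To get an interesting example, I create a 2D distribution from a mixture of two Gaussian distributions.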

import openturns as ot
# Create a gaussian
corr = ot.CorrelationMatrix(2)
corr[0, 1] = 0.2
copula = ot.NormalCopula(corr)
x1 = ot.Normal(-1., 1)
x2 = ot.Normal(2, 1)
x_funk = ot.ComposedDistribution([x1, x2], copula)

# Create a second gaussian
x1 = ot.Normal(1.,1)
x2 = ot.Normal(-2,1)
x_punk = ot.ComposedDistribution([x1, x2], copula)

# Mix the distributions
mixture = ot.Mixture([x_funk, x_punk], [0.5,1.])

# Generate the sample
sample = mixture.getSample(500)

This is where your problem begins. Creating the bivariate kernel smoothing with the multidimensional Scott rule takes only two lines.

factory = ot.KernelSmoothing()
distribution = factory.build(sample)
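To give an idea of what a Scott-type rule computes, here is a minimal NumPy sketch of the textbook form, h_i = sigma_i * n^(-1/(d+4)). It is an illustration only: the function name scott_bandwidth is hypothetical, and OpenTURNS may apply a different constant or a mixed rule internally.

```python
import numpy as np

def scott_bandwidth(sample):
    """Per-dimension bandwidth h_i = sigma_i * n**(-1 / (d + 4)) (textbook Scott rule)."""
    sample = np.asarray(sample, dtype=float)
    n, d = sample.shape
    sigma = sample.std(axis=0, ddof=1)  # unbiased standard deviation of each column
    return sigma * n ** (-1.0 / (d + 4))

rng = np.random.default_rng(0)
points = rng.normal(size=(500, 2))  # synthetic 2D sample, for illustration only
h = scott_bandwidth(points)
print(h)  # one bandwidth per dimension
```

The bandwidth shrinks slowly as the sample grows, which is why kernel estimates in two or more dimensions need fairly large samples to resolve fine detail.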

Plotting the contours of this estimated distribution is then straightforward.

distribution.drawPDF()

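This produces a contour plot of the estimated PDF.

It shows the shape of the distribution. However, the contours of the PDF do not convey much information about the initial sample.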
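To connect this with the image in the question, a density can also be evaluated at every pixel and rescaled to 8-bit grey levels. The sketch below uses a hand-written isotropic Gaussian KDE in NumPy rather than OpenTURNS or scikit-learn. The function name kde_image, the 600 x 400 image size and the 25-pixel bandwidth are assumptions made for illustration, and only the three points visible in the question are used.

```python
import numpy as np

def kde_image(points, width, height, bandwidth):
    """Evaluate an isotropic Gaussian KDE at every pixel and scale it to uint8."""
    pts = np.asarray(points, dtype=float)              # shape (n, 2), columns are x, y
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)  # shape (height * width, 2)
    # Squared distance between every pixel and every data point
    d2 = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    density = np.exp(-0.5 * d2 / bandwidth**2).sum(axis=1)
    density /= len(pts) * 2.0 * np.pi * bandwidth**2   # normalise the Gaussian kernel
    density = density.reshape(height, width)           # row index is y, column index is x
    # Map the density range onto grey levels 0..255
    return (255.0 * density / density.max()).astype(np.uint8)

circles_xy = [[524, 290], [234, 180], [432, 30]]  # the three points visible in the question
img = kde_image(circles_xy, width=600, height=400, bandwidth=25.0)
print(img.shape, img.dtype)
```

The pairwise distance array grows with the number of pixels times the number of points, so for large inputs the grid would have to be processed in chunks.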

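The inversion that computes the minimum volume level set requires an initial sample, which is generated by the Monte Carlo method when the dimension is greater than 1. The default sample size (close to 16 000) is fine, but I usually set it myself to make sure I understand what I am doing.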

ot.ResourceMap.SetAsUnsignedInteger(
    "Distribution-MinimumVolumeLevelSetSamplingSize", 1000
)
alpha = 0.9
levelSet, threshold = distribution.computeMinimumVolumeLevelSetWithThreshold(alpha)

The threshold variable contains the solution of the problem, i.e. the PDF value that corresponds to the minimum volume level set.
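The idea behind a Monte Carlo inversion of this kind can be sketched in plain NumPy: draw points from the distribution, evaluate the PDF at each of them, and take the (1 - alpha) quantile of those PDF values. This is a sketch of the principle under that assumption, not a description of the OpenTURNS internals, and the helper names are hypothetical. The standard 2D normal is used because its threshold has the closed form (1 - alpha) / (2 * pi), which gives something to compare against.

```python
import numpy as np

def level_set_threshold(pdf, draw, alpha, size=100_000, seed=0):
    """Estimate p_alpha such that P(pdf(X) >= p_alpha) = alpha by Monte Carlo."""
    rng = np.random.default_rng(seed)
    x = draw(rng, size)   # sample from the distribution
    values = pdf(x)       # PDF evaluated at each sampled point
    # A fraction alpha of the points must have a PDF value above the threshold,
    # so the threshold is the (1 - alpha) quantile of the PDF values.
    return np.quantile(values, 1.0 - alpha)

def std_normal_2d_pdf(x):
    return np.exp(-0.5 * (x**2).sum(axis=1)) / (2.0 * np.pi)

def std_normal_2d_draw(rng, size):
    return rng.standard_normal((size, 2))

alpha = 0.9
estimate = level_set_threshold(std_normal_2d_pdf, std_normal_2d_draw, alpha)
exact = (1.0 - alpha) / (2.0 * np.pi)  # closed form for the standard 2D normal
print(estimate, exact)
```

A larger Monte Carlo sample reduces the noise in the estimated threshold at the price of more PDF evaluations, which is the trade-off behind the sampling size setting above.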

The final step is to plot the sample and the corresponding minimum volume level set.

def drawLevelSetContour2D(
    distribution, numberOfPointsInXAxis, alpha, threshold, sample
):
    """
    Compute the minimum volume LevelSet of measure equal to alpha and get the
    corresponding density value (named threshold).
    Draw a contour plot for the distribution, where the PDF is equal to threshold.
    """
    sampleSize = sample.getSize()
    X1min = sample[:, 0].getMin()[0]
    X1max = sample[:, 0].getMax()[0]
    X2min = sample[:, 1].getMin()[0]
    X2max = sample[:, 1].getMax()[0]
    xx = ot.Box([numberOfPointsInXAxis], ot.Interval([X1min], [X1max])).generate()
    yy = ot.Box([numberOfPointsInXAxis], ot.Interval([X2min], [X2max])).generate()
    xy = ot.Box(
        [numberOfPointsInXAxis, numberOfPointsInXAxis],
        ot.Interval([X1min, X2min], [X1max, X2max]),
    ).generate()
    data = distribution.computePDF(xy)
    graph = ot.Graph("", "X1", "X2", True, "topright")
    labels = ["%.2f%%" % (100 * alpha)]
    contour = ot.Contour(xx, yy, data, ot.Point([threshold]), ot.Description(labels))
    contour.setColor("black")
    graph.setTitle(
        "%.2f%% of the distribution, sample size = %d" % (100 * alpha, sampleSize)
    )
    graph.add(contour)
    cloud = ot.Cloud(sample)
    graph.add(cloud)
    return graph

Finally, we plot the contour of the level set using 50 points along each axis.

numberOfPointsInXAxis = 50
# Pass the kernel-smoothed distribution, since threshold was computed from it
drawLevelSetContour2D(distribution, numberOfPointsInXAxis, alpha, threshold, sample)

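The resulting figure plots the sample together with the contour of the domain that contains 90% of the population, as estimated from the kernel smoothing distribution. Any point outside this region can be considered an outlier, although we might use a higher value such as alpha=0.95 for that purpose.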
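The outlier rule described above amounts to flagging every point whose estimated PDF value falls below the threshold. Here is a minimal NumPy sketch on synthetic data, with a hand-written Gaussian KDE and a fixed bandwidth of 0.5, both assumptions made for illustration. As a shortcut, the threshold is taken as a quantile of the PDF values at the data points themselves rather than from a fresh Monte Carlo sample.

```python
import numpy as np

def gaussian_kde_pdf(query, data, bandwidth):
    """Isotropic 2D Gaussian KDE built on `data`, evaluated at `query`."""
    d2 = ((query[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-0.5 * d2 / bandwidth**2) / (2.0 * np.pi * bandwidth**2)
    return kernel.mean(axis=1)

rng = np.random.default_rng(42)
cluster = rng.normal(size=(200, 2))         # bulk of the data around the origin
data = np.vstack([cluster, [[8.0, 8.0]]])   # plus one clearly isolated point

pdf_at_data = gaussian_kde_pdf(data, data, bandwidth=0.5)
alpha = 0.95
threshold = np.quantile(pdf_at_data, 1.0 - alpha)  # in-sample estimate of the PDF level
is_outlier = pdf_at_data < threshold

print(is_outlier.sum(), "points flagged; isolated point flagged:", is_outlier[-1])
```

By construction, roughly a fraction 1 - alpha of the sample is always flagged, so alpha controls how strict the rule is rather than whether outliers exist.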

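The full example is detailed in Minimum volume level set. It is applied to stochastic processes in othdrplot. The ideas used here are described in Rob J Hyndman and Han Lin Shang, "Rainbow plots, bagplots, and boxplots for functional data", Journal of Computational and Graphical Statistics, 19:29-45, 2009.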

【Comments】:
