如何对方程进行蒙特卡罗分析？

Posted 2023-02-16

技术标签:

【中文标题】如何对方程进行蒙特卡罗分析？【英文标题】：How can I do a Monte Carlo analysis on an equation? 【发布时间】：2017-09-18 23:38:24 【问题描述】：

给定一个依赖于多个变量的函数，每个变量都有一定的概率分布，我如何进行蒙特卡罗分析以获得函数的概率分布。理想情况下，随着参数数量或迭代次数的增加，我希望该解决方案具有高性能。

例如，我为total_time 提供了一个等式，它取决于许多其他参数。

import numpy as np
import matplotlib.pyplot as plt

size = 1000

gym = [30, 30, 35, 35, 35, 35, 35, 35, 40, 40, 40, 45, 45]

left = 5
right = 10
mode = 9
shower = np.random.triangular(left, mode, right, size)

argument = np.random.choice([0, 45], size, p=[0.9, 0.1])

mu = 15
sigma = 5 / 3
dinner = np.random.normal(mu, sigma, size)

mu = 45
sigma = 15/3
work = np.random.normal(mu, sigma, size)

brush_my_teeth = 2

variables = gym, shower, dinner, argument, work, brush_my_teeth
for variable in variables:
    plt.figure()
    plt.hist(variable)
plt.show()


def total_time(variables):
    return np.sum(variables)

健身房

淋浴

晚餐

论据

工作

brush_my_teeth

【问题讨论】：

你试过pymc包吗？ 【参考方案1】：

您尝试过简单的for 循环吗？首先，定义常量和函数。然后，运行循环 n 次（示例中为 10'000），为变量绘制新的随机值并每次计算函数结果。最后，将所有结果追加到results_dist，然后绘制它。

import numpy as np
import matplotlib.pyplot as plt

gym = [30, 30, 35, 35, 35, 35, 35, 35, 40, 40, 40, 45, 45]
brush_my_teeth = 2
size = 1000

def total_time(variables):
    return np.sum(variables)

results_dist = []
for i in range(10000):
    shower = np.random.triangular(left=5, mode=9, right=10, size)
    argument = np.random.choice([0, 45], size, p=[0.9, 0.1])
    dinner = np.random.normal(mu=15, sigma=5/3, size)
    work = np.random.normal(mu=45, sigma=15/3, size)

    variables = gym, shower, dinner, argument, work, brush_my_teeth

    results_dist.append(total_time(variables))

plt.figure()
plt.hist(results_dist)
plt.show()

【讨论】：

这似乎几乎是正确的，但是，我认为发帖者打算将gym 列表表示一个概率分布，而您似乎还没有正确地从该分布中抽样。要正确采样gym 变量，您需要在for 循环中添加一个额外的行，它会在np.arange(len(gym)) 的间隔上生成一个随机整数。然后将该数字用作gym 列表中的随机索引，以便从分布中抽样。【参考方案2】：

现有答案的想法是正确的，但我怀疑您是否想像 nicogen 那样对 size 中的所有值求和。

我假设您选择了一个相对较大的 size 来演示直方图中的形状，而您希望从每个类别中总结一个值。例如，我们想要计算每个活动的一个实例的总和，而不是 1000 个实例。

第一个代码块假定您知道您的函数是一个求和，因此可以使用快速 numpy 求和来计算总和。

import numpy as np
import matplotlib.pyplot as plt

mc_trials = 10000

gym = np.random.choice([30, 30, 35, 35, 35, 35, 
                    35, 35, 40, 40, 40, 45, 45], mc_trials)
brush_my_teeth = np.random.choice([2], mc_trials)
argument = np.random.choice([0, 45], size=mc_trials, p=[0.9, 0.1])
dinner = np.random.normal(15, 5/3, size=mc_trials)
work = np.random.normal(45, 15/3, size=mc_trials)
shower = np.random.triangular(left=5, mode=9, right=10, size=mc_trials)

col_per_trial = np.vstack([gym, brush_my_teeth, argument,
           dinner, work, shower])

mc_function_trials = np.sum(col_per_trial,axis=0)

plt.figure()
plt.hist(mc_function_trials,30)
plt.xlim([0,200])
plt.show()

如果你不知道你的函数，或者不能轻易地重铸为一个 numpy element-wise 矩阵运算，你仍然可以像这样循环：

def total_time(variables):
        return np.sum(variables)

mc_function_trials = [total_time(col) for col in col_per_trial.T]

您询问有关获得“概率分布”的问题。像我们上面所做的那样获取直方图并不能完全为您做到这一点。它为您提供视觉表示，但不是分布函数。为了得到这个函数，我们需要使用核密度估计。 scikit-learn 有一个罐头 function and example 可以做到这一点。

from sklearn.neighbors import KernelDensity
mc_function_trials = np.array(mc_function_trials)
kde = (KernelDensity(kernel='gaussian', bandwidth=2)
       .fit(mc_function_trials[:, np.newaxis]))

density_function = lambda x: np.exp(kde.score_samples(x))

time_values = np.arange(200)[:, np.newaxis]
plt.plot(time_values, density_function(time_values))

现在您可以计算总和小于 100 的概率，例如：

import scipy.integrate as integrate
probability, accuracy = integrate.quad(density_function, 0, 100)
print(probability)
# prints 0.15809

【讨论】：

【参考方案3】：

对于这类事情，我建议查看Halton sequences 和类似的准随机低差异序列。 ghalton 包可以很容易地生成确定性但低差异的序列：

import ghalton as gh
sequence = gh.Halton(n)  # n is the number of dimensions you want

然后在其他一些答案的基础上，您可以执行以下操作：

values = sequence.get(10000)  # generate a bunch of draws of
for vals in values:
    # vals will have a single sample of n quasi-random numbers
    variables = # add whatever other stuff you need to your quasi-random values
    results_dist.append(total_time(variables))

如果您查看一些关于准随机序列的研究论文，就会发现它们在蒙特卡洛积分和采样等应用的收敛方面做得更好。基本上，您可以更均匀地覆盖搜索空间，同时在样本中保持类似随机的属性，这在大多数情况下会导致更快的收敛。

这基本上可以让您在n 维度上获得均匀分布。如果您想在某些维度上有非均匀分布，您可以相应地转换您的均匀分布。我不确定这会对 Halton 序列的低差异属性产生什么影响，但可能值得研究。

【讨论】：

以上是关于如何对方程进行蒙特卡罗分析？的主要内容，如果未能解决你的问题，请参考以下文章