How to generate a random normal distribution of integers

Posted 2023-02-23


【Asked】: 2016-09-21 13:15:36

【Question】:

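How can I generate a random integer, as with np.random.randint(), but drawn from a normal distribution centered around 0?

np.random.randint(-10, 10) returns integers from a discrete uniform distribution.
np.random.normal(0, 0.1, 1) returns floats from a normal distribution.

What I want is a combination of the two functions.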

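【Comments】:

A normal distribution is continuous by definition, so the answer to this question depends on how you want to discretize it. One possible solution is to sample from np.random.normal and round the result to an integer.

【Answer 1】:

Another way to get a discrete distribution that looks like a normal distribution is to draw from a multinomial distribution whose probabilities are computed from a normal distribution.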

import scipy.stats as ss
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10, 11)
xU, xL = x + 0.5, x - 0.5 
prob = ss.norm.cdf(xU, scale = 3) - ss.norm.cdf(xL, scale = 3)
prob = prob / prob.sum() # normalize the probabilities so their sum is 1
nums = np.random.choice(x, size = 10000, p = prob)
plt.hist(nums, bins = len(x))

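Here, np.random.choice picks an integer from [-10, 10]. The probability of selecting an element, say 0, is P(-0.5 < X < 0.5), where X is a normal random variable with mean 0 and standard deviation 3 (the scale passed to ss.norm.cdf in the code).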
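To make the interval probabilities concrete, here is a minimal sketch of the same idea wrapped in a function. The name discrete_normal_probs, the seed, and the use of np.random.default_rng are my own choices for illustration and are not part of the answer.

```python
import numpy as np
from scipy import stats

def discrete_normal_probs(low, high, scale):
    """Probability of each integer in [low, high] under a discretized N(0, scale^2)."""
    support = np.arange(low, high + 1)
    # P(k - 0.5 < X < k + 0.5) for every integer k in the support
    upper = stats.norm.cdf(support + 0.5, scale=scale)
    lower = stats.norm.cdf(support - 0.5, scale=scale)
    probs = upper - lower
    # Renormalize, because the tails outside [low, high] are dropped
    return support, probs / probs.sum()

support, probs = discrete_normal_probs(-10, 10, scale=3)

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
nums = rng.choice(support, size=10_000, p=probs)

# Probability assigned to the integer 0, i.e. the interval (-0.5, 0.5)
p_zero = probs[support == 0][0]
print(p_zero)
```

Because the probabilities come from differences of the CDF, each one is a genuine probability and none can exceed 1, whatever scale is used.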

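The result looks like this:

【Comments】:

Why do you add and subtract 0.5 in xL and xU?

For a continuous distribution, P(x=0)=0 (and the same holds for any other number). Probabilities are defined over intervals. Here, to associate a probability with 0 (and with the other integers), I used the interval (-0.5, 0.5), basically because the question asks for integers. For 1, it is (0.5, 1.5).

Why don't you just take ss.norm.pdf(x, scale=3)?

With these parameters that would work fine, but with a smaller standard deviation you would end up with pdf values greater than 1. As long as you do the same normalization afterwards (dividing by the sum) it is fine, but I didn't want to confuse potential readers: a pdf is not actually a probability (which is why it can exceed 1), so I wanted to use actual probabilities.

【Answer 2】:

A similar distribution can be generated from a Truncated Normal Distribution, rounded to integers. Here is an example using scipy's truncnorm().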

import numpy as np
from scipy.stats import truncnorm
import matplotlib.pyplot as plt

scale = 3.
range = 10
size = 100000

X = truncnorm(a=-range/scale, b=+range/scale, scale=scale).rvs(size=size)
X = X.round().astype(int)

Let's see what it looks like:

bins = 2 * range + 1
plt.hist(X, bins)
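A hedged variant of the snippet above, as a sketch only. It uses a neutral name (half_width, my own choice) instead of range, which shadows the Python built-in, and it truncates at ±(half_width + 0.5) rather than ±half_width. The second change is my assumption about what is usually wanted: after rounding, the edge integers then cover a full unit interval instead of half of one.

```python
import numpy as np
from scipy.stats import truncnorm

scale = 3.0
half_width = 10      # hypothetical name; integers will span -10 .. 10
size = 10_000

# truncnorm takes its bounds in units of the standard deviation
bound = (half_width + 0.5) / scale
dist = truncnorm(a=-bound, b=bound, scale=scale)

# random_state is an arbitrary seed, used only for reproducibility
X = dist.rvs(size=size, random_state=0)
X = np.rint(X).astype(int)   # round to nearest integer, integer dtype

print(X.min(), X.max())
```

Drawn values stay inside [-half_width, half_width] after rounding, up to the measure-zero case of landing exactly on a bound.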

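【Comments】:

Thanks to @ayhan and bakkal for the replies. I ask this only for my own understanding; I don't mean to slight either answer. Judging by the plots alone, Bakkal's is more symmetric. Both look adequate, and from the code they seem equally valid. But my understanding is weak. Is there an objectively better method?

@RobertLugg The higher symmetry is probably due to the larger sample size. That said, I think the code in this answer is simpler.

Note that this code overwrites Python's built-in range function. Try using a neutral variable name.

The code is simpler, but it is also slower. In my tests it was about 100 times slower than ayhan's solution. Even so, generating 10,000 numbers 50 times takes 3 seconds, so in many cases this is fine.

【Answer 3】:

The accepted answer here works, but I tried Will Vousden's solution and it works well too: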

import numpy as np
import matplotlib.pyplot as plt

# Generate Distribution:
randomNums = np.random.normal(scale=3, size=100000)
randomInts = np.round(randomNums)

# Plot: put bin edges at half-integers so every integer gets its own bar
axis = np.arange(start=min(randomInts) - 0.5, stop=max(randomInts) + 1.5)
plt.hist(randomInts, bins=axis)
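np.round returns floats, so randomInts above holds values such as 3.0 rather than true integers. If an integer dtype is needed, one option is to round first and cast afterwards. The sketch below (the seed and variable names are mine) also shows why casting without rounding is not equivalent: astype(int) truncates toward zero, so the whole interval (-1, 1) collapses onto 0.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
samples = rng.normal(loc=0, scale=3, size=100_000)

rounded = np.rint(samples).astype(int)   # nearest integer, then integer dtype
truncated = samples.astype(int)          # drops the fractional part (toward zero)

# Rounding maps (-0.5, 0.5) to 0, while truncation maps (-1, 1) to 0,
# so truncation yields roughly twice as many zeros.
zeros_rounded = np.count_nonzero(rounded == 0)
zeros_truncated = np.count_nonzero(truncated == 0)
print(zeros_rounded, zeros_truncated)
```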

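【Comments】:

Instead of generating randomNums and rounding them to "integers" (actually, real numbers ending in .0), what about randomInts = np.random.normal(loc=10, scale=3, size=10000).astype(int)-10, which returns actual integers? Note, however, that the values must be generated with a loc other than 0 (and shifted back to 0 by subtracting loc), otherwise you will get too many results at exactly 0.

【Answer 4】:

Here we first take values from the bell curve.

Code: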

#--------*---------*---------*---------*---------*---------*---------*---------*
# Desc: Discretize a normal distribution centered at 0
#--------*---------*---------*---------*---------*---------*---------*---------*

import sys
import random
from math import sqrt, pi
import numpy as np
import matplotlib.pyplot as plt

def gaussian(x, var):
    k1 = np.power(x, 2)
    k2 = -k1/(2*var)
    return (1./(sqrt(2. * pi * var))) * np.exp(k2)

#--------*---------*---------*---------*---------*---------*---------*---------#
while 1:#                          M A I N L I N E                             #
#--------*---------*---------*---------*---------*---------*---------*---------#
#                                  # probability density function
#                                  #   for discrete normal RV
    pdf_DGV = []
    pdf_DGW = []    
    var = 9
    tot = 0    
#                                  # create 'rough' gaussian
    for i in range(-var - 1, var + 2):
        if i ==  -var - 1:
            r_pdf = + gaussian(i, 9) + gaussian(i - 1, 9) + gaussian(i - 2, 9)
        elif i == var + 1:
            r_pdf = + gaussian(i, 9) + gaussian(i + 1, 9) + gaussian(i + 2, 9)
        else:
            r_pdf = gaussian(i, 9)
        tot = tot + r_pdf
        pdf_DGV.append(i)
        pdf_DGW.append(r_pdf)
        print(i, r_pdf)
#                                  # amusing how close tot is to 1!
    print('\nRough total = ', tot)
#                                  # no need to normalize with Python 3.6,
#                                  #   but can't help ourselves
    for i in range(0,len(pdf_DGW)):
        pdf_DGW[i] = pdf_DGW[i]/tot
#                                  # print out pdf weights
#                                  #   for our discrete gaussian
    print('\npdf:\n')
    print(pdf_DGW)

#                                  # plot random variable action
    rv_samples = random.choices(pdf_DGV, pdf_DGW, k=10000)
    plt.hist(rv_samples, bins = 100)
    plt.show()
    sys.exit()

Output:

-10 0.0007187932912256041
-9 0.001477282803979336
-8 0.003798662007932481
-7 0.008740629697903166
-6 0.017996988837729353
-5 0.03315904626424957
-4 0.05467002489199788
-3 0.0806569081730478
-2 0.10648266850745075
-1 0.12579440923099774
0 0.1329807601338109
1 0.12579440923099774
2 0.10648266850745075
3 0.0806569081730478
4 0.05467002489199788
5 0.03315904626424957
6 0.017996988837729353
7 0.008740629697903166
8 0.003798662007932481
9 0.001477282803979336
10 0.0007187932912256041

Rough total =  0.9999715875468381

pdf:

[0.000718813714486599, 0.0014773247784004072, 0.003798769940305483, 0.008740878047691289, 0.017997500190860556, 0.033159988420867426, 0.05467157824565407, 0.08065919989878699, 0.10648569402724471, 0.12579798346031068, 0.13298453855078374, 0.12579798346031068, 0.10648569402724471, 0.08065919989878699, 0.05467157824565407, 0.033159988420867426, 0.017997500190860556, 0.008740878047691289, 0.003798769940305483, 0.0014773247784004072, 0.000718813714486599]
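The weights listed above can also be computed without explicit loops. The following is a vectorized sketch of the same scheme (density evaluated at each integer, the two nearest tail points folded into each endpoint, then normalization). It uses NumPy's Generator.choice instead of random.choices, which is my substitution, and the seed is arbitrary.

```python
import numpy as np

var = 9                                  # variance, as in the code above
support = np.arange(-var - 1, var + 2)   # integers -10 .. 10

def gaussian(x, var):
    """Density of a zero-mean normal distribution with variance var."""
    return np.exp(-np.square(x) / (2 * var)) / np.sqrt(2 * np.pi * var)

weights = gaussian(support, var)
# Fold the next two tail points into each endpoint, as the loop above does
weights[0] += gaussian(support[0] - 1, var) + gaussian(support[0] - 2, var)
weights[-1] += gaussian(support[-1] + 1, var) + gaussian(support[-1] + 2, var)
weights /= weights.sum()                 # normalize so the weights sum to 1

rng = np.random.default_rng(0)           # arbitrary seed
rv_samples = rng.choice(support, size=10_000, p=weights)
```

Unlike random.choices, which accepts relative weights, Generator.choice requires p to sum to 1, so the normalization step is not optional here.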

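【Comments】:

【Answer 5】:

This version is mathematically incorrect (because it clips the tails of the bell), but if you don't need that much precision it gets the job done quickly and in an easy-to-follow way: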

import numpy as np

def draw_random_normal_int(low:int, high:int):

    # generate a random normal number (scalar float)
    normal = np.random.normal(loc=0, scale=1)

    # clip to -3, 3 (where the bell with mean 0 and std 1 is very close to zero)
    normal = -3 if normal < -3 else normal
    normal = 3 if normal > 3 else normal

    # scale range of 6 (-3..3) to range of low-high
    scaling_factor = (high-low) / 6
    normal_scaled = normal * scaling_factor

    # center around mean of range of low high
    normal_scaled += low + (high-low)/2

    # then round and return as a Python int
    return int(np.round(normal_scaled))

Drawing 100000 numbers gives the following histogram:
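Calling a function once per number is slow when that many values are needed. Below is a vectorized sketch of the same clip-and-scale idea. The function name draw_random_normal_ints and the size and seed parameters are hypothetical additions, not part of the answer.

```python
import numpy as np

def draw_random_normal_ints(low, high, size, seed=None):
    """Draw `size` integers in [low, high] from a clipped, rescaled normal."""
    rng = np.random.default_rng(seed)
    # standard normal draws, clipped to [-3, 3] as in the scalar version
    normal = np.clip(rng.standard_normal(size), -3, 3)
    # map [-3, 3] linearly onto [low, high]
    scaled = normal * (high - low) / 6 + (low + high) / 2
    return np.rint(scaled).astype(int)

values = draw_random_normal_ints(0, 20, size=100_000, seed=0)
print(values.min(), values.max())
```

As with the scalar version, clipping piles the tail mass onto low and high, so the two end values are slightly over-represented.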

【Comments】:
