How to generate correlated random numbers for a truncated normal distribution in Python?

【Title】: How to generate correlated random numbers for truncated normal distribution in Python? 【Posted】: 2016-02-09 00:21:24 【Question】:

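I am trying to use numpy.random.multivariate_normal() to generate correlated random numbers for three variables, with the means and the covariance matrix (computed from data) as inputs.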

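The normal distribution is truncated between 0 and 1, so the generated random numbers (for all three variables) should lie between 0 and 1. However, some of the generated numbers fall outside these bounds.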
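The behaviour described above can be reproduced with a short sketch. The means and covariance matrix below are hypothetical placeholders (the question does not include the real ones); they only illustrate that an unbounded multivariate normal will place some mass outside [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical means and covariance for three variables (not the asker's data).
means = np.array([0.5, 0.4, 0.6])
cov = np.array([
    [0.09, 0.04, 0.02],
    [0.04, 0.09, 0.03],
    [0.02, 0.03, 0.09],
])

# Plain multivariate normal: correlated, but not bounded in any way.
samples = rng.multivariate_normal(means, cov, size=10_000)

# A row is out of bounds if any of its three values falls outside [0, 1].
out_of_bounds = ((samples < 0) | (samples > 1)).any(axis=1)
fraction_out = out_of_bounds.mean()
print(f"fraction of rows outside [0, 1]: {fraction_out:.3f}")
```

The exact fraction depends on how close the means are to the bounds relative to the standard deviations.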

How can I control the bounds of the normally distributed random numbers generated for each variable?

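Edit: I can use scipy.stats.truncnorm to generate uncorrelated random numbers independently from three truncated normal distributions. However, here I am looking for something that generates correlated random numbers.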
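For reference, the independent approach mentioned in the edit might look like the sketch below. The means and standard deviations are hypothetical. Note that scipy.stats.truncnorm takes its bounds a and b in units of standard deviations relative to loc, which is why they are rescaled first.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
lower, upper = 0.0, 1.0
n_samples = 10_000

# Hypothetical per-variable means and standard deviations.
means = np.array([0.5, 0.4, 0.6])
stds = np.array([0.3, 0.3, 0.3])

# truncnorm expects the bounds standardized as (bound - loc) / scale.
a = (lower - means) / stds
b = (upper - means) / stds

# One independent draw per variable; the columns are stacked side by side.
samples = np.column_stack([
    truncnorm.rvs(a_i, b_i, loc=m_i, scale=s_i, size=n_samples, random_state=rng)
    for a_i, b_i, m_i, s_i in zip(a, b, means, stds)
])

# Every value respects the bounds, but the columns are (nearly) uncorrelated.
corr = np.corrcoef(samples, rowvar=False)
print(corr.round(2))
```

Because each column is drawn separately, the off-diagonal entries of corr stay close to zero, which is exactly the limitation the question is about.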

【Comments】:

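I have deleted my answer; I guess you want something like en.wikipedia.org/wiki/Truncated_normal_distribution - it may be worth saying so in the question.

Yes, I can use 'scipy.stats.truncnorm' to generate uncorrelated random numbers independently from two truncated normal distributions. However, here I am looking for something that generates correlated random numbers.

Can you provide the code and the correlation matrix?

【Answer 1】: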

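I'm six years late, so I don't know how much you still need an answer. But I needed to solve this problem some time ago too, so I wrote a custom function for it. I think this is a good place to leave it for future reference: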

import numpy as np


def BoundedMultivariateNormalDist(means, cov_matrix, dimenions_bounds=None, size=1, rng=None):
    """Custom function: Draw random samples from a multivariate (truly multi-dimensional) normal (Gaussian) distribution, optionally setting lower and upper bounds for each dimension of the distribution.
    
    Iteratively draws the needed number of samples and discards the samples outside the bounds until the requested sample size is reached.
    
    Parameters
    ----------
    means : ndarray of ints or floats
        means of the n distributions
    cov_matrix : 2d array (n by n) of ints or floats
        the covariance matrix of the n distributions
    dimenions_bounds : 2d (n by 2) array of ints or floats, optional
        rows are the dimensions, columns are the lower and upper bounds (in that order). Use np.nan for a bound that should stay open. Default is None (i.e. unbounded).
    size : (positive) int, optional
        number of samples to draw and return from the distribution. Default is 1.
    rng : numpy.random.Generator, optional
        random number generator whose multivariate_normal method is used to draw the samples, e.g. np.random.default_rng().


    Returns
    -------
    out : ndarray
        Array of samples from the multivariate normal distribution. If size is 1 (or not specified) a single array (of size n) is returned.
    
    Author: Andre3582 
    Created on: 13-06-2020
    Last revised on: 17-08-2021"""
    
    # convert means and cov_matrix to np.array
    means = np.array(means)
    cov_matrix = np.array(cov_matrix)
    # fall back to a fresh default generator when no rng is passed
    if rng is None:
        rng = np.random.default_rng()
    # check that the dimensions agree
    if not means.shape[0] == cov_matrix.shape[0]:
        raise ValueError("dimensions of means and cov matrix do not agree")
    if not cov_matrix.shape[0] == cov_matrix.shape[1]:
        raise ValueError("cov matrix is not square")

    ndims = means.shape[0]

    # if no dimenions_bounds is provided, make a dimenions_bounds filled with np.nan
    if dimenions_bounds is None:
        dimenions_bounds = np.full((ndims, 2), np.nan)  # an ndims x 2 array of np.nan values
    # accept lists as well as arrays; float dtype so that np.nan can mark an open bound
    dimenions_bounds = np.asarray(dimenions_bounds, dtype=float)

    # dimenions_bounds should be a (ndims x 2) 2d array where each row represents a dimension,
    # where the first column (index=0) holds the lower bound
    # and the second column (index=1) holds the upper bound
    if not dimenions_bounds.shape == (ndims, 2):
        raise ValueError("shape of dimenions_bounds does not match the dimension of means")
    
    # define a local size
    local_size = size

    # create an empty array
    return_samples = np.empty([0,ndims])

    # generate new samples while the needed size is not reached
    while not return_samples.shape[0] == size:

        # draw as many samples as are still missing
        samples = rng.multivariate_normal(means, cov_matrix,size=local_size)

        # samples is a 2d array of shape (local_size, n):
        # each row is one draw (one value for each of the n dimensions),
        # each column holds the values of one of the n dimensions

        # keep only the rows that lie within the lower and upper bounds in every dimension:
        # for each column we check whether the values are within the bounds of that column

        for dim, bounds in enumerate(dimenions_bounds):

            # keep only the samples that are bigger than the lower bound
            if not np.isnan(bounds[0]): # bounds[0] is the lower bound
                samples = samples[(samples[:,dim] > bounds[0])]  # samples[:,dim] is the column of the dim

            # keep only the samples that are smaller than the upper bound
            if not np.isnan(bounds[1]): # bounds[1] is the upper bound
                samples = samples[(samples[:,dim] < bounds[1])]   # samples[:,dim] is the column of the dim


        # append the accepted samples to the return samples
        return_samples = np.vstack([return_samples, samples])

        # get new size which is the difference between the requested size and the size so far.
        local_size = size - return_samples.shape[0]
    
    # return a single value when the requested size is 1 (or not specified)
    if return_samples.shape[0] == 1:
        return return_samples[0]
    # otherwise 
    else:
        return return_samples
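As an illustration of the same rejection idea, here is a compact, self-contained sketch that checks all bounds in one vectorized step. The name sample_truncated_mvn, the max_rounds guard, and the example means and covariance are assumptions made for this sketch; they are not part of the answer's function.

```python
import numpy as np


def sample_truncated_mvn(means, cov, lower, upper, size, rng=None, max_rounds=1000):
    """Rejection sampling: draw from a multivariate normal, keep rows inside the bounds."""
    rng = np.random.default_rng() if rng is None else rng
    means = np.asarray(means, dtype=float)
    accepted = np.empty((0, means.shape[0]))
    rounds = 0
    while accepted.shape[0] < size:
        # Guard against bounds that are (almost) never satisfied.
        if rounds == max_rounds:
            raise RuntimeError("acceptance rate too low for the given bounds")
        missing = size - accepted.shape[0]
        draws = rng.multivariate_normal(means, cov, size=missing)
        # A row is kept only if every dimension lies strictly inside its bounds.
        inside = np.all((draws > lower) & (draws < upper), axis=1)
        accepted = np.vstack([accepted, draws[inside]])
        rounds += 1
    return accepted


# Hypothetical inputs for three correlated variables bounded to (0, 1).
means = [0.5, 0.4, 0.6]
cov = [[0.09, 0.04, 0.02],
       [0.04, 0.09, 0.03],
       [0.02, 0.03, 0.09]]

samples = sample_truncated_mvn(means, cov, 0.0, 1.0, size=5000,
                               rng=np.random.default_rng(0))
corr = np.corrcoef(samples, rowvar=False)
print(samples.min(), samples.max())
print(corr.round(2))
```

Rejection keeps the samples correlated, but the means and covariance of the accepted rows generally differ somewhat from the inputs, because the discarded tails are not symmetric around every mean. The approach also slows down sharply when only a small share of the unbounded distribution lies inside the bounds.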
