Resize with averaging or rebin a numpy 2d array


Posted: 2011-12-26 18:36:16

Question:

I'm trying to reimplement an IDL function in Python:

http://star.pst.qub.ac.uk/idl/REBIN.html

which downsizes a 2D array by an integer factor by averaging.

For example:

>>> a=np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

I would like to resize it to (2,3) by taking the mean of the relevant samples. The expected output would be:

>>> b = rebin(a, (2, 3))
>>> b
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])

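i.e. b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4]), and so on.

I believe I should reshape to a 4-dimensional array and then take the mean over the correct slices, but I could not figure out the algorithm. Do you have any hints?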

Comments:

Just found out this is a duplicate of ***.com/questions/4624112/…, but I was not able to find it earlier using the search function in ***.

Answer 1:

Here's an example based on the answer you've linked (for clarity):

>>> import numpy as np
>>> a = np.arange(24).reshape((4,6))
>>> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> a.reshape((2,a.shape[0]//2,3,-1)).mean(axis=3).mean(1)
array([[  3.5,   5.5,   7.5],
       [ 15.5,  17.5,  19.5]])
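
To see why the reshape works, it can help to look at the intermediate 4D array. The following is only an illustrative sketch (the names blocks, top_left and top_middle are not from the answer): each pair of "output" indices selects one 2x2 block of the original array, and the mean is taken over the two "within block" axes.

```python
import numpy as np

a = np.arange(24).reshape((4, 6))

# Axes of the 4D view:
# (output row, row within block, output column, column within block)
blocks = a.reshape((2, 2, 3, 2))

# Fixing the two "output" indices picks out one 2x2 block of the original array
top_left = blocks[0, :, 0, :]      # same elements as a[:2, :2]
top_middle = blocks[0, :, 1, :]    # same elements as a[:2, 2:4]

# Averaging over the two "within block" axes gives one mean per block
b = blocks.mean(axis=(1, 3))
print(b)
```

Reducing over axes (1, 3) in a single call is equivalent here to the chained .mean(axis=3).mean(1), because every block has the same number of elements.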

As a function:

def rebin(a, shape):
    sh = shape[0],a.shape[0]//shape[0],shape[1],a.shape[1]//shape[1]
    return a.reshape(sh).mean(-1).mean(1)
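
One caveat with .mean() is that a single NaN inside a block makes that block's average NaN. A possible variant, sketched here under the assumption that NaN entries should simply be left out of each block's average, uses np.nanmean. The name rebin_nanmean and the divisibility check are additions for illustration, not part of the answer.

```python
import numpy as np

def rebin_nanmean(a, shape):
    """Block-average a 2D array to `shape`, ignoring NaN entries (hypothetical helper)."""
    if a.shape[0] % shape[0] or a.shape[1] % shape[1]:
        raise ValueError("new shape must evenly divide the old shape")
    sh = shape[0], a.shape[0] // shape[0], shape[1], a.shape[1] // shape[1]
    # Reduce both within-block axes in one call, so each block is averaged over
    # its valid entries only. Chaining two nanmean calls would instead average
    # row means that may be based on different numbers of valid entries.
    return np.nanmean(a.reshape(sh), axis=(1, 3))

a = np.arange(24, dtype=float).reshape((4, 6))
a[0, 0] = np.nan
b = rebin_nanmean(a, (2, 3))
print(b)
```

A block that contains only NaN values still comes out as NaN, and np.nanmean emits a RuntimeWarning in that case.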

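Comments:

Thanks, I've created a gist on GitHub with an implementation of this function in case somebody else needs it: gist.github.com/1348792. I also suggested on numpy-discussion that it be added to numpy, but the answer was negative.

Did they give a reason for the negative answer?

I think this is the discussion. It doesn't seem negative, just a lack of time or interest.

Keep in mind that averaging over data containing NaNs will return NaN. So if you want a mean that ignores any NaN values, you will need nanmean() instead. Still a great answer.

Answer 2: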

J.F. Sebastian has a great answer for 2D binning. Here is a version of his "rebin" function that works for N dimensions:

def bin_ndarray(ndarray, new_shape, operation='sum'):
    """
    Bins an ndarray in all axes based on the target shape, by summing or
        averaging.

    Number of output dimensions must match number of input dimensions and 
        new axes must divide old ones.

    Example
    -------
    >>> m = np.arange(0,100,1).reshape((10,10))
    >>> n = bin_ndarray(m, new_shape=(5,5), operation='sum')
    >>> print(n)

    [[ 22  30  38  46  54]
     [102 110 118 126 134]
     [182 190 198 206 214]
     [262 270 278 286 294]
     [342 350 358 366 374]]

    """
    operation = operation.lower()
    if operation not in ['sum', 'mean']:
        raise ValueError("Operation not supported.")
    if ndarray.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(ndarray.shape,
                                                           new_shape))
    compression_pairs = [(d, c//d) for d,c in zip(new_shape,
                                                  ndarray.shape)]
    flattened = [l for p in compression_pairs for l in p]
    ndarray = ndarray.reshape(flattened)
    for i in range(len(new_shape)):
        op = getattr(ndarray, operation)
        ndarray = op(-1*(i+1))
    return ndarray
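
The loop over axes can also be collapsed into one reduction by passing a tuple of axes. This is a compact sketch rather than a replacement for the function above; the name bin_ndarray_compact and the explicit divisibility check are assumptions added for illustration.

```python
import numpy as np

def bin_ndarray_compact(arr, new_shape, operation="sum"):
    """Compact sketch of N-dimensional binning (hypothetical name)."""
    if operation not in ("sum", "mean"):
        raise ValueError("Operation not supported.")
    if arr.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(arr.shape, new_shape))
    if any(c % d for d, c in zip(new_shape, arr.shape)):
        raise ValueError("new_shape must evenly divide arr.shape")
    # Interleave (number of bins, bin size) per axis: (d0, c0//d0, d1, c1//d1, ...)
    split = [n for d, c in zip(new_shape, arr.shape) for n in (d, c // d)]
    # The odd-numbered axes hold the elements inside each bin
    within_bin_axes = tuple(range(1, 2 * arr.ndim, 2))
    return getattr(arr.reshape(split), operation)(axis=within_bin_axes)

# Same 2D example as in the docstring above
m = np.arange(100).reshape((10, 10))
n = bin_ndarray_compact(m, (5, 5), operation="sum")

# A 3D array binned by a factor of 2 along every axis
cube = np.ones((4, 6, 8))
binned = bin_ndarray_compact(cube, (2, 3, 4), operation="mean")
print(n)
print(binned.shape)
```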

Comments:

Answer 3:

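Here is a way to do what you are asking using matrix multiplication. It does not require the new array dimensions to divide the old ones.

First we generate a row compressor matrix and a column compressor matrix (I'm sure there is a cleaner way of doing this, maybe even using numpy operations alone):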

def get_row_compressor(old_dimension, new_dimension):
    dim_compressor = np.zeros((new_dimension, old_dimension))
    bin_size = float(old_dimension) / new_dimension
    next_bin_break = bin_size
    which_row = 0
    which_column = 0
    while which_row < dim_compressor.shape[0] and which_column < dim_compressor.shape[1]:
        if round(next_bin_break - which_column, 10) >= 1:
            dim_compressor[which_row, which_column] = 1
            which_column += 1
        elif next_bin_break == which_column:
            which_row += 1
            next_bin_break += bin_size
        else:
            partial_credit = next_bin_break - which_column
            dim_compressor[which_row, which_column] = partial_credit
            which_row += 1
            dim_compressor[which_row, which_column] = 1 - partial_credit
            which_column += 1
            next_bin_break += bin_size
    dim_compressor /= bin_size
    return dim_compressor


def get_column_compressor(old_dimension, new_dimension):
    return get_row_compressor(old_dimension, new_dimension).transpose()

...so, for example, get_row_compressor(5, 3) gives you:

[[ 0.6  0.4  0.   0.   0. ]
 [ 0.   0.2  0.6  0.2  0. ]
 [ 0.   0.   0.   0.4  0.6]]

and get_column_compressor(3, 2) gives you:

[[ 0.66666667  0.        ]
 [ 0.33333333  0.33333333]
 [ 0.          0.66666667]]

Then simply premultiply by the row compressor and postmultiply by the column compressor to get the compressed matrix:

def compress_and_average(array, new_shape):
    # Note: new shape should be smaller in both dimensions than old shape
    # Plain arrays and the @ operator are used instead of np.mat, which is
    # no longer available in the main namespace as of NumPy 2.0.
    return get_row_compressor(array.shape[0], new_shape[0]) @ \
           np.asarray(array) @ \
           get_column_compressor(array.shape[1], new_shape[1])

Using this technique,

compress_and_average(np.array([[50, 7, 2, 0, 1],
                               [0, 0, 2, 8, 4],
                               [4, 1, 1, 0, 0]]), (2, 3))

yields:

[[ 21.86666667   2.66666667   2.26666667]
 [  1.86666667   1.46666667   1.86666667]]
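
The compressor matrices can also be built without an explicit loop. Here is a minimal sketch, assuming each weight is the overlap length between an output bin and an input cell divided by the bin size; overlap_weights and compress_with_overlap are hypothetical names, not part of the answer.

```python
import numpy as np

def overlap_weights(old_dimension, new_dimension):
    """Vectorized sketch of a (new, old) averaging matrix (hypothetical helper)."""
    bin_size = old_dimension / new_dimension
    bin_edges = np.arange(new_dimension + 1) * bin_size   # output bin boundaries
    cell_edges = np.arange(old_dimension + 1)             # input cell boundaries
    # Overlap length between output bin i (rows) and input cell j (columns)
    lo = np.maximum(bin_edges[:-1, None], cell_edges[None, :-1])
    hi = np.minimum(bin_edges[1:, None], cell_edges[None, 1:])
    return np.clip(hi - lo, 0, None) / bin_size

def compress_with_overlap(array, new_shape):
    rows = overlap_weights(array.shape[0], new_shape[0])
    cols = overlap_weights(array.shape[1], new_shape[1])
    # Premultiply by the row weights, postmultiply by the transposed column weights
    return rows @ np.asarray(array) @ cols.T

w = overlap_weights(5, 3)
small = compress_with_overlap(np.array([[50, 7, 2, 0, 1],
                                        [0, 0, 2, 8, 4],
                                        [4, 1, 1, 0, 0]]), (2, 3))
print(w)
print(small)
```

Note that this builds dense weight matrices, so for very large arrays the memory cost of the (new, old) matrices is worth keeping in mind.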

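Comments:

This is great; it works even if the new shape is not a multiple of the original shape (which was the problem I had with the other solutions).

Answer 4: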

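I was trying to downscale a raster: take a raster of roughly 6000 by 2000 and turn it into an arbitrarily sized smaller raster that properly averages the values across the previous bin sizes. I found a solution using SciPy, but I couldn't get SciPy to install on the shared hosting service I was using, so I just wrote this function instead. There is probably a better way to do this that doesn't involve looping through the rows and columns, but this does seem to work.

The nice part about this is that the old number of rows and columns doesn't have to be divisible by the new number of rows and columns.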

def resize_array(a, new_rows, new_cols): 
    '''
    This function takes a 2D numpy array a and produces a smaller array 
    of size new_rows, new_cols. new_rows and new_cols must be less than 
    or equal to the number of rows and columns in a.
    '''
    rows = len(a)
    cols = len(a[0])
    yscale = float(rows) / new_rows 
    xscale = float(cols) / new_cols

    # first average across the cols to shorten rows    
    new_a = np.zeros((rows, new_cols)) 
    for j in range(new_cols):
        # get the indices of the original array we are going to average across
        the_x_range = (j*xscale, (j+1)*xscale)
        firstx = int(the_x_range[0])
        lastx = int(the_x_range[1])
        # figure out the portion of the first and last index that overlap
        # with the new index, and thus the portion of those cells that 
        # we need to include in our average
        x0_scale = 1 - (the_x_range[0]-int(the_x_range[0]))
        xEnd_scale =  (the_x_range[1]-int(the_x_range[1]))
        # scale_line is a 1d array that corresponds to the portion of each old
        # index in the_x_range that should be included in the new average
        scale_line = np.ones((lastx-firstx+1))
        scale_line[0] = x0_scale
        scale_line[-1] = xEnd_scale
        # Make sure you don't screw up and include an index that is too large
        # for the array. This isn't great, as there could be some floating
        # point errors that mess up this comparison.
        if scale_line[-1] == 0:
            scale_line = scale_line[:-1]
            lastx = lastx - 1
        # Now it's linear algebra time. Take the dot product of a slice of
        # the original array and the scale_line
        new_a[:,j] = np.dot(a[:,firstx:lastx+1], scale_line)/scale_line.sum()

    # Then average across the rows to shorten the cols. Same method as above.
    # It is probably possible to simplify this code, as this is more or less
    # the same procedure as the block of code above, but transposed.
    # Here I'm reusing the variable a. Sorry if that's confusing.
    a = np.zeros((new_rows, new_cols))
    for i in range(new_rows):
        the_y_range = (i*yscale, (i+1)*yscale)
        firsty = int(the_y_range[0])
        lasty = int(the_y_range[1])
        y0_scale = 1 - (the_y_range[0]-int(the_y_range[0]))
        yEnd_scale =  (the_y_range[1]-int(the_y_range[1]))
        scale_line = np.ones((lasty-firsty+1))
        scale_line[0] = y0_scale
        scale_line[-1] = yEnd_scale
        if scale_line[-1] == 0:
            scale_line = scale_line[:-1]
            lasty = lasty - 1
        a[i, :] = np.dot(scale_line, new_a[firsty:lasty+1, :])/scale_line.sum()

    return a 

Comments:

This doesn't always work, e.g.: resize_array(np.random.uniform(size=(12961, 1)), 50, 1) (gives an error)
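
The failing case is consistent with the floating-point concern raised in the code comments, where a bin's upper end can land just past the last valid index. One way to sidestep that, sketched here under the assumption that a dense weight matrix fits in memory, is to compute the bin/cell overlaps in integer arithmetic and only divide at the end. The names exact_weights and resize_mean are hypothetical and this is not the answer's method.

```python
import numpy as np

def exact_weights(old, new):
    """(new, old) averaging matrix built from integer overlaps (hypothetical helper)."""
    # Work in units of 1/new of a cell: input cell j spans [j*new, (j+1)*new)
    # and output bin i spans [i*old, (i+1)*old). All endpoints are integers,
    # so no rounding can push a bin past the end of the array.
    i = np.arange(new, dtype=np.int64)[:, None]
    j = np.arange(old, dtype=np.int64)[None, :]
    overlap = np.minimum((i + 1) * old, (j + 1) * new) - np.maximum(i * old, j * new)
    return np.clip(overlap, 0, None) / old   # each row sums to 1

def resize_mean(a, new_rows, new_cols):
    a = np.asarray(a, dtype=float)
    row_w = exact_weights(a.shape[0], new_rows)
    col_w = exact_weights(a.shape[1], new_cols)
    return row_w @ a @ col_w.T

# The input reported as failing above
data = np.random.uniform(size=(12961, 1))
out = resize_mean(data, 50, 1)
print(out.shape)
```

Because every output bin covers the same width of the input, the overall mean of the result matches the mean of the input, which gives a quick sanity check.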
