Numpy equivalent of itertools.product [duplicate]

Posted 2023-02-25


【Title】Numpy equivalent of itertools.product [duplicate] 【Posted】2015-02-23 22:12:16 【Question】

I know that itertools.product iterates over every combination drawn from several lists of keywords. For example, if I have this:

categories = [
    [ 'A', 'B', 'C', 'D'],
    [ 'E', 'F', 'G', 'H'],
    [ 'I', 'J', 'K', 'L']
]

and I apply itertools.product() to it, I get something like:

>>> [ x for x in itertools.product(*categories) ]
('A', 'E', 'I'),
('A', 'E', 'J'),
('A', 'E', 'K'),
('A', 'E', 'L'),
('A', 'F', 'I'),
('A', 'F', 'J'),
# and so on...

Is there an equivalent, straightforward way to do the same thing with numpy arrays?

【Comments】:

【Answer 1】:

This question has already been asked several times:

Using numpy to build an array of all combinations of two arrays

itertools product speed up

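The first link has a working numpy solution that is claimed to be several times faster than itertools, although no benchmark is provided. The code was written by a user named pv. If you find it useful, please follow the link and upvote his answer: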

import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n // arrays[0].size  # integer division: rows per value of the first array
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m,1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m,1:] = out[0:m,1:]
    return out

Nevertheless, in the same post Alex Martelli, one of SO's great Python gurus, wrote that itertools is the fastest way to do this task. So here is a quick benchmark that supports Alex's claim.

import numpy as np
import time
import itertools


def cartesian(arrays, out=None):
    ...


def test_numpy(arrays):
    for res in cartesian(arrays):
        pass


def test_itertools(arrays):
    for res in itertools.product(*arrays):
        pass


def main():
    arrays = [np.fromiter(range(100), dtype=int), np.fromiter(range(100, 200), dtype=int)]
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    for _ in range(100):
        test_numpy(arrays)
    print(time.perf_counter() - start)
    start = time.perf_counter()
    for _ in range(100):
        test_itertools(arrays)
    print(time.perf_counter() - start)

if __name__ == '__main__':
    main()

Output:

0.421036
0.06742

So you should definitely use itertools.

【Comments】:

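Thanks for the extended answer and the advice that comes with it.

The speed difference is because you are iterating over the cartesian() result, and iterating over a numpy array is slower than iterating over a Python iterator. If you only want to construct the array, you need to compare cartesian(...) with np.array(list(itertools.product(...))). For iteration, itertools is the right answer, but the question here is about construction.

@Jivan As pv. has pointed out, his numpy function will be faster at building a numpy array, because converting a Python iterator (as produced by itertools.product) into a numpy array carries significant overhead: an array of objects (tuples in this case) cannot be created directly from an iterator. In my tests it was about 5x faster, but keep in mind that iterating over a numpy array is much slower (more than 5x slower according to the test I posted above), so if your main concern is speed you should use iterators.

The answer is wrong and may be misleading.

If you start with numpy arrays and end with numpy arrays, numpy.meshgrid is much faster. At that point it really comes down to RAM I/O. For me, using meshgrid with ravel and concatenation, without much optimization, was orders of magnitude faster. np.array(list(itertools.product(x, y))) is not fast.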
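The numpy.meshgrid approach mentioned in the last comment is not spelled out in the thread. A minimal sketch of what it might look like follows; the helper name cartesian_meshgrid is hypothetical, and the use of indexing="ij" to reproduce the row order of itertools.product is an assumption of this sketch, not something the commenter specified.

```python
import itertools

import numpy as np


def cartesian_meshgrid(*arrays):
    """Cartesian product of 1-D inputs as a 2-D array, one combination per row.

    Hypothetical helper, not taken from the original answer.
    """
    # indexing="ij" makes the first input vary slowest, like itertools.product.
    grids = np.meshgrid(*arrays, indexing="ij")
    # Every grid has shape (len(a0), len(a1), ...); flatten each one and
    # stack the flattened grids side by side as columns.
    return np.stack([g.ravel() for g in grids], axis=1)


categories = [
    ["A", "B", "C", "D"],
    ["E", "F", "G", "H"],
    ["I", "J", "K", "L"],
]

result = cartesian_meshgrid(*categories)
expected = np.array(list(itertools.product(*categories)))

print(result.shape)                      # (64, 3)
print(np.array_equal(result, expected))  # True
```

This only covers construction of the full array. If the goal is simply to loop over the combinations one at a time, the comments above suggest itertools.product remains the better fit.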
