Thrust reduction result in device memory

Posted 2023-02-19


Original title: thrust reduction result on device memory
Asked: 2014-03-12 18:06:49

Question:

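Is it possible to leave the return value of a thrust::reduce operation in device-allocated memory? If so, is it as simple as assigning the value to a cudaMalloc'ed region, or should I use a thrust::device_ptr?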


Answer 1:

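Is it possible to leave the return value of a thrust::reduce operation in device-allocated memory?

The short answer is no.

thrust::reduce returns a quantity, the result of the reduction. This quantity must be deposited in a host-resident variable.

Take reduce, for example, which is synchronous and always returns its result to the CPU: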

template<typename Iterator, typename T> 
T reduce(Iterator first, Iterator last, T init);

Once the result has been returned to the CPU, you can copy it to the GPU if needed:

#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

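int main()
{
    thrust::device_vector<int> data(256, 1);   // 256 ones in device memory
    thrust::device_vector<int> result(1);      // one-element device buffer
    // reduce returns the sum to the host; assigning to result[0] copies it back to the device
    result[0] = thrust::reduce(data.begin(), data.end());
    std::cout << "result = " << result[0] << std::endl;   // prints "result = 256"
    return 0;
}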
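The question also asks about a cudaMalloc'ed region. Because the sum always comes back to the host first, storing it in a raw device allocation takes one extra host-to-device copy. The following is a minimal sketch, assuming an int sum and a caller-provided pointer d_out obtained from cudaMalloc; the function names reduce_into_raw_buffer and reduce_into_raw_buffer_memcpy are hypothetical. It shows two equivalent routes: wrapping the raw pointer in a thrust::device_ptr (assigning through it performs the copy) or calling cudaMemcpy directly.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

// Hypothetical helper: reduce on the device, then store the sum in a raw
// cudaMalloc'ed int. thrust::reduce returns the sum to the host first;
// assigning through a thrust::device_ptr copies it back to the device.
void reduce_into_raw_buffer(const thrust::device_vector<int>& data, int* d_out)
{
    int sum = thrust::reduce(data.begin(), data.end());          // sum is now on the host
    thrust::device_ptr<int> out = thrust::device_pointer_cast(d_out);
    *out = sum;                                                   // host -> device copy
}

// Hypothetical helper: the same thing with an explicit CUDA runtime call.
cudaError_t reduce_into_raw_buffer_memcpy(const thrust::device_vector<int>& data, int* d_out)
{
    int sum = thrust::reduce(data.begin(), data.end());          // sum is now on the host
    return cudaMemcpy(d_out, &sum, sizeof(int), cudaMemcpyHostToDevice);
}
```

With either route the value makes a round trip through host memory; the reduce_by_key alternative described next avoids that.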
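Another possible alternative is thrust::reduce_by_key, which writes its reduction results to device memory instead of returning them to host memory. If you use a single key for the whole array, the end result is a single output, just as with thrust::reduce.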
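The following is a minimal sketch of that approach, assuming int data; the helper name sum_to_device is hypothetical. A thrust::constant_iterator gives every element the same key, so reduce_by_key sees one segment covering the whole array, and its values output points into a device_vector. The unique-keys output is not needed, so it goes to a thrust::discard_iterator.

```cuda
#include <cassert>
#include <thrust/device_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/reduce.h>

// Hypothetical helper: sums `data` and leaves the result in out[0],
// which lives in device memory. The sum itself is never copied to the host.
void sum_to_device(const thrust::device_vector<int>& data,
                   thrust::device_vector<int>& out)
{
    out.resize(1);
    out[0] = 0;  // stays 0 if `data` is empty, because no segment is written

    // Every element gets key 0, so the whole input forms a single segment.
    auto keys_first = thrust::make_constant_iterator(0);
    auto keys_last  = keys_first + data.size();

    thrust::reduce_by_key(keys_first, keys_last,
                          data.begin(),                     // values to sum
                          thrust::make_discard_iterator(),  // unique keys: not needed
                          out.begin());                     // reduced value -> device memory
}
```

With data holding 256 ones, sum_to_device(data, out) leaves 256 in out[0] on the device. Reading out[0] from host code still copies it back, but a later kernel or Thrust call can consume it in place.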

Comments:

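Good answer. If you really want to do a reduction and have the result end up in device memory, you can use the CUDA NPP library or write the reduction yourself; see the reduction example in the CUDA samples.

reduce_by_key with a constant_iterator is a good way to solve this, thanks.

Answer 2: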

Yes, it should be possible by using thrust::reduce_by_key instead, with a thrust::constant_iterator supplying the keys.

Comments:

I know this is old, but thanks for the suggestion; it's a good way to solve the problem.