PyOpenCL kernel not being applied to entire array

Posted 2023-02-16

【Title】: PyOpenCL kernel not being applied to entire array 【Posted】: 2018-11-19 00:08:22 【Question】:

I wanted to get a feel for the Elementwise demo that ships with PyOpenCL, so I decided to try this:

from __future__ import absolute_import
from __future__ import print_function
import pyopencl as cl
import pyopencl.array as cl_array
import numpy
from pyopencl.elementwise import ElementwiseKernel

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

n = 6

a_gpu = cl.array.to_device(queue,
                           numpy.arange(1, n, dtype=int))

update_a = ElementwiseKernel(ctx,
                             "int *a",
                             "a[i] = 2*a[i]",
                             "update_a")

print(a_gpu.get())
update_a(a_gpu)
print(a_gpu.get())

I expected this to print

[1 2 3 4 5]
[2 4 6 8 10]

but instead I got

[1 2 3 4 5]
[2 4 6 4 5]

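Also, when I tried storing the value of "i" into an array to see what was going on, I got some very strange values. They were all over the place, and some were even negative.

I have been trying to understand this for a while but can't figure it out. Can someone explain why this happens? Thanks.

Relevant info: PyOpenCL version: 2018.2.1, Python version: 3.6.5, OS: macOS 10.14.1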


【Answer 1】:

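The error is an ambiguous dtype for the numpy array, which leaves the element stride on the host (CPU) side inconsistent with the stride on the CL device side.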

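Specifying dtype=int is ambiguous: here it resolves to 8-byte np.int64 (long) elements, while the kernel's int *a treats the buffer as 4-byte elements. The matching type on the CL device side for np.int64 would be long *a.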
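To see where [2 4 6 4 5] comes from, the mismatch can be reproduced on the host with NumPy alone. This is only a sketch of the effect, not what PyOpenCL does internally: it assumes little-endian 8-byte host integers (written explicitly as "<i8") and assumes the kernel runs for as many indices as the array has elements.

```python
import numpy as np

# Host buffer as in the question, with the 8-byte width spelled out.
host = np.arange(1, 6, dtype="<i8")

# Reinterpret the same bytes as 4-byte ints, the way an `int *a`
# kernel argument would see them: each value is followed by its
# all-zero upper half.
as_int32 = host.view("<i4")
print(as_int32)  # [1 0 2 0 3 0 4 0 5 0]

# One index per array element (5), but applied to 4-byte slots,
# so only the first 20 of the 40 bytes are touched.
n_items = host.size
as_int32[:n_items] *= 2

print(host)  # [2 4 6 4 5]
```

The first five 4-byte slots hold 1, 0, 2, 0, 3, so only the first three 8-byte values are doubled and the last two stay unchanged.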

If you want to stick with 4-byte integers, specify dtype=np.int32 on the CPU side and int *a on the CL device side.
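A minimal sketch of the 4-byte variant, reusing the calls from the question. The wrapper function double_on_device is a hypothetical name introduced here for illustration, and running it assumes a working OpenCL installation and device.

```python
import numpy as np


def double_on_device(n=6):
    """Double the values 1..n-1 on an OpenCL device using 4-byte ints."""
    # Imported inside the function so that defining it does not require
    # an OpenCL installation; calling it does.
    import pyopencl as cl
    import pyopencl.array as cl_array
    from pyopencl.elementwise import ElementwiseKernel

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    # Explicit 4-byte dtype on the host side ...
    a_gpu = cl_array.to_device(queue, np.arange(1, n, dtype=np.int32))

    # ... matching the 4-byte `int` in the kernel signature.
    update_a = ElementwiseKernel(ctx, "int *a", "a[i] = 2*a[i]", "update_a")
    update_a(a_gpu)
    return a_gpu.get()
```

With matching element sizes, double_on_device() should return the doubled values 2, 4, 6, 8, 10. The 8-byte alternative is to use dtype=np.int64 on the host together with "long *a" in the kernel.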

Takeaway: always specify the dtype of your numpy arrays explicitly, e.g. dtype=np.int64, and check that the type on the CL device side matches it exactly.
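One way to enforce that check before launching a kernel is a small host-side guard. The helper check_host_dtype and the table OPENCL_CTYPE_TO_DTYPE below are hypothetical names written for this sketch and are not part of PyOpenCL. The table relies on OpenCL C fixing int at 32 bits and long at 64 bits.

```python
import numpy as np

# OpenCL C scalar types have fixed widths, unlike dtype=int, whose
# width depends on the platform and NumPy version.
OPENCL_CTYPE_TO_DTYPE = {
    "char": np.int8,
    "short": np.int16,
    "int": np.int32,
    "long": np.int64,
    "float": np.float32,
    "double": np.float64,
}


def check_host_dtype(host_array, ctype):
    """Return host_array, or raise TypeError if its dtype does not match ctype."""
    expected = np.dtype(OPENCL_CTYPE_TO_DTYPE[ctype])
    if host_array.dtype != expected:
        raise TypeError(
            f"kernel expects {ctype} ({expected}), got {host_array.dtype}"
        )
    return host_array


# Width of the default integer: 8 or 4 bytes depending on the platform.
print(np.dtype(int).itemsize)

# An explicit 4-byte array passes the check for an `int *a` argument.
check_host_dtype(np.arange(1, 6, dtype=np.int32), "int")

# An 8-byte array is rejected for the same argument.
try:
    check_host_dtype(np.arange(1, 6, dtype=np.int64), "int")
except TypeError as exc:
    print(exc)
```

Wrapping the array passed to to_device in such a check turns a silent stride mismatch into an immediate error on the host.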
