How to profile the number of global memory transactions for CUDA kernels?

Posted 2023-03-06

【Title】: How to profile the number of global memory transactions for CUDA kernels? 【Posted】: 2012-03-22 16:58:08 【Question】:

How do I enable profiling of the "uncached_global_load_transaction" counter in the CUDA command-line profiler?

【Answer 1】:

The command-line profiler is controlled through the following environment variables:

COMPUTE_PROFILE: is set to either 1 or 0 (or unset) to enable or disable profiling.
COMPUTE_PROFILE_CONFIG: is used to specify a config file for enabling performance counters in the GPU and various other options.
COMPUTE_PROFILE_LOG: is set to the desired file path for profiling output.

In your case, you can set these environment variables as follows:

COMPUTE_PROFILE=1
COMPUTE_PROFILE_CONFIG=config.txt
COMPUTE_PROFILE_LOG=profiler_output.txt

config.txt must contain the entry uncached_global_load_transaction.
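The setup described above can be sketched as a few shell commands. This is a minimal sketch, assuming a POSIX shell and assuming the config file takes one counter name per line; the binary name `./my_cuda_app` is hypothetical and only appears in a comment.

```shell
# Write the profiler config file (assumed format: one counter name per line).
printf '%s\n' 'uncached_global_load_transaction' > config.txt

# Enable the command-line profiler for programs started from this shell.
export COMPUTE_PROFILE=1
export COMPUTE_PROFILE_CONFIG=config.txt
export COMPUTE_PROFILE_LOG=profiler_output.txt

# Then run the CUDA program as usual, for example (hypothetical binary name):
#   ./my_cuda_app
```

Exporting the variables matters here: a plain assignment without `export` stays local to the shell and would not be visible to the profiled process.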

【Comments】:

Thanks. I did the same thing, but the profiler does not recognize the uncached_global_load_transaction option. Maybe my card does not support it.
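One quick way to tell whether a requested counter was actually collected is to search the profiler log for its name. The sketch below assumes that a collected counter's name shows up somewhere in the log text; `counter_in_log` is a hypothetical helper, not part of any CUDA tool.

```shell
# Hypothetical helper: succeeds if the counter name ($1) appears in the log file ($2).
counter_in_log() {
    grep -q -- "$1" "$2"
}

# Check the log written by the profiler run; a missing file counts as "not found".
if counter_in_log uncached_global_load_transaction profiler_output.txt 2>/dev/null; then
    echo "counter was recorded"
else
    echo "counter not found in log"
fi
```

If the name is absent from the log, the counter was not collected in that run, which would be consistent with the device not supporting it.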
