How to compile CUDA code that calls one function twice inside one method?

Posted 2023-02-16

[Original title]: How to compile cuda code with calling one function twice inside one method? [Asked]: 2021-02-28 00:01:03 [Question]:

I am trying to compile this code:

struct foo {
    unsigned long long x0;
};

//__device__ __noinline__ foo bar(foo a, foo b) {  // << try this
__device__ foo bar(foo a, foo b) {
    foo r;
    // The .reg declaration lands in the enclosing PTX scope each time bar() is inlined.
    asm(
    ".reg .u64 my_cool_var;\n\t"
    "add.cc.u64 %0, %1, %2;\n\t"
    : "=l"(r.x0)
    : "l"(a.x0),
      "l"(b.x0)
    );
    return r;
}

__device__ foo func_with2call(foo x, foo y) {
    foo res = bar(x, y);    // first inlined copy declares my_cool_var
    foo iy =  bar(x, res);  // second inlined copy declares it again
    return iy;
}

__global__ void cuda_test1(foo *a, foo *b, foo *r) {
    *r = func_with2call(*a, *b);
}

Compiler output:

ptxas /tmp/tmpxft_000010f5_00000000-6_main.ptx, line 38; error   : Duplicate definition of variable 'my_cool_var'
ptxas fatal   : Ptx assembly aborted due to errors

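I know this is caused by inlining: both calls to bar are expanded into the same function body, so the .reg declaration appears twice. If I compile bar with the __noinline__ attribute, for example, there is no error. Is there a way to keep the inlining (other than duplicating bar's code with a different internal variable name) and still call bar twice?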

[Answer 1]:

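This limitation is discussed in the Inline PTX Assembly guide. You can work around it by forcing each definition into its own scope, that is, a brace-delimited block inside the asm string, for example: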

__device__ foo bar(foo a, foo b) {
    foo r;
    asm(
    "{\n\t"                          // open a PTX scope
    ".reg .u64 my_cool_var;\n\t"     // now local to this scope
    "add.cc.u64 %0, %1, %2;\n\t"
    "}"                              // close the PTX scope
    : "=l"(r.x0)
    : "l"(a.x0),
      "l"(b.x0)
    );
    return r;
}

This will inline safely without name collisions.
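To check the scoped version end to end, here is a minimal sketch of a complete program built around the code above. Several things in it are assumptions for illustration and not part of the original post: the host-side main, the input values 3 and 4, and the fact that my_cool_var is actually used as a temporary (with plain add.u64 instead of add.cc.u64, so that no carry flag is involved). Error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct foo {
    unsigned long long x0;
};

// The braces inside the asm string open a PTX scope, so my_cool_var is
// private to each inlined copy of this block.
__device__ foo bar(foo a, foo b) {
    foo r;
    asm("{\n\t"
        ".reg .u64 my_cool_var;\n\t"
        "add.u64 my_cool_var, %1, %2;\n\t"  // assumption: use the temp for the sum
        "mov.u64 %0, my_cool_var;\n\t"
        "}"
        : "=l"(r.x0)
        : "l"(a.x0), "l"(b.x0));
    return r;
}

// Calls bar twice; both calls may be inlined into this function.
__device__ foo func_with2call(foo x, foo y) {
    foo res = bar(x, y);
    return bar(x, res);
}

__global__ void cuda_test1(const foo *a, const foo *b, foo *r) {
    *r = func_with2call(*a, *b);
}

int main() {
    foo ha{3}, hb{4}, hr{0};  // hypothetical test inputs
    foo *da, *db, *dr;
    cudaMalloc(&da, sizeof(foo));
    cudaMalloc(&db, sizeof(foo));
    cudaMalloc(&dr, sizeof(foo));
    cudaMemcpy(da, &ha, sizeof(foo), cudaMemcpyHostToDevice);
    cudaMemcpy(db, &hb, sizeof(foo), cudaMemcpyHostToDevice);

    cuda_test1<<<1, 1>>>(da, db, dr);

    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(&hr, dr, sizeof(foo), cudaMemcpyDeviceToHost);
    printf("%llu\n", hr.x0);  // x + (x + y)

    cudaFree(da);
    cudaFree(db);
    cudaFree(dr);
    return 0;
}
```

Saved as, say, main.cu, this should build with nvcc main.cu -o main. Removing the "{" and "}" lines from the asm string should bring back the duplicate-definition error from ptxas whenever both calls to bar are inlined.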
