Undefined reference to a CUDA kernel wrapper in a shared library


I am trying to use the CUDA Runtime API from Go via cgo on Windows. I have been at this for a few days and I am stuck: I get an undefined reference to my kernel wrapper.

I have separated my kernel and its wrapper into the following files.

FILE:cGo.cuh

typedef unsigned long int ktype;
typedef unsigned char glob;

/*
function Prototypes
*/

extern "C" void kernel_kValid(int, int, ktype *, glob *);

__global__ void kValid(ktype *, glob *);

FILE:cGo.cu

#include "cGo.cuh"
#include "device_launch_parameters.h"
#include "cuda.h"
#include "cuda_runtime.h"

//function Definitions

/*
kernel_kValid is a wrapper function for the CUDA Kernel to be called from Go
*/
extern "C" void kernel_kValid(int blocks, int threads, ktype *kInfo, glob *values) {
    kValid<<<blocks, threads>>>(kInfo, values);//execute the kernel
}


/*
kValid is the CUDA Kernel which is to be executed
*/
__global__ void kValid(ktype *kInfo, glob *values) {
    //lots of code
}

I compile my CUDA source into a shared library like this:

nvcc -shared -o myLib.so cGo.cu

Then I created a header file to include from my cgo code.

FILE:cGo.h

typedef unsigned long int ktype;
typedef unsigned char glob;

/*
function Declarations
*/

void kernel_kValid(int , int , ktype *, glob *);

Then, from the Go package, I use cgo to call my kernel wrapper:

package cuda

/*
// CUDA runtime library
#cgo LDFLAGS: -LC:/Storage/Cuda/lib/x64 -lcudart
// my shared library
#cgo LDFLAGS: -L${SRCDIR}/lib -lmyLib
// CUDA headers
#cgo CPPFLAGS: -IC:/Storage/Cuda/include
// directory containing cGo.h
#cgo CPPFLAGS: -I${SRCDIR}/include

#include <cuda_runtime.h>
#include <stdlib.h>
#include "cGo.h"
*/
import "C"

import "unsafe"

func useKernel() {
	// other code
	// kernel_kValid takes typed pointers, so convert explicitly.
	C.kernel_kValid(C.int(B), C.int(T), (*C.ktype)(unsafe.Pointer(storageDevice)), (*C.glob)(unsafe.Pointer(globDevice)))
	cudaErr, err = C.cudaDeviceSynchronize()
	// rest of the code
}

None of the calls into the CUDA Runtime API produce errors; only my kernel wrapper does. This is the output when I build the cuda package with go:

C:UsersuserDocumentsReposgocuda_wrapper>go build cuda_wrappercuda
# cuda_wrapper/cuda
In file included from C:/Storage/Cuda/include/host_defines.h:50:0,
                 from C:/Storage/Cuda/include/device_types.h:53,
                 from C:/Storage/Cuda/include/builtin_types.h:56,
                 from C:/Storage/Cuda/include/cuda_runtime.h:86,
                 from C:Goworkspacesrccuda_wrappercudacuda.go:12:
C:/Storage/Cuda/include/crt/host_defines.h:84:0: warning: "__cdecl" redefined
 #define __cdecl

<built-in>: note: this is the location of the previous definition
# cuda_wrapper/cuda
C:UsersuserAppDataLocalTempgo-build038297194cuda_wrappercuda\_objcuda.cgo2.o: In function `_cgo_440ebb0a3e25_Cfunc_kernel_kValid':
/tmp/go-buildcuda_wrappercuda\_obj/cgo-gcc-prolog:306: undefined reference to `kernel_kValid'
collect2.exe: error: ld returned 1 exit status

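At this point I am not sure what the problem is. I have been reading questions about undefined references in cgo, but nothing I found solved my problem. I have also been looking into whether the fact that the CUDA Runtime API is written in C++ affects how cgo compiles this, but I have not found anything conclusive. I think I am more confused than anything else, so I am hoping someone more knowledgeable can point me in the right direction.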

Answer

Good catch on the name mangling issue.

This is the solution we use for gorgonia:

#include <math.h>

#ifdef __cplusplus
extern "C" {
#endif


__global__ void sigmoid32(float* A, int size)
{
    int blockId = blockIdx.x + blockIdx.y * gridDim.x + gridDim.x * gridDim.y * blockIdx.z;
    int idx = blockId * (blockDim.x * blockDim.y * blockDim.z) + (threadIdx.z * (blockDim.x * blockDim.y)) + (threadIdx.y * blockDim.x) + threadIdx.x;
    if (idx >= size) {
        return;
    }
    A[idx] = 1 / (1 + powf((float)(M_E), (-1 * A[idx])));
}

#ifdef __cplusplus
}
#endif
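
As an aside, the index arithmetic in the kernel above flattens a three-dimensional grid of three-dimensional blocks into one linear thread index. The host-only sketch below mirrors that formula in plain C++ so it can be checked without a GPU. The `Dim3` struct and the `flat_index` function are hypothetical stand-ins for CUDA's built-in `gridDim`, `blockDim`, `blockIdx`, and `threadIdx`; they are not part of any library.

```cpp
// Hypothetical stand-in for CUDA's built-in dim3 / uint3 values.
struct Dim3 {
    int x, y, z;
};

// Same arithmetic as in sigmoid32: first flatten the block position
// within the grid, then add the thread's offset within its block.
int flat_index(Dim3 gridDim, Dim3 blockDim, Dim3 blockIdx, Dim3 threadIdx) {
    int blockId = blockIdx.x + blockIdx.y * gridDim.x
                + gridDim.x * gridDim.y * blockIdx.z;
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    return blockId * threadsPerBlock
         + threadIdx.z * (blockDim.x * blockDim.y)
         + threadIdx.y * blockDim.x
         + threadIdx.x;
}
```

For a 2x2x1 grid of 4x2x1 blocks there are 32 threads, and the formula assigns them the indices 0 through 31. The `if (idx >= size)` guard in the kernel covers the case where more threads are launched than there are elements.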

So... just wrap your kernel wrapper function in extern "C".
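
Applied to the wrapper from the question, that advice might look like the sketch below. This is a minimal, host-only illustration and not the asker's actual build: the kernel launch is replaced by a plain loop over a hypothetical `kValid_host` function (its even/odd check is invented purely for the example), so the file compiles with any C++ compiler. The `#ifdef __cplusplus` guard is the same pattern as in the gorgonia snippet. With it, a single header can be included both by the C++/CUDA source and by the C compiler that cgo invokes, and both refer to the unmangled symbol `kernel_kValid`.

```cpp
#include <cstddef>

// Types as declared in the question's cGo.h.
typedef unsigned long int ktype;
typedef unsigned char glob;

// Shared declaration: the guard gives the wrapper C linkage when a C++
// compiler (such as nvcc) sees it, and is skipped by a C compiler.
#ifdef __cplusplus
extern "C" {
#endif

void kernel_kValid(int blocks, int threads, ktype *kInfo, glob *values);

#ifdef __cplusplus
}
#endif

// Hypothetical host stand-in for the kValid kernel: one call per
// would-be GPU thread. It sets values[i] to 1 when kInfo[i] is even.
static void kValid_host(std::size_t i, const ktype *kInfo, glob *values) {
    values[i] = (kInfo[i] % 2 == 0) ? 1 : 0;
}

// The definition has C linkage too, so the exported symbol is plain
// `kernel_kValid`, the name the linker asks for in the error message.
extern "C" void kernel_kValid(int blocks, int threads, ktype *kInfo, glob *values) {
    // In the real .cu file this body would be the launch:
    //     kValid<<<blocks, threads>>>(kInfo, values);
    const std::size_t total =
        static_cast<std::size_t>(blocks) * static_cast<std::size_t>(threads);
    for (std::size_t i = 0; i < total; ++i) {
        kValid_host(i, kInfo, values);
    }
}
```

Keeping the declaration in one guarded header, rather than maintaining separate cGo.cuh and cGo.h copies, is a design choice that keeps the C and C++ views of the signature from drifting apart.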
