Using Cuda on WSL2 gives me "no kernel image is available for execution on the device."
Asked: 2021-09-17 16:02:20
I am trying to use Cuda and Thrust in a C++ program on WSL2. I enabled Cuda on WSL2 by following the instructions here. This is a small example program:
First, I define:
export CUDA_LIBRARY_DIRECTORY=/usr/local/cuda-11.0/lib64
export CUDA_INCLUDE_DIRECTORY=/usr/local/cuda-11.0/include
export CUDACXX=/usr/local/cuda-11.0/bin/nvcc
CMakeLists.txt
cmake_minimum_required(VERSION 2.8)
project(proj LANGUAGES CXX CUDA)
set (CMAKE_CXX_STANDARD 14)
#### use cuda ####
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_50,code=sm_50;-lineinfo; -cudart=static; -Xptxas; -v)
include_directories($ENV{CUDA_INCLUDE_DIRECTORY})
link_directories($ENV{CUDA_LIBRARY_DIRECTORY})
ADD_EXECUTABLE(
proj
src/cudafile.cu
src/main.cpp)
main.cpp
#include<thrust/host_vector.h>
#include<thrust/device_vector.h>
#include<thrust/device_ptr.h>
void func(int size, int* a1, int* a2, int* a3);
void FillWithValue(int* arr, int size, int val);
int main()
{
    int size = 1000;
    int *arr1, *arr2, *arr3;
    cudaMalloc((void**)&arr1, size * sizeof(int));
    FillWithValue(arr1, size, 1);
    cudaMalloc((void**)&arr2, size * sizeof(int));
    FillWithValue(arr2, size, 2);
    cudaMalloc((void**)&arr3, size * sizeof(int));
    int* harr = new int[size];
    // Copy the first array back to the host to check that the fill worked.
    cudaMemcpy(harr, arr1, size * sizeof(int), cudaMemcpyDeviceToHost);
    fprintf(stdout, "%d\n", harr[0]);
    func(size, arr1, arr2, arr3);
    // Report any error raised by the kernel launch.
    cudaError_t err = cudaGetLastError();
    if (cudaSuccess != err)
    {
        fprintf(stderr, "Cuda error: %s.\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
cudafile.cu
#include<thrust/host_vector.h>
#include<thrust/device_vector.h>
#include<thrust/device_ptr.h>
#define blocksize 512
#define maxblocks 65535
__global__ void funcKernel(int size, int* a1, int* a2, int* a3)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    while (i < size)
    {
        a3[i] = a1[i] + a2[i];
        // Advance by the total number of threads in the grid; without this
        // step the loop never terminates.
        i += blockDim.x * gridDim.x;
    }
}
void func(int size, int* a1, int* a2, int* a3)
{
    int gridsize = size / blocksize + 1;
    if (gridsize > maxblocks) gridsize = maxblocks;
    funcKernel<<<gridsize, blocksize>>>(size, a1, a2, a3);
}
void FillWithValue(int* arr, int size, int val)
{
    thrust::device_ptr<int> d = thrust::device_pointer_cast(arr);
    thrust::fill(d, d + size, val);
}
Output
0
Cuda error: no kernel image is available for execution on the device.
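The output of the first fprintf shows that the Thrust fill function failed to fill the array, and the error caught by cudaGetLastError() shows that the kernel failed as well.
Here is the verbose cmake build: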
cmake ..
-- The CXX compiler identification is GNU 9.3.0
-- The CUDA compiler identification is NVIDIA 11.0.221
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda-11.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-11.0/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/d/work/wsl2-projects/tests/kernels/build
make
/usr/bin/cmake -S/mnt/d/work/wsl2-projects/tests/kernels -B/mnt/d/work/wsl2-projects/tests/kernels/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /mnt/d/work/wsl2-projects/tests/kernels/build/CMakeFiles /mnt/d/work/wsl2-projects/tests/kernels/build/CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/mnt/d/work/wsl2-projects/tests/kernels/build'
make -f CMakeFiles/proj.dir/build.make CMakeFiles/proj.dir/depend
make[2]: Entering directory '/mnt/d/work/wsl2-projects/tests/kernels/build'
cd /mnt/d/work/wsl2-projects/tests/kernels/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /mnt/d/work/wsl2-projects/tests/kernels /mnt/d/work/wsl2-projects/tests/kernels /mnt/d/work/wsl2-projects/tests/kernels/build /mnt/d/work/wsl2-projects/tests/kernels/build /mnt/d/work/wsl2-projects/tests/kernels/build/CMakeFiles/proj.dir/DependInfo.cmake --color=
Scanning dependencies of target proj
make[2]: Leaving directory '/mnt/d/work/wsl2-projects/tests/kernels/build'
make -f CMakeFiles/proj.dir/build.make CMakeFiles/proj.dir/build
make[2]: Entering directory '/mnt/d/work/wsl2-projects/tests/kernels/build'
[ 33%] Building CUDA object CMakeFiles/proj.dir/src/cudafile.cu.o
/usr/local/cuda-11.0/bin/nvcc -x cu -c /mnt/d/work/wsl2-projects/tests/kernels/src/cudafile.cu -o CMakeFiles/proj.dir/src/cudafile.cu.o
[ 66%] Building CXX object CMakeFiles/proj.dir/src/main.cpp.o
/usr/bin/c++ -I/usr/local/cuda-11.0/include -std=gnu++14 -o CMakeFiles/proj.dir/src/main.cpp.o -c /mnt/d/work/wsl2-projects/tests/kernels/src/main.cpp
[100%] Linking CXX executable proj
/usr/bin/cmake -E cmake_link_script CMakeFiles/proj.dir/link.txt --verbose=1
/usr/bin/c++ -rdynamic CMakeFiles/proj.dir/src/cudafile.cu.o CMakeFiles/proj.dir/src/main.cpp.o -o proj -L/usr/local/cuda-11.0/lib64 -L/usr/local/cuda-11.0/targets/x86_64-linux/lib/stubs -L/usr/local/cuda-11.0/targets/x86_64-linux/lib -lcudadevrt -lcudart_static -lrt -lpthread -ldl
make[2]: Leaving directory '/mnt/d/work/wsl2-projects/tests/kernels/build'
[100%] Built target proj
make[1]: Leaving directory '/mnt/d/work/wsl2-projects/tests/kernels/build'
/usr/bin/cmake -E cmake_progress_start /mnt/d/work/wsl2-projects/tests/kernels/build/CMakeFiles 0
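Is this related to a mismatch between my GPU and the Cuda version? I thought about downgrading to Cuda 10 or 9, but I don't know how to install it exactly as described here, so that it does not replace the driver with another Nvidia driver.
Additional info:
GeForce GTX 950M. Windows 11 Home, build 22000.51. WSL2: Ubuntu-20.04. Cuda compilation tools, release 9.1, V9.1.85
Comments: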
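You may want to provide the verbose output of the cmake build.
@RobertCrovella OK. I edited the question. I hope this is what you asked for.
The CUDA_NVCC_FLAGS you specified in your CMakeLists.txt file are not being applied during compilation with nvcc. Most importantly, you specified this compile switch: -gencode arch=compute_50,code=sm_50, which is correct for your GPU, but it does not appear in your verbose cmake output. That is the problem. CUDA 11 compiles for compute capability 5.2 by default, and code built that way will not run on the cc5.0 GPU you have. This is not a CUDA or WSL2 problem; it is a problem with how you are using cmake.
See here
@RobertCrovella It works. Thank you very much! I set CMAKE_CUDA_FLAGS instead of CUDA_NVCC_FLAGS.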
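The compute-capability reasoning in the comments can be sketched as a small compile-time check. This is a simplified model under stated assumptions, not NVIDIA's actual loader logic: it assumes device code (SASS) built for sm_XY runs only on devices with the same major version and an equal or higher minor version, and that embedded PTX for compute_XY can be JIT-compiled for any device of capability XY or higher. The names `ComputeCapability`, `Kind`, `Target`, `can_run`, and `gtx950m` are hypothetical and exist only for this illustration.

```cpp
// Simplified, hypothetical model of CUDA code compatibility. It ignores
// architecture-specific targets and is not NVIDIA's actual loader logic.
struct ComputeCapability {
    int major_ver;
    int minor_ver;
};

// What a -gencode "code=" entry embeds: SASS (code=sm_XY) or PTX (code=compute_XY).
enum class Kind { Sass, Ptx };

struct Target {
    Kind kind;
    ComputeCapability cc;
};

// Assumption: SASS runs on the same major version with an equal or higher
// minor version; PTX can be JIT-compiled for any equal or higher capability.
constexpr bool can_run(Target t, ComputeCapability dev) {
    return t.kind == Kind::Sass
        ? (dev.major_ver == t.cc.major_ver && dev.minor_ver >= t.cc.minor_ver)
        : (dev.major_ver > t.cc.major_ver ||
           (dev.major_ver == t.cc.major_ver && dev.minor_ver >= t.cc.minor_ver));
}

constexpr ComputeCapability gtx950m{5, 0};  // cc 5.0, as stated in the comments

// Default build described in the comments (compute capability 5.2): not runnable.
static_assert(!can_run(Target{Kind::Sass, {5, 2}}, gtx950m), "sm_52 SASS needs cc >= 5.2");
static_assert(!can_run(Target{Kind::Ptx, {5, 2}}, gtx950m), "compute_52 PTX needs cc >= 5.2");

// Build with -gencode arch=compute_50,code=sm_50: runnable.
static_assert(can_run(Target{Kind::Sass, {5, 0}}, gtx950m), "sm_50 SASS runs on cc 5.0");
```

Under this model, a binary that contains only 5.2 code has nothing the cc5.0 device can load, which matches the "no kernel image is available" error reported in the question.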
Answer 1:
Following Robert Crovella's comment, I got the program to run with the correct output and no errors.
In CMakeLists.txt, I used
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode arch=compute_50,code=sm_50 -lineinfo -cudart=static -Xptxas -v")
instead of
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode arch=compute_50,code=sm_50;-lineinfo; -cudart=static; -Xptxas; -v)
Now the output is
1