CMAKE_CUDA_COMPILER flag is false despite CUDA being found

Posted: 2021-10-27 05:16:09

Question:

My CMake can't seem to find the CUDA compiler. Finding CUDA as a package succeeds, but CMAKE_CUDA_COMPILER is set to false. CMake produces the following output:

-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build type not specified, using Release
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - NOTFOUND
-- CUDA Support disabled.
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda-10.2 (found suitable exact version "10.2") 
-- Found GDAL: /usr/lib/libgdal.so (found version "2.2.3") 
-- Found Boost: /usr/include (found version "1.65.1") found components:  filesystem system 
-- Found EXIV2: /usr/lib/x86_64-linux-gnu/libexiv2.so  
-- Checking for module 'eigen3'
--   Found eigen3, version 3.3.9
-- Found eigen: /usr/local/include/eigen3  
-- Found Boost: /usr/include (found suitable version "1.65.1", minimum required is "1.40.0") found components:  system filesystem thread date_time iostreams serialization chrono atomic regex 
-- Checking for module 'libopenni'
--   Found libopenni, version 1.5.4.0
-- Found openni: /usr/lib/libOpenNI.so  
-- Checking for module 'libopenni2'
--   Found libopenni2, version 2.2.0.3
-- Found OpenNI2: /usr/lib/libOpenNI2.so  
-- Could NOT find ensenso (missing: ENSENSO_LIBRARY ENSENSO_INCLUDE_DIR) 
** WARNING ** io features related to ensenso will be disabled
-- Could NOT find DAVIDSDK (missing: DAVIDSDK_LIBRARY DAVIDSDK_INCLUDE_DIR) 
** WARNING ** io features related to davidSDK will be disabled
-- Could NOT find DSSDK (missing: _DSSDK_LIBRARIES) 
** WARNING ** io features related to dssdk will be disabled
** WARNING ** io features related to pcap will be disabled
** WARNING ** io features related to png will be disabled
-- Found Boost: /usr/include (found version "1.65.1") found components:  system filesystem 
-- CUDA not found. Skipping PSL package...
-- Found OpenCV: /usr/local (found version "3.3.1") 
-- Found Eigen3: /usr/local/include/eigen3 (Required is at least version "2.91.0") 
-- Found OpenGL: /usr/lib/x86_64-linux-gnu/libOpenGL.so   
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Configuring done
-- Generating done

The relevant lines are:

-- Looking for a CUDA compiler - NOTFOUND
-- CUDA Support disabled.
-- Found CUDA: /usr/local/cuda-10.2 (found suitable exact version "10.2") 

The code I use to check for CUDA, in CMake 3.15.7:

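# check_language() tests whether a working CUDA compiler is available and sets
# CMAKE_CUDA_COMPILER to its path, or to NOTFOUND, without aborting the configure step.
check_language(CUDA)
if(CMAKE_CUDA_COMPILER)
    enable_language(CUDA)
    message(STATUS "CUDA Support enabled.")

    # FindCUDA provides cuda_select_nvcc_arch_flags().
    include(FindCUDA)
    set(CUDA_ARCH_LIST Auto CACHE STRING  "List of CUDA architectures (e.g. Pascal, Volta, etc) or \
                                           compute capability versions (6.1, 7.0, etc) to generate code for. \
                                           Set to Auto for automatic detection (default).")
    # Variables must be dereferenced with ${...}; a bare $NAME is passed through literally.
    cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS ${CUDA_ARCH_LIST})
    # CUDA_NVCC_FLAGS is read by FindCUDA's cuda_add_* commands; targets compiled
    # through enable_language(CUDA) use CMAKE_CUDA_FLAGS instead.
    list(APPEND CUDA_NVCC_FLAGS ${CUDA_ARCH_FLAGS})
else()
    message(STATUS "CUDA Support disabled.")
endif()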
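check_language(CUDA) only reports NOTFOUND, not why the probe failed. A quick way to surface the underlying error is to compile a trivial kernel with nvcc directly: unlike `nvcc --version`, this actually invokes the host compiler, so an unsupported gcc shows up immediately. The sketch below assumes a POSIX shell with `mktemp`; `probe_nvcc` is a hypothetical helper name, and this is similar in spirit to, not identical with, what CMake's probe does.

```shell
#!/bin/sh
# Minimal sketch: compile a trivial CUDA kernel with nvcc to surface
# host-compiler errors that `nvcc --version` never triggers.
# probe_nvcc is a hypothetical helper, not part of CUDA or CMake.
probe_nvcc() {
    # Report and bail out if nvcc is not on PATH at all.
    if ! command -v nvcc >/dev/null 2>&1; then
        echo "nvcc not found on PATH" >&2
        return 127
    fi
    tmpdir=$(mktemp -d) || return 1
    # A kernel plus a host main(): compiling it exercises both nvcc and the host gcc.
    cat > "$tmpdir/probe.cu" <<'EOF'
__global__ void probe_kernel() {}
int main() { probe_kernel<<<1, 1>>>(); return 0; }
EOF
    # -c compiles only, so no GPU or driver is needed for this check.
    nvcc -c "$tmpdir/probe.cu" -o "$tmpdir/probe.o"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

Running `probe_nvcc && echo "nvcc and host compiler OK"` prints nvcc's own error message on failure, which is usually far more informative than CMake's NOTFOUND.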

What happened? I swear it worked at some point, but it doesn't anymore. nvcc --version and nvidia-smi both give sensible output. My .bashrc looks like this:

export CPATH=/usr/local/cuda-10.2/include:$CPATH
export PATH=/usr/local/cuda-10.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
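The LD_LIBRARY_PATH line relies on the `${VAR:+word}` expansion, which yields `word` only when VAR is set and non-empty. Without the braces, as the line appears above, the shell would glue the old value directly onto `lib64` and leave a literal `:+:` in the path, so the CUDA library directory would not actually be on the search path. The sketch below isolates the idiom; `prepend_path` is a hypothetical helper used only for illustration.

```shell
#!/bin/sh
# Minimal sketch of the ${VAR:+:${VAR}} idiom from the .bashrc line above.
# prepend_path DIR CURRENT prints DIR, followed by ":CURRENT" only if CURRENT
# is non-empty, so an empty CURRENT never leaves a dangling colon behind.
prepend_path() {
    dir=$1
    current=$2
    printf '%s\n' "${dir}${current:+:${current}}"
}
```

For example, `prepend_path /usr/local/cuda-10.2/lib64 ""` prints `/usr/local/cuda-10.2/lib64`, and `prepend_path /usr/local/cuda-10.2/lib64 /opt/lib` prints `/usr/local/cuda-10.2/lib64:/opt/lib`. Avoiding the dangling colon matters because the dynamic loader treats an empty entry as the current directory.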

Comments:

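1. The output doesn't correspond to your CMakeLists.txt file; for example, Boost can't be found without being looked for. 2. Get CMake 3.21. There's no reason to use an old version when newer ones are easily available as downloadable binaries. The binaries are quite insensitive to the particulars of your system, so you usually don't need to build CMake yourself.

This is of course not the whole CMakeLists.txt, only the part related to finding CUDA. But I'll check out 3.21. Thanks!

So please provide an actual CMakeLists.txt file and the actual output you get; otherwise we can't verify what you're seeing.

@einpoklum The file is very long and the rest doesn't matter... Anyway, I think I found the problem. I installed CUDA 10.2 with one gcc version and later changed the gcc version. That seems to have caused the problem!

I mean, keep a short file and provide its output...

Answer 1: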

OK, I found the problem. Each CUDA version supports only specific gcc versions. The compatibility table can be found here: https://***.com/a/46380601/9299366
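The compatibility check described above can be scripted roughly as follows. This is a minimal sketch: `gcc_supported_by_cuda` is a hypothetical helper, and the maximum supported gcc major version has to be looked up by hand in the linked table for your CUDA release and passed in as an argument.

```shell
#!/bin/sh
# Minimal sketch: compare a gcc version string against the newest gcc major
# version that a given CUDA release supports. The maximum is NOT detected
# here; look it up in NVIDIA's compatibility table and pass it in.
gcc_supported_by_cuda() {
    gcc_version=$1                 # e.g. "9.4.0", or just "9" from `gcc -dumpversion`
    max_major=$2                   # newest supported gcc major version
    gcc_major=${gcc_version%%.*}   # strip everything after the first dot: "9.4.0" -> "9"
    [ "$gcc_major" -le "$max_major" ]
}
```

Assuming CUDA 10.2 supports gcc up to major version 8 (verify this against the table), `gcc_supported_by_cuda "$(gcc -dumpversion)" 8` fails for the GNU 9.4.0 compiler shown in the log above. An alternative to reinstalling, which is not what this answer did, is to keep the newer default gcc and point CMake at an older supported one in a fresh build directory, for example with `-DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-8` (assuming g++-8 is installed at that path).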

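After installing CUDA, I changed the default gcc version. That caused this strange behavior: nvcc and nvidia-smi worked fine, but CMake could not detect CUDA. I completely purged CUDA and the NVIDIA driver, set a supported gcc version, and reinstalled CUDA and the driver. It seems to work fine now.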
