OpenCL cannot detect my AMD GPU using OpenCV

I am using an AMD Radeon R9 M375. I tried following this answer: https://stackoverflow.com/a/34250412/8731839, but it did not work for me.

I also followed this: http://answers.opencv.org/question/108646/opencl-can-not-detect-my-nvidia-gpu-via-opencv/?answer=108784#post-id-108784

Here is the output from clinfo.exe:

  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               2
  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    AMD Radeon (TM) R9 M375
  Device Topology:               PCI[ B#4, D#0, F#0 ]
  Max compute units:                 10
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1015Mhz
  Address bits:                  32
  Max memory allocation:             3019898880
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                3221225472
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Max pipe arguments:                0
  Max pipe active reservations:          0
  Max pipe packet size:              0
  Max global variable size:          0
  Max global variable preferred total size:  0
  Max read/write image args:             0
  Max on device events:              0
  Queue on device max size:          0
  Max on device queues:              0
  Queue on device preferred size:        0
  SVM capabilities:              
    Coarse grain buffer:             No
    Fine grain buffer:               No
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                No
    Profiling :                  No
  Platform ID:                   00007FFF209D0188
  Name:                      Capeverde
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                2348.3
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (2348.3)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event cl_amd_liquid_flash


      Device Type:                   CL_DEVICE_TYPE_CPU
      Vendor ID:                     1002h
      Board name:                    
      Max compute units:                 4
      Max work items dimensions:             3
        Max work items[0]:               1024
        Max work items[1]:               1024
        Max work items[2]:               1024
      Max work group size:               1024
      Preferred vector width char:           16
      Preferred vector width short:          8
      Preferred vector width int:            4
      Preferred vector width long:           2
      Preferred vector width float:          8
      Preferred vector width double:         4
      Native vector width char:          16
      Native vector width short:             8
      Native vector width int:           4
      Native vector width long:          2
      Native vector width float:             8
      Native vector width double:            4
      Max clock frequency:               2200Mhz
      Address bits:                  64
      Max memory allocation:             2147483648
      Image support:                 Yes
      Max number of images read arguments:       128
      Max number of images write arguments:      64
      Max image 2D width:                8192
      Max image 2D height:               8192
      Max image 3D width:                2048
      Max image 3D height:               2048
      Max image 3D depth:                2048
      Max samplers within kernel:            16
      Max size of kernel argument:           4096
      Alignment (bits) of base address:      1024
      Minimum alignment (bytes) for any datatype:    128
      Single precision floating point capability
        Denorms:                     Yes
        Quiet NaNs:                  Yes
        Round to nearest even:           Yes
        Round to zero:               Yes
        Round to +ve and infinity:           Yes
        IEEE754-2008 fused multiply-add:         Yes
      Cache type:                    Read/Write
      Cache line size:               64
      Cache size:                    32768
      Global memory size:                8499593216
      Constant buffer size:              65536
      Max number of constant args:           8
      Local memory type:                 Global
      Local memory size:                 32768
      Max pipe arguments:                16
      Max pipe active reservations:          16
      Max pipe packet size:              2147483648
      Max global variable size:          1879048192
      Max global variable preferred total size:  1879048192
      Max read/write image args:             64
      Max on device events:              0
      Queue on device max size:          0
      Max on device queues:              0
      Queue on device preferred size:        0
      SVM capabilities:              
        Coarse grain buffer:             No
        Fine grain buffer:               No
        Fine grain system:               No
        Atomics:                     No
      Preferred platform atomic alignment:       0
      Preferred global atomic alignment:         0
      Preferred local atomic alignment:      0
      Kernel Preferred work group size multiple:     1
      Error correction support:          0
      Unified memory for Host and Device:        1
      Profiling timer resolution:            465
      Device endianess:              Little
      Available:                     Yes
      Compiler available:                Yes
      Execution capabilities:                
        Execute OpenCL kernels:          Yes
        Execute native function:             Yes
      Queue on Host properties:              
        Out-of-Order:                No
        Profiling :                  Yes
      Queue on Device properties:                
        Out-of-Order:                No
        Profiling :                  No
      Platform ID:                   00007FFF209D0188
      Name:                      Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
      Vendor:                    GenuineIntel
      Device OpenCL C version:           OpenCL C 1.2 
      Driver version:                2348.3 (sse2,avx)
      Profile:                   FULL_PROFILE
      Version:                   OpenCL 1.2 AMD-APP (2348.3)

What works:

std::vector<cv::ocl::PlatformInfo> platforms;
cv::ocl::getPlatfomsInfo(platforms);

//OpenCL Platforms
for (size_t i = 0; i < platforms.size(); i++)
{

    //Access to Platform
    const cv::ocl::PlatformInfo* platform = &platforms[i];

    //Platform Name
    std::cout << "Platform Name: " << platform->name().c_str() << "\n";
    //Access Device within Platform
    cv::ocl::Device current_device;
    for (int j = 0; j < platform->deviceNumber(); j++)
    {
        //Access Device
        platform->getDevice(current_device, j);
        //Device Type
        int deviceType = current_device.type();
        std::cout << "Device Number: " << platform->deviceNumber() << std::endl;
        std::cout << "Device Type: " << deviceType << std::endl;
    }
}

The code above prints:

 Platform Name: Intel(R) OpenCL
 Device Number: 2
 Device Type: 2
 Device Number: 2
 Device Type: 4 
 Platform Name: AMD Accelerated Parallel Processing
 Device Number: 2
 Device Type: 4 
 Device Number: 2
 Device Type: 2 
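
The integers printed as "Device Type" are bit flags. In the OpenCL headers CL_DEVICE_TYPE_CPU is 1 << 1 and CL_DEVICE_TYPE_GPU is 1 << 2, so 2 means CPU and 4 means GPU in the listing above. The following is a minimal sketch of a decoder for those values; describeDeviceType and the enum names are hypothetical helpers written for illustration and are not part of OpenCV or OpenCL.

```cpp
#include <string>

// Bit flags mirroring the OpenCL CL_DEVICE_TYPE_* values.
// The enumerator names are illustrative, not taken from any library.
enum DeviceTypeBits {
    kDefault     = 1 << 0,
    kCpu         = 1 << 1,
    kGpu         = 1 << 2,
    kAccelerator = 1 << 3
};

// Hypothetical helper: turns a device-type bitmask into readable text.
// Bits other than the four above are ignored.
std::string describeDeviceType(int type) {
    std::string out;
    auto add = [&out](const char* name) {
        if (!out.empty()) out += "|";
        out += name;
    };
    if (type & kDefault)     add("DEFAULT");
    if (type & kCpu)         add("CPU");
    if (type & kGpu)         add("GPU");
    if (type & kAccelerator) add("ACCELERATOR");
    if (out.empty()) out = "UNKNOWN";
    return out;
}
```

Passing current_device.type() from the loop above to this helper would label each device as CPU or GPU instead of printing the raw number.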

How do I create a context that uses the AMD device as my GPU? The linked post says to use the method initializeContextFromHandler, but the OpenCV documentation for it is insufficient.

Answer

The problem is solved. I don't know exactly what I did, but the AMD GPU is working now.

Current setup (on Windows):

  1. Environment variable: Name: OPENCV_OPENCL_DEVICE Value: AMD:GPU:Capeverde
  2. Use setUseOpenCL(bool foo) from ocl.hpp to choose whether to use the GPU or the CPU.
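
The value AMD:GPU:Capeverde has three colon-separated fields: a platform, a device type, and a device name (Capeverde is the "Name" reported for the GPU in the clinfo output above). The sketch below only illustrates that layout by splitting such a string. DeviceSpec and parseDeviceSpec are hypothetical names, and this is not OpenCV's own parser, which may apply additional matching rules.

```cpp
#include <cstddef>
#include <string>

// Illustrative container for the three fields of an
// OPENCV_OPENCL_DEVICE value such as "AMD:GPU:Capeverde".
struct DeviceSpec {
    std::string platform;  // e.g. "AMD"
    std::string type;      // e.g. "GPU"
    std::string device;    // e.g. "Capeverde"
};

// Hypothetical helper: splits the value at the first two colons.
// Missing fields are left empty.
DeviceSpec parseDeviceSpec(const std::string& value) {
    DeviceSpec spec;
    const std::size_t first = value.find(':');
    if (first == std::string::npos) {
        spec.platform = value;
        return spec;
    }
    spec.platform = value.substr(0, first);
    const std::size_t second = value.find(':', first + 1);
    if (second == std::string::npos) {
        spec.type = value.substr(first + 1);
        return spec;
    }
    spec.type = value.substr(first + 1, second - first - 1);
    spec.device = value.substr(second + 1);
    return spec;
}
```

For the setting used here, parseDeviceSpec("AMD:GPU:Capeverde") yields platform "AMD", type "GPU", and device "Capeverde".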

Most likely cause: my actual code did not perform any computation, but once I wrote a simple program that subtracts two matrices, the AMD GPU started working.

Code:

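#include <cstdio>
#include <iostream>
#include <opencv2/core/ocl.hpp>
#include <opencv2/opencv.hpp>

int main() {
    // UMat lets OpenCV run the operation through OpenCL when it is available.
    cv::UMat mat1 = cv::UMat::ones(10, 10, CV_32F);
    cv::UMat mat2 = cv::UMat::zeros(10, 10, CV_32F);
    cv::UMat output = cv::UMat(10, 10, CV_32F);
    cv::subtract(mat1, mat2, output);
    std::cout << output << "\n";
    // Keep the console window open until a key is pressed.
    std::getchar();
}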
