Include C++ libraries in an OpenCL kernel?

Posted: 2020-05-24 02:56:00

Question:

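Is it possible to use C++-style libraries in an OpenCL kernel?

I am trying to implement a kernel that performs the task in the code below. Two things may make this very difficult: 1. I am using the GLM math library, and 2. I am using a struct (land_map_t).

For example, if I want to use a kernel to loop over a large 3-dimensional array, can I include the GLM math library in the kernel and use its functions, such as glm::simplex? I have heard that modern C++ features such as classes are not compatible with kernels.

If that is not possible, how do I pass a struct to a kernel? Should I define the same struct in both the kernel and my implementation? All the struct contains is a 3-dimensional array, so I can easily convert it to built-in C++ types if needed.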


land_map_t * Chunk::terrain_gen(glm::ivec3 pos)
{

    float frequency = 500;
    float noise_1;

    land_map_t* landmap = new land_map_t;

    for (int x = 0; x < chunkSize + 2; x++) {
        for (int y = 0; y < chunkSize + 2; y++) {
            for (int z = 0; z < chunkSize + 2; z++) {

                noise_1 = (glm::simplex(
                    glm::vec2(glm::ivec2(x, z) + glm::ivec2(pos.x, pos.z)) / frequency));

                landmap->i[x][y][z] = BLOCK::AIR;
                if (pow(noise_1, 2) * 40.0 + 6.0 > (y + pos.y))
                {
                    landmap->i[x][y][z] = BLOCK::DIRT;
                }
            }
        }
    }

    return landmap;
}

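Comments:

You cannot call native libraries from an OpenCL kernel. You may find SYCL a better fit for your problem: you can use some native libraries, and you can pass structs to a kernel while defining them only once.

Answer 1: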

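You cannot include C++ libraries in OpenCL C. OpenCL C is based on C99, not C++. It has no classes, and the buffers passed to a kernel are flat one-dimensional arrays. Dynamic memory allocation with the new operator is not available inside a kernel either.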
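Because the kernel only ever sees a flat buffer, the `land_map_t` from the question can be handed over as plain bytes as long as it contains nothing but the array. The following is a minimal host-side sketch, assuming a `chunkSize` of 16 and one-byte block IDs (neither is stated in the question); `flat_index` and `as_flat` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Assumptions: chunkSize and the element type are not given in the question.
constexpr int chunkSize = 16;
constexpr int N = chunkSize + 2;

// Hypothetical stand-in for the question's land_map_t: a single 3D array.
struct land_map_t {
    std::uint8_t i[N][N][N];
};

// A struct like this can be copied byte for byte into a device buffer.
static_assert(std::is_trivially_copyable<land_map_t>::value, "must be plain data");
static_assert(sizeof(land_map_t) == std::size_t(N) * N * N, "no padding expected");

// C and C++ store i[x][y][z] with z varying fastest.
inline std::size_t flat_index(int x, int y, int z) {
    assert(x >= 0 && x < N && y >= 0 && y < N && z >= 0 && z < N);
    return (std::size_t(x) * N + y) * N + z;
}

// View the struct as the flat byte array a kernel would receive.
inline const unsigned char* as_flat(const land_map_t& m) {
    return reinterpret_cast<const unsigned char*>(&m);
}
```

The pointer returned by `as_flat`, together with `sizeof(land_map_t)`, is what one would hand to a buffer upload call such as `clEnqueueWriteBuffer`.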

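The best solution is to split the class components into arrays and use linear indexing in each array. In a rectangular box of size (Lx,Ly,Lz), the position (x, y, z)=(n%(Lx*Ly)%Lx, n%(Lx*Ly)/Lx, n/(Lx*Ly)) maps to the linear index n=x+(y+z*Ly)*Lx; and back.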
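The two index formulas can be checked against each other on the host. This is a small sketch with hypothetical box dimensions and helper names (`Coord`, `to_linear`, `to_coord`), not code from the answer:

```cpp
#include <cassert>
#include <cstdint>

// Box dimensions; hypothetical values chosen to be non-cubic.
constexpr std::uint32_t Lx = 4, Ly = 3, Lz = 5;

struct Coord { std::uint32_t x, y, z; };

// (x, y, z) -> n, with x varying fastest: n = x + (y + z*Ly)*Lx
inline std::uint32_t to_linear(Coord c) {
    assert(c.x < Lx && c.y < Ly && c.z < Lz);
    return c.x + (c.y + c.z * Ly) * Lx;
}

// n -> (x, y, z), the inverse mapping
inline Coord to_coord(std::uint32_t n) {
    assert(n < Lx * Ly * Lz);
    const std::uint32_t r = n % (Lx * Ly); // position inside one z-slice
    return { r % Lx, r / Lx, n / (Lx * Ly) };
}
```

In this convention x varies fastest, whereas the question's `landmap->i[x][y][z]` array stores z fastest, so either the host layout or the kernel indexing has to be adapted before the two sides exchange data.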

Your code in OpenCL could look like this:

// chunkSize, BLOCK_AIR and BLOCK_DIRT are assumed to be #defined in the kernel source
kernel void terrain_gen(global uchar* landmap_flags, global float3* pos) {
    const uint n = get_global_id(0);
    const uint x = n%((chunkSize+2)*(chunkSize+2))%(chunkSize+2);
    const uint y = n%((chunkSize+2)*(chunkSize+2))/(chunkSize+2);
    const uint z = n/((chunkSize+2)*(chunkSize+2));

    // paste the SimplexNoise struct definition here
    SimplexNoise simplexnoise;
    simplexnoise.initialize();

    const float3 p = pos[0]; // chunk position, assumed to be one value for the whole chunk
    const float frequency = 500;
    // same expression as in the question: noise at (x+pos.x, z+pos.z)/frequency
    const float noise_1 = simplexnoise.noise((x+p.x)/frequency, (z+p.z)/frequency);
    landmap_flags[n] = (noise_1*noise_1*40.0f+6.0f>(y+p.y)) ? BLOCK_DIRT : BLOCK_AIR;
}
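To validate what the device returns, it can help to run the same per-index logic on the CPU. The sketch below mirrors the question's loop (noise evaluated at (x + pos.x, z + pos.z) / frequency) with one loop iteration per work-item. `chunkSize`, the block ID values, `Vec3`, `terrain_gen_reference` and `count_mismatches` are assumptions made for illustration, and the noise function is passed in so that any port can be plugged in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t chunkSize = 16;               // assumption: not given in the source
constexpr std::uint32_t N = chunkSize + 2;
constexpr std::uint8_t BLOCK_AIR = 0, BLOCK_DIRT = 1; // hypothetical values

struct Vec3 { float x, y, z; };

// Runs the kernel body once per "work-item" n on the CPU.
// Noise is any callable float(float, float), e.g. a simplex noise port.
template <typename Noise>
std::vector<std::uint8_t> terrain_gen_reference(Vec3 pos, Noise noise) {
    std::vector<std::uint8_t> flags(N * N * N);
    const float frequency = 500.0f;
    for (std::uint32_t n = 0; n < flags.size(); ++n) {
        const std::uint32_t x = n % (N * N) % N;
        const std::uint32_t y = n % (N * N) / N;
        const std::uint32_t z = n / (N * N);
        const float v = noise((x + pos.x) / frequency, (z + pos.z) / frequency);
        flags[n] = (v * v * 40.0f + 6.0f > y + pos.y) ? BLOCK_DIRT : BLOCK_AIR;
    }
    return flags;
}

// Counts cells where a device result differs from the reference.
inline std::size_t count_mismatches(const std::vector<std::uint8_t>& a,
                                    const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::size_t bad = 0;
    for (std::size_t i = 0; i < a.size(); ++i) bad += (a[i] != b[i]);
    return bad;
}
```

Small differences near the threshold are possible when the device and the host round floating-point operations differently, so a handful of mismatches does not necessarily indicate a bug.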


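As for GLM, you have to port the functions you need to OpenCL C. For simplex noise you can use the following. Note that the listing keeps C++-style member functions and in-class initializers, which plain OpenCL C does not accept; for an OpenCL C compiler, turn them into free functions and constant arrays.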

struct SimplexNoise { // simplex noise in 2D, sources: https://gist.github.com/Ellpeck/3df75965a542e2163d1ae9cf3e4777bb, https://github.com/stegu/perlin-noise/tree/master/src
    const float3 grad3[12] = {
        (float3)( 1, 1, 0), (float3)(-1, 1, 0), (float3)( 1,-1, 0), (float3)(-1,-1, 0),
        (float3)( 1, 0, 1), (float3)(-1, 0, 1), (float3)( 1, 0,-1), (float3)(-1, 0,-1),
        (float3)( 0, 1, 1), (float3)( 0,-1, 1), (float3)( 0, 1,-1), (float3)( 0,-1,-1)
    };
    const uchar p[256] = {
        151,160,137, 91, 90, 15,131, 13,201, 95, 96, 53,194,233,  7,225,140, 36,103, 30, 69,142,  8, 99, 37,240, 21, 10, 23,190,  6,148,
        247,120,234, 75,  0, 26,197, 62, 94,252,219,203,117, 35, 11, 32, 57,177, 33, 88,237,149, 56, 87,174, 20,125,136,171,168, 68,175,
         74,165, 71,134,139, 48, 27,166, 77,146,158,231, 83,111,229,122, 60,211,133,230,220,105, 92, 41, 55, 46,245, 40,244,102,143, 54,
         65, 25, 63,161,  1,216, 80, 73,209, 76,132,187,208, 89, 18,169,200,196,135,130,116,188,159, 86,164,100,109,198,173,186,  3, 64,
         52,217,226,250,124,123,  5,202, 38,147,118,126,255, 82, 85,212,207,206, 59,227, 47, 16, 58, 17,182,189, 28, 42,223,183,170,213,
        119,248,152,  2, 44,154,163, 70,221,153,101,155,167, 43,172,  9,129, 22, 39,253, 19, 98,108,110, 79,113,224,232,178,185,112,104,
        218,246, 97,228,251, 34,242,193,238,210,144, 12,191,179,162,241, 81, 51,145,235,249, 14,239,107, 49,192,214, 31,181,199,106,157,
        184, 84,204,176,115,121, 50, 45,127,  4,150,254,138,236,205, 93,222,114, 67, 29, 24, 72,243,141,128,195, 78, 66,215, 61,156,180
    };
    const float F2=0.5f*(sqrt(3.0f)-1.0f), G2=(3.0f-sqrt(3.0f))/6.0f; // skewing and unskewing factors for 2, 3, and 4 dimensions
    const float F3=1.0f/3.0f, G3=1.0f/6.0f;
    const float F4=(sqrt(5.0f)-1.0f)*0.25f, G4=(5.0f-sqrt(5.0f))*0.05f;
    uchar perm[512]; // to remove the need for index wrapping, double the permutation table length
    uchar perm12[512];
    //int floor(const float x) const { return (int)x-(x<=0.0f); }
    float dot(const float3 g, const float x, const float y) const { return g.x*x+g.y*y; }
    void initialize() {
        for(int i=0; i<512; i++) {
            perm[i] = p[i&255];
            perm12[i] = (uchar)(perm[i]%12);
        }
    }
    float noise(float x, float y) const { // 2D simplex noise
        float n0, n1, n2; // noise contributions from the three corners, skew the input space to determine simplex cell
        float s = (x+y)*F2; // hairy factor for 2D
        int i=floor(x+s), j=floor(y+s);
        float t = (i+j)*G2;
        float X0=i-t, Y0=j-t; // unskew the cell origin back to (x,y) space
        float x0=x-X0, y0=y-Y0; // the x,y distances from the cell origin
        // for the 2D case, the simplex shape is an equilateral triangle, determine simplex
        int i1, j1; // offsets for second (middle) corner of simplex in (i,j) coords
        if(x0>y0) { i1=1; j1=0; } // lower triangle, XY order: (0,0)->(1,0)->(1,1)
        else /**/ { i1=0; j1=1; } // upper triangle, YX order: (0,0)->(0,1)->(1,1)
        float x1=x0-  i1+     G2, y1=y0-  j1+     G2; // offsets for middle corner in (x,y) unskewed coords
        float x2=x0-1.0f+2.0f*G2, y2=y0-1.0f+2.0f*G2; // offsets for last corner in (x,y) unskewed coords
        int ii=i&255, jj=j&255; // work out the hashed gradient indices of the three simplex corners
        int gi0 = perm12[ii   +perm[jj   ]];
        int gi1 = perm12[ii+i1+perm[jj+j1]];
        int gi2 = perm12[ii+ 1+perm[jj+ 1]];
        float t0 = 0.5f-x0*x0-y0*y0; // calculate the contribution from the three corners
        if(t0<0) n0 = 0.0f; else { t0 *= t0; n0 = t0*t0*dot(grad3[gi0], x0, y0); } // (x,y) of grad3 used for 2D gradient
        float t1 = 0.5f-x1*x1-y1*y1;
        if(t1<0) n1 = 0.0f; else { t1 *= t1; n1 = t1*t1*dot(grad3[gi1], x1, y1); }
        float t2 = 0.5f-x2*x2-y2*y2;
        if(t2<0) n2 = 0.0f; else { t2 *= t2; n2 = t2*t2*dot(grad3[gi2], x2, y2); }
        return 70.0f*(n0+n1+n2); // add contributions from each corner to get the final noise value, result is scaled to stay inside [-1,1]
    }
};
