CUDA exit code 255, and ptxas fatal

Posted 2023-02-22

[Asked]: 2017-01-27 23:27:23

[Question]:

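I'm having trouble with CUDA and Visual Studio. When I run the CUDA debugger, it runs the last successfully built code, but when I try to build the project again, the Error List shows: "...\main.cu" exited with code 255, and the Output tab shows essentially this error:

ptxas fatal : Unresolved extern function '_ZN7Vector4plERKS_'

I've checked other questions, googled it, searched various websites, and tried turning on relocatable device code, but that gives the same error, except that the ptxas message is replaced by this: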

Undefined reference to '_ZN7Vector4plERKS_' in 'x64/Debug/main.cu.obj'

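Also, I'm using VS2015, and just to be sure: I'm supposed to build the project first and then run it with the CUDA debugger? I shouldn't use the "Local Windows Debugger" button, right?

Anyway, here's my code:

Vector.cuh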

#pragma once

#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

#include <iostream>
class Vector4
{
public:
    float x, y, z, w;
    CUDA_CALLABLE_MEMBER Vector4();
    CUDA_CALLABLE_MEMBER Vector4(float x, float y, float z, float w);
    CUDA_CALLABLE_MEMBER virtual ~Vector4();
    CUDA_CALLABLE_MEMBER void print();
    CUDA_CALLABLE_MEMBER Vector4 operator+(const Vector4& other);
    CUDA_CALLABLE_MEMBER void add(Vector4* other);
};

Part of Vector.cu

Vector4::Vector4(float x, float y, float z, float w)
{
    this->x = x;
    this->y = y;
    this->z = z;
    this->w = w;
}

Vector4 Vector4::operator+(const Vector4 & other)
{
    return Vector4(
                    this->x + other.x,
                    this->y + other.y,
                    this->z + other.z,
                    this->w + other.w
                  );
}

main.cu

#include <iostream>
#include <cuda.h>
#include "cuda_runtime.h"
#include "Vector.cuh"
#include <SFML/Graphics.hpp>

__global__ void addVector(Vector4* a, Vector4* b)
{
    (*a) = (*a) + (*b);
    //a->x += 1;
}

int main()
{
    sf::RenderWindow window(sf::VideoMode(200, 200), "SFML works!");
    sf::CircleShape shape(100.f);
    shape.setFillColor(sf::Color::Green);

    int size = sizeof(Vector4);
    Vector4 v(1, 0, 0, 0);
    Vector4 b(1, 1, 0, 0);

    Vector4* d_v;
    Vector4* d_b;

    //cudaMalloc the device pointers
    //cudaMalloc(&pointer, bytes)
    cudaMalloc(&d_v, size);
    cudaMalloc(&d_b, size);

    while (window.isOpen())
    {
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
                window.close();
        }

        //cudaMemcpy the pointers to actual host data
        //cudaMemcpy(to, from, bytes, cudaMemcpyHostToDevice)
        cudaMemcpy(d_v, &v, size, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, &b, size, cudaMemcpyHostToDevice);

        cudaError_t err = cudaGetLastError();
        HANDLE_ERROR(err);

        //call kernel with the new device data
        addVector<<<1, 1>>>(d_v, d_b);

        //cudaMemcpy back to the old host variables
        //cudaMemcpy(to, from, bytes, cudaMemcpyDeviceToHost)
        cudaMemcpy(&v, d_v, size, cudaMemcpyDeviceToHost);
        cudaMemcpy(&b, d_b, size, cudaMemcpyDeviceToHost);

        v.print();
        b.print();
        printf("\n\n");
        window.clear();
        window.draw(shape);
        window.display();
    }

    //cudaFree
    cudaFree(d_v);
    cudaFree(d_b);
    getchar();

    return 0;
}

Here is also the command line shown under CUDA C/C++ in the project settings:

set CUDAFE_FLAGS=--sdk_dir "C:\Program Files (x86)\Windows Kits\8.1\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" --use-local-env --cl-version 2015 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64"     -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -g    -Xcompiler "/EHsc  /nologo  /FS /Zi   " -o x64\Debug\%(Filename)%(Extension).obj "%(FullPath)"

Sorry for the wall of text, and thanks!

Edit: I'm using CUDA 8.0

[Comments on the question]:

To whoever downvoted: may I know why?

[Answer 1]:

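This is not expected to compile.

There is no definition of __device__ _ZN7Vector4plERKS_ (i.e. __device__ Vector4::operator+(Vector4 const&)) anywhere that I can see. Once you fix that, you will either need to enable separate device code compilation and linking, or include Vector.cu in main.cu, because the device code for the operator is not defined in the same translation unit as the kernel that calls it.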
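One way to keep the operator's device code in the same translation unit as the kernel, without including a .cu file, is to define the small members directly in the header. The following is only a sketch under assumptions: it drops the virtual destructor and the add() member from the question's class for brevity, adds const to the members, and the default constructor's zero initialization is a guess at the intended behavior. It reuses the CUDA_CALLABLE_MEMBER macro from the question, so the same header should also build with a plain host compiler.

```cpp
// Vector.cuh (sketch): members are defined in-class, so every translation
// unit that includes this header, main.cu included, sees their bodies.
#include <cstdio>

#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
// Plain C++ compiler: the qualifier expands to nothing.
#define CUDA_CALLABLE_MEMBER
#endif

class Vector4
{
public:
    float x, y, z, w;

    // Assumption: a default-constructed vector is all zeros.
    CUDA_CALLABLE_MEMBER Vector4() : x(0), y(0), z(0), w(0) {}

    CUDA_CALLABLE_MEMBER Vector4(float x, float y, float z, float w)
        : x(x), y(y), z(z), w(w) {}

    // In-class definitions are implicitly inline; under nvcc the macro makes
    // the compiler generate both a host and a device version of this body.
    CUDA_CALLABLE_MEMBER Vector4 operator+(const Vector4& other) const
    {
        return Vector4(x + other.x, y + other.y, z + other.z, w + other.w);
    }

    CUDA_CALLABLE_MEMBER void print() const
    {
        printf("(%g, %g, %g, %g)\n", x, y, z, w);
    }
};
```

With a header along these lines, the addVector kernel in main.cu can call operator+ without relocatable device code, at the cost of recompiling the member bodies in every file that includes the header.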

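[Comments]:

Simply including Vector.cu solved the problem, but that isn't ideal as my project grows. Could you shed some light on the separate device code compilation part? I found this link, but it didn't help, since I mostly don't know which project settings to change.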
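For the separate-compilation route, a rough sketch of the file layout and build steps might look like the following. The two "files" are shown in one listing for brevity, the output names (Vector.obj, main.obj, app.exe) are made up for illustration, and repeating the qualifiers on the out-of-line definitions is an assumption about what is needed to get a device version of each function. In Visual Studio, the matching switch is, as far as I know, "Generate Relocatable Device Code" under the project's CUDA C/C++ properties, and Vector.cu also has to be built as a CUDA source file.

```cpp
// ---- Vector.cuh: declarations only ----
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

class Vector4
{
public:
    float x, y, z, w;
    CUDA_CALLABLE_MEMBER Vector4(float x, float y, float z, float w);
    CUDA_CALLABLE_MEMBER Vector4 operator+(const Vector4& other) const;
};

// ---- Vector.cu: definitions (the real file would #include "Vector.cuh") ----
// The qualifiers are repeated so these bodies are also compiled for the device.
CUDA_CALLABLE_MEMBER Vector4::Vector4(float x, float y, float z, float w)
    : x(x), y(y), z(z), w(w) {}

CUDA_CALLABLE_MEMBER Vector4 Vector4::operator+(const Vector4& other) const
{
    return Vector4(x + other.x, y + other.y, z + other.z, w + other.w);
}

// ---- Build sketch (hypothetical output names) ----
//   nvcc -dc Vector.cu -o Vector.obj      (-dc is shorthand for -rdc=true --compile)
//   nvcc -dc main.cu   -o main.obj
//   nvcc Vector.obj main.obj -o app.exe   (nvcc runs the device link step here)
```

The nvlink "Undefined reference" message quoted in the question would be consistent with relocatable device code being enabled while no device version of operator+ was ever compiled, though that is a guess from the posted fragments.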
