OpenGL drops performance when writing to nonzero FBO attachment on AMD

Posted: 2019-08-13 12:58:50

[Question]:

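I noticed that my 3D engine runs very slowly on AMD hardware. After some investigation, the slow code boiled down to creating an FBO with several attachments and writing to any nonzero attachment. In all tests I compared AMD performance against the same AMD GPU writing to the unaffected GL_COLOR_ATTACHMENT0, and against Nvidia hardware whose performance difference from my AMD device is well known.

Writing fragments to a nonzero attachment is 2-3 times slower than expected.

This code is equivalent to how I create the framebuffer and measure performance in my test apps: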

    // Create a framebuffer
    static const auto attachmentCount = 6;
    GLuint fb, att[attachmentCount];
    glGenTextures(attachmentCount, att);
    glGenFramebuffers(1, &fb);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);

    // Allocate one RGBA8 texture per color attachment and attach it
    for (auto i = 0; i < attachmentCount; ++i) {
        glBindTexture(GL_TEXTURE_2D, att[i]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, att[i], 0);
    }

    // Route only shader output 1 to its attachment; all other outputs are discarded
    const GLenum dbs[] = {
        GL_NONE,
        GL_COLOR_ATTACHMENT1,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE};
    glDrawBuffers(attachmentCount, dbs);


    // Main loop: one clear and 100 fullscreen rectangles per frame
    while (shouldWork) {
        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();
        showFps();
    }

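Is anything wrong with it?

A fully reproducible minimal test can be found here. I tried many other writing patterns and OpenGL states, and described some of them in the AMD Community.

I suppose the problem is in AMD's OpenGL driver, but if it is not, or if you have faced the same problem and found a workaround (a vendor extension?), please share.

UPD: moving the problem details here.

I prepared a minimal test pack, in which the application creates an FBO with six RGBA UNSIGNED_BYTE attachments and renders 100 fullscreen rectangles per frame to it. There are four executables, one per writing pattern: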

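1. Writing shader output 0 to attachment 0. Only output 0 is routed to the framebuffer with glDrawBuffers. All other outputs are set to GL_NONE.

2. Same as 1, but with output and attachment 1.

3. Writing output 0 to attachment 0, but all six shader outputs are routed to attachments 0..5 respectively, and all draw buffers except 0 are masked with glColorMaski.

4. Same as 3, but for attachment 1.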

I ran all tests on two machines with nearly identical CPUs and the following GPUs:

AMD Radeon RX550, driver version 19.30.01.16

Nvidia Geforce GTX 650 Ti, roughly 2x less powerful than the RX550

and got these results:

Geforce GTX 650 Ti:
attachment0: 195 FPS
attachment1: 195 FPS
attachment0 masked: 195 FPS
attachment1 masked: 235 FPS
Radeon RX550:
attachment0: 350 FPS
attachment1: 185 FPS
attachment0 masked: 330 FPS
attachment1 masked: 175 FPS

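Prebuilt test executables are attached to the post or can be downloaded from Google drive.

The test sources (with an MSVS-friendly cmake build system) are available on Github.

All four programs show a black window and a console with an FPS counter.

We see that when writing to a nonzero attachment, AMD is much slower than both the less powerful Nvidia GPU and itself. Global masking of drawbuffer output also costs some FPS.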

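I also tried using renderbuffers instead of textures, using other image formats (the format in the tests is the most compatible one), and rendering to a framebuffer of twice the size. The results were the same.

Explicitly turning off the scissor, stencil and depth tests does not help.

If I decrease the number of attachments, or reduce framebuffer coverage by multiplying the vertex coordinates by a value less than 1, test performance increases proportionally, and eventually the RX550 outperforms the GTX 650 Ti.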

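glClear calls are affected as well; their performance under various conditions matches the observations above.

My teammate ran the tests on a Radeon HD 3000, both on native Linux and under Wine. Both test runs exposed a huge difference between the attachment 0 and attachment 1 tests. I can't tell exactly what his driver version is, but it is the one provided by the Ubuntu 19.04 repos.

Another teammate ran the tests on a Radeon RX590 and got the same 2x difference.

Finally, let me copy-paste two nearly identical test examples here. This one works fast: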

#include <iostream>
#include <cassert>
#include <string>
#include <sstream>
#include <chrono>

#include "GL/glew.h"
#include "GLFW/glfw3.h"
#include <vector>

static std::string getErrorDescr(const GLenum errCode)
{
    // English descriptions are from
    // https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
    switch (errCode) {
        case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
        case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
        case GL_INVALID_VALUE: return "A numeric argument is out of range.";
        case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
        case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
        case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
        case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
        case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
        default:;
    }
    return "No description available.";
}

// Returns an empty string if no OpenGL error is pending
static std::string getErrorMessage()
{
    const GLenum error = glGetError();
    if (GL_NO_ERROR == error) return "";

    std::stringstream ss;
    ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
    ss << "Error string: ";
    ss << getErrorDescr(error);
    ss << std::endl;
    return ss.str();
}

// Prints the pending OpenGL error, if any; returns true if there was one
[[maybe_unused]] static bool error()
{
    const auto message = getErrorMessage();
    if (message.length() == 0) return false;
    std::cerr << message;
    return true;
}

static bool compileShader(const GLuint shader, const std::string& source)
{
    // The source is passed to OpenGL line by line, so count the lines first
    unsigned int linesCount = 0;
    for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
    const char** sourceLines = new const char*[linesCount];
    int* lengths = new int[linesCount];

    // Record the start and length (including the trailing '\n') of every line
    int idx = 0;
    const char* lineStart = source.data();
    int lineLength = 1;
    const auto len = source.length();
    for (unsigned int i = 0; i < len; ++i) {
        if (source[i] == '\n') {
            sourceLines[idx] = lineStart;
            lengths[idx] = lineLength;
            lineLength = 1;
            lineStart = source.data() + i + 1;
            ++idx;
        }
        else ++lineLength;
    }

    glShaderSource(shader, linesCount, sourceLines, lengths);
    glCompileShader(shader);
    GLint logLength;
    glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetShaderInfoLog(shader, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }

    GLint compileStatus;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
    delete[] sourceLines;
    delete[] lengths;
    return bool(compileStatus);
}

// Returns the linked program, or 0 on failure (0 is never a valid program name)
static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
    const auto vs = glCreateShader(GL_VERTEX_SHADER);
    if (vs == 0) {
        std::cerr << "Error: vertex shader is 0." << std::endl;
        return 0;
    }
    const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
    if (fs == 0) {
        std::cerr << "Error: fragment shader is 0." << std::endl;
        return 0;
    }

    // Compile shaders
    if (!compileShader(vs, vertSource)) {
        std::cerr << "Error: could not compile vertex shader." << std::endl;
        return 0;
    }
    if (!compileShader(fs, fragSource)) {
        std::cerr << "Error: could not compile fragment shader." << std::endl;
        return 0;
    }

    // Link program
    const auto program = glCreateProgram();
    if (program == 0) {
        std::cerr << "Error: program is 0." << std::endl;
        return 0;
    }
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);

    // Get log
    GLint logLength = 0;
    glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);

    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetProgramInfoLog(program, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }
    GLint linkStatus = 0;
    glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
    if (!linkStatus) {
        std::cerr << "Error: could not link." << std::endl;
        return 0;
    }
    glDeleteShader(vs);
    glDeleteShader(fs);
    return program;
}

static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
    gl_Position = vec4(v, 0.0, 1.0);
}
)";

// The fragment shader writes to output location 0
static const std::string fragSource = R"(
#version 330
layout(location = 0) out vec4 outColor0;
void main()
{
    outColor0 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";

int main()
{
    // Init
    if (!glfwInit()) {
        std::cerr << "Error: glfw init failed." << std::endl;
        return 3;
    }

    static const int width = 800;
    static const int height = 600;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow* window = nullptr;
    window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
    if (window == nullptr) {
        std::cerr << "Error: window is null." << std::endl;
        glfwTerminate();
        return 1;
    }
    glfwMakeContextCurrent(window);

    if (glewInit() != GLEW_OK) {
        std::cerr << "Error: glew not OK." << std::endl;
        glfwTerminate();
        return 2;
    }

    // Shader program
    const auto shaderProgram = createProgram(vertSource, fragSource);
    glUseProgram(shaderProgram);

    // Vertex buffer
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    // Two triangles covering the whole viewport
    GLuint buffer;
    glGenBuffers(1, &buffer);
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    float bufferData[] = {
        -1.0f, -1.0f,
        1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, 1.0f
    };
    glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));

    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);

    // Framebuffer
    GLuint fb, att[6];
    glGenTextures(6, att);
    glGenFramebuffers(1, &fb);

    glBindTexture(GL_TEXTURE_2D, att[0]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[1]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[2]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[3]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[4]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[5]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);

    // Only shader output 0 is routed, to attachment 0
    const GLenum dbs[] = {
        GL_COLOR_ATTACHMENT0,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE};
    glDrawBuffers(6, dbs);

    if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
        std::cerr << "Error: framebuffer is incomplete." << std::endl;
        return 1;
    }
    if (error()) {
        std::cerr << "OpenGL error occurred." << std::endl;
        return 2;
    }

    // Fpsmeter: the average FPS is printed every framesMax frames
    static const uint32_t framesMax = 50;
    uint32_t framesCount = 0;
    auto start = std::chrono::steady_clock::now();

    // Main loop
    while (!glfwWindowShouldClose(window)) {
        if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);

        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();

        if (++framesCount == framesMax) {
            framesCount = 0;
            const auto now = std::chrono::steady_clock::now();
            const auto duration = now - start;
            start = now;
            const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
            std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
        }
    }

    // Shutdown
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindVertexArray(0);
    glUseProgram(0);
    glDeleteProgram(shaderProgram);
    glDeleteBuffers(1, &buffer);
    glDeleteVertexArrays(1, &vao);
    glDeleteFramebuffers(1, &fb);
    glDeleteTextures(6, att);
    glfwMakeContextCurrent(nullptr);
    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}
This example runs comparably fast on Nvidia and Intel GPUs, but 2-3 times slower than the first example on AMD GPUs:

#include <iostream>
#include <cassert>
#include <string>
#include <sstream>
#include <chrono>

#include "GL/glew.h"
#include "GLFW/glfw3.h"
#include <vector>

static std::string getErrorDescr(const GLenum errCode)
{
    // English descriptions are from
    // https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
    switch (errCode) {
        case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
        case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
        case GL_INVALID_VALUE: return "A numeric argument is out of range.";
        case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
        case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
        case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
        case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
        case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
        default:;
    }
    return "No description available.";
}

// Returns an empty string if no OpenGL error is pending
static std::string getErrorMessage()
{
    const GLenum error = glGetError();
    if (GL_NO_ERROR == error) return "";

    std::stringstream ss;
    ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
    ss << "Error string: ";
    ss << getErrorDescr(error);
    ss << std::endl;
    return ss.str();
}

// Prints the pending OpenGL error, if any; returns true if there was one
[[maybe_unused]] static bool error()
{
    const auto message = getErrorMessage();
    if (message.length() == 0) return false;
    std::cerr << message;
    return true;
}

static bool compileShader(const GLuint shader, const std::string& source)
{
    // The source is passed to OpenGL line by line, so count the lines first
    unsigned int linesCount = 0;
    for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
    const char** sourceLines = new const char*[linesCount];
    int* lengths = new int[linesCount];

    // Record the start and length (including the trailing '\n') of every line
    int idx = 0;
    const char* lineStart = source.data();
    int lineLength = 1;
    const auto len = source.length();
    for (unsigned int i = 0; i < len; ++i) {
        if (source[i] == '\n') {
            sourceLines[idx] = lineStart;
            lengths[idx] = lineLength;
            lineLength = 1;
            lineStart = source.data() + i + 1;
            ++idx;
        }
        else ++lineLength;
    }

    glShaderSource(shader, linesCount, sourceLines, lengths);
    glCompileShader(shader);
    GLint logLength;
    glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetShaderInfoLog(shader, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }

    GLint compileStatus;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
    delete[] sourceLines;
    delete[] lengths;
    return bool(compileStatus);
}

// Returns the linked program, or 0 on failure (0 is never a valid program name)
static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
    const auto vs = glCreateShader(GL_VERTEX_SHADER);
    if (vs == 0) {
        std::cerr << "Error: vertex shader is 0." << std::endl;
        return 0;
    }
    const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
    if (fs == 0) {
        std::cerr << "Error: fragment shader is 0." << std::endl;
        return 0;
    }

    // Compile shaders
    if (!compileShader(vs, vertSource)) {
        std::cerr << "Error: could not compile vertex shader." << std::endl;
        return 0;
    }
    if (!compileShader(fs, fragSource)) {
        std::cerr << "Error: could not compile fragment shader." << std::endl;
        return 0;
    }

    // Link program
    const auto program = glCreateProgram();
    if (program == 0) {
        std::cerr << "Error: program is 0." << std::endl;
        return 0;
    }
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);

    // Get log
    GLint logLength = 0;
    glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);

    if (logLength > 0) {
        auto* const log = new GLchar[logLength + 1];
        glGetProgramInfoLog(program, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }
    GLint linkStatus = 0;
    glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
    if (!linkStatus) {
        std::cerr << "Error: could not link." << std::endl;
        return 0;
    }
    glDeleteShader(vs);
    glDeleteShader(fs);
    return program;
}

static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
    gl_Position = vec4(v, 0.0, 1.0);
}
)";

// The fragment shader writes to output location 1
static const std::string fragSource = R"(
#version 330
layout(location = 1) out vec4 outColor1;
void main()
{
    outColor1 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";

int main()
{
    // Init
    if (!glfwInit()) {
        std::cerr << "Error: glfw init failed." << std::endl;
        return 3;
    }

    static const int width = 800;
    static const int height = 600;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow* window = nullptr;
    window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
    if (window == nullptr) {
        std::cerr << "Error: window is null." << std::endl;
        glfwTerminate();
        return 1;
    }
    glfwMakeContextCurrent(window);

    if (glewInit() != GLEW_OK) {
        std::cerr << "Error: glew not OK." << std::endl;
        glfwTerminate();
        return 2;
    }

    // Shader program
    const auto shaderProgram = createProgram(vertSource, fragSource);
    glUseProgram(shaderProgram);

    // Vertex buffer
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    // Two triangles covering the whole viewport
    GLuint buffer;
    glGenBuffers(1, &buffer);
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    float bufferData[] = {
        -1.0f, -1.0f,
        1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, -1.0f,
        1.0f, 1.0f,
        -1.0f, 1.0f
    };
    glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));

    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);

    // Framebuffer
    GLuint fb, att[6];
    glGenTextures(6, att);
    glGenFramebuffers(1, &fb);

    glBindTexture(GL_TEXTURE_2D, att[0]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[1]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[2]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[3]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[4]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glBindTexture(GL_TEXTURE_2D, att[5]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, att[0], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, att[1], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, att[2], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, att[3], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT4, GL_TEXTURE_2D, att[4], 0);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT5, GL_TEXTURE_2D, att[5], 0);

    // Only shader output 1 is routed, to attachment 1
    const GLenum dbs[] = {
        GL_NONE,
        GL_COLOR_ATTACHMENT1,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE};
    glDrawBuffers(6, dbs);

    if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER)) {
        std::cerr << "Error: framebuffer is incomplete." << std::endl;
        return 1;
    }
    if (error()) {
        std::cerr << "OpenGL error occurred." << std::endl;
        return 2;
    }

    // Fpsmeter: the average FPS is printed every framesMax frames
    static const uint32_t framesMax = 50;
    uint32_t framesCount = 0;
    auto start = std::chrono::steady_clock::now();

    // Main loop
    while (!glfwWindowShouldClose(window)) {
        if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);

        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();

        if (++framesCount == framesMax) {
            framesCount = 0;
            const auto now = std::chrono::steady_clock::now();
            const auto duration = now - start;
            start = now;
            const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
            std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
        }
    }

    // Shutdown
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindVertexArray(0);
    glUseProgram(0);
    glDeleteProgram(shaderProgram);
    glDeleteBuffers(1, &buffer);
    glDeleteVertexArrays(1, &vao);
    glDeleteFramebuffers(1, &fb);
    glDeleteTextures(6, att);
    glfwMakeContextCurrent(nullptr);
    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}
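The only difference between these examples is the color attachment used.

I deliberately wrote two nearly identical copy-pasted programs to avoid possible side effects of framebuffer deletion and re-creation.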

UPD2: I also tried an OpenGL 4.6 debug context in my test examples on both Nvidia and AMD. There were no performance warnings.

UPD3: RX470 results:

attachment0: 775 FPS
attachment1: 396 FPS

UPD4: I built the attachment0 and attachment1 tests for WebGL via emscripten and ran them on the Radeon RX550. The full source code is in the question's Github repo, and the build command lines are

emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html

Both test programs issue a single draw call: glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);

First test: Firefox in its default configuration, i.e. ANGLE backed by DirectX.

Unmasked Vendor:    Google Inc.
Unmasked Renderer:  ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)

attachment0: 38 FPS
attachment1: 38 FPS

Second test: Firefox with ANGLE disabled (about:config -> webgl.disable-angle = true), using native OpenGL:

Unmasked Vendor:    ATI Technologies Inc.
Unmasked Renderer:  Radeon RX550/550 Series

attachment0: 38 FPS
attachment1: 19 FPS

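We see that DirectX is not affected by the issue, and that the OpenGL issue is reproducible in WebGL. This is the expected result, since gamers and developers complain only about OpenGL performance.

P.S. Possibly my issue is the root of this and this performance drop.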

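[Comments]:

Could you try profiling your application with Radeon Graphics Profiler? Also, please provide details such as which GPUs you used to compare these performance results.

@ParitoshKulkarni Added the GPU info. To sum up, we reproduced the issue with my test pack on HD3000, RX470, RX550 and RX590 GPUs.

@ParitoshKulkarni I found that the problem is in nonzero attachment fill rate, and then I wrote the test examples. They do only one clear and 100 identical fullscreen rectangle draw calls per frame (which could be replaced by one instanced draw call). The slowdown depends proportionally on the number of draw calls, the number of color attachments, and the fragment coverage of the rectangles.

@ParitoshKulkarni Radeon Graphics Profiler seems to support only DirectX 12, Vulkan and OpenCL. I analyzed the test application with RenderDoc and found nothing wrong. Can you suggest any other tool? Thanks in advance.

@ParitoshKulkarni Added an important note about DirectX. See UPD4 in the post.

[Answer 1]: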

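The issue has been fixed by AMD since (at least) the December 2019 drivers. The fix is confirmed by the test programs above and by our game engine's FPS. See also this thread.

Dear AMD OpenGL driver team, thank you very much!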
