OpenGL drops performance when writing to a nonzero FBO attachment on AMD
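Posted: 2019-08-13 12:58:50

I noticed that my 3D engine runs very slowly on AMD hardware. After some investigation, the slow code boiled down to creating an FBO with several attachments and writing to any nonzero attachment. In all tests I compared AMD performance against the same AMD GPU writing to the unaffected GL_COLOR_ATTACHMENT0, and against Nvidia hardware whose performance difference from my AMD device is well known.
Writing fragments to a nonzero attachment is 2-3 times slower than expected.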
This code is equivalent to how I create the framebuffer and measure performance in my test app:
// Create a framebuffer with six RGBA8 color attachments
static const auto attachmentCount = 6;
GLuint fb, att[attachmentCount];
glGenTextures(attachmentCount, att);
glGenFramebuffers(1, &fb);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
for (auto i = 0; i < attachmentCount; ++i)
{
    glBindTexture(GL_TEXTURE_2D, att[i]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, att[i], 0);
}
// Route only fragment output 1 to GL_COLOR_ATTACHMENT1; all other outputs are discarded
GLenum dbs[] =
{
    GL_NONE,
    GL_COLOR_ATTACHMENT1,
    GL_NONE,
    GL_NONE,
    GL_NONE,
    GL_NONE
};
glDrawBuffers(attachmentCount, dbs);
// Main loop: one clear and 100 full-screen quads per frame
while (shouldWork)
{
    glClear(GL_COLOR_BUFFER_BIT);
    for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
    glfwSwapBuffers(window);
    glfwPollEvents();
    showFps();
}
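The snippet calls showFps() without defining it. A minimal sketch of what such a counter could look like, assuming it averages over a fixed number of frames the way the full test programs below do (50 frames per sample); the class name FpsMeter and its tick() method are hypothetical, not part of the original code:

```cpp
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for showFps(): averages the frame rate over a fixed
// number of frames, like the FPS meter in the full test programs.
class FpsMeter {
public:
    using Clock = std::chrono::steady_clock;

    FpsMeter(std::uint32_t framesPerSample, Clock::time_point start)
        : framesMax_(framesPerSample), start_(start) {}

    // Call once per frame with the current time. Returns the average FPS
    // over the last framesMax_ frames on every framesMax_-th call and
    // std::nullopt on all other calls.
    std::optional<double> tick(Clock::time_point now) {
        if (++framesCount_ < framesMax_) return std::nullopt;
        framesCount_ = 0;
        const std::chrono::duration<double> elapsed = now - start_;  // seconds
        start_ = now;
        return framesMax_ / elapsed.count();
    }

private:
    std::uint32_t framesMax_;
    std::uint32_t framesCount_ = 0;
    Clock::time_point start_;
};
```

In the loop above, showFps() would become something like `if (auto fps = meter.tick(FpsMeter::Clock::now())) std::cout << "FPS: " << *fps << '\n';`. Passing the time in explicitly, rather than reading the clock inside tick(), keeps the counter testable without a GL context.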
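Am I doing something wrong?
A complete, minimal reproducible test can be found here. I tried many other write patterns and OpenGL states, and described some of them in the AMD Community.
I believe the problem is in AMD's OpenGL driver, but if it isn't, or if you have run into the same problem and found a workaround (a vendor extension?), please share.
UPD: moving the problem details here.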
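I prepared a minimal test pack in which the application creates an FBO with six RGBA UNSIGNED_BYTE attachments and renders 100 full-screen rectangles into it per frame. There are four executables, one for each of four write patterns:
1. Writing shader output 0 to attachment 0. Only output 0 is routed to the framebuffer with glDrawBuffers; all other outputs are set to GL_NONE.
2. Same as 1, but with output and attachment 1.
3. Writing output 0 to attachment 0, but all six shader outputs are routed to attachments 0..5 respectively, and every draw buffer except 0 is masked with glColorMaski.
4. Same as 3, but for attachment 1.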
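The four patterns differ only in two pieces of state: the array passed to glDrawBuffers and the per-buffer write mask set with glColorMaski. A sketch that builds both for each pattern, assuming the enum values from the standard OpenGL headers (GL_NONE = 0, GL_COLOR_ATTACHMENT0 = 0x8CE0, with later attachments numbered consecutively); the GLenum alias, WriteSetup and makeWriteSetup are illustrative names redefined here so the code compiles without GL headers:

```cpp
#include <array>
#include <cstddef>

// Enum values as defined by the OpenGL headers, redefined so this sketch
// builds without them. Do not mix these with the real GL headers.
using GLenum = unsigned int;
constexpr GLenum kGlNone = 0;
constexpr GLenum kGlColorAttachment0 = 0x8CE0;

constexpr std::size_t kAttachmentCount = 6;

// Hypothetical description of one test pattern's output state.
struct WriteSetup {
    std::array<GLenum, kAttachmentCount> drawBuffers{};  // argument for glDrawBuffers
    std::array<bool, kAttachmentCount> colorMask{};      // per-buffer glColorMaski value
};

// target: attachment the fragment is written to (0 or 1 in the tests).
// masked == false: patterns 1 and 2, only `target` is routed, the rest are GL_NONE.
// masked == true:  patterns 3 and 4, output i goes to attachment i and every
//                  draw buffer except `target` has its color writes masked off.
WriteSetup makeWriteSetup(std::size_t target, bool masked) {
    WriteSetup s;
    for (std::size_t i = 0; i < kAttachmentCount; ++i) {
        const GLenum attachment = kGlColorAttachment0 + static_cast<GLenum>(i);
        if (masked) {
            s.drawBuffers[i] = attachment;
            s.colorMask[i] = (i == target);
        } else {
            s.drawBuffers[i] = (i == target) ? attachment : kGlNone;
            s.colorMask[i] = true;  // default mask; unrouted outputs are discarded anyway
        }
    }
    return s;
}
```

With a real context, the result would be applied with `glDrawBuffers(6, s.drawBuffers.data())` followed by `glColorMaski(i, m, m, m, m)` for each buffer `i`, where `m` is `s.colorMask[i]`.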
I ran all tests on two machines with nearly identical CPUs and the following GPUs:
- AMD Radeon RX550, driver version 19.30.01.16
- Nvidia GeForce GTX 650 Ti, roughly 2× weaker than the RX550
I got these results:
GeForce GTX 650 Ti:
attachment0: 195 FPS
attachment1: 195 FPS
attachment0 masked: 195 FPS
attachment1 masked: 235 FPS
Radeon RX550:
attachment0: 350 FPS
attachment1: 185 FPS
attachment0 masked: 330 FPS
attachment1 masked: 175 FPS
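To put these numbers in fill-rate terms: each frame draws 100 full-screen quads, and the full programs below use an 800×600 framebuffer. Assuming every fragment reaches exactly one enabled attachment and ignoring the glClear, the RX550 writes about 800 × 600 × 100 × 350 ≈ 16.8 Gpixel/s to attachment 0 but only about 8.9 Gpixel/s to attachment 1, a slowdown of roughly 1.9×. A sketch of that arithmetic; the function names are illustrative:

```cpp
#include <cstdint>

// Rough fill-rate estimate: drawsPerFrame full-screen quads per frame into a
// width x height target. Ignores the per-frame glClear and assumes every
// fragment is written to exactly one attachment.
double pixelsPerSecond(std::uint32_t width, std::uint32_t height,
                       std::uint32_t drawsPerFrame, double fps) {
    return static_cast<double>(width) * height * drawsPerFrame * fps;
}

// How many times slower the attachment-1 run is than the attachment-0 run.
double slowdown(double fpsAttachment0, double fpsAttachment1) {
    return fpsAttachment0 / fpsAttachment1;
}
```

For the GTX 650 Ti, slowdown(195, 195) is exactly 1, while the same ratio for the RX550 (350 / 185) is about 1.89.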
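Prebuilt test executables are attached to the post, or can be downloaded from Google Drive.
The test source code (with an MSVS-friendly CMake build system) is available on GitHub.
All four programs show a black window and a console with an FPS counter.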
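We see that when writing to a nonzero attachment, the AMD GPU is much slower than both the weaker Nvidia GPU and itself writing to attachment 0. Globally masking draw buffer outputs also costs some FPS.
I also tried using renderbuffers instead of textures, using other image formats (the format in the tests is the most compatible one), and rendering to a framebuffer twice the size. The results were the same.
Explicitly disabling the scissor, stencil, and depth tests did not help.
If I reduce the number of attachments, or reduce framebuffer coverage by multiplying the vertex coordinates by a value less than 1, test performance increases proportionally, and eventually the RX550 outperforms the GTX 650 Ti.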
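Scaling the vertex coordinates by a factor s shrinks each side of the quad by s, so the covered fraction of the viewport, and with it the number of fragments per draw, falls as s². A sketch using the quad from the test programs (two triangles over NDC [-1, 1] × [-1, 1]); scaleQuad and coverage are illustrative names, and the coverage computation assumes an axis-aligned quad:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Same vertex data as bufferData in the test programs: two triangles
// covering the whole viewport in normalized device coordinates.
constexpr std::array<float, 12> kFullScreenQuad = {
    -1.0f, -1.0f,   1.0f, -1.0f,   1.0f,  1.0f,
    -1.0f, -1.0f,   1.0f,  1.0f,  -1.0f,  1.0f,
};

// Multiplies every coordinate by s, shrinking the quad around the center.
std::array<float, 12> scaleQuad(float s) {
    std::array<float, 12> out{};
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = kFullScreenQuad[i] * s;
    return out;
}

// Fraction of the viewport covered by an axis-aligned quad given as x, y
// pairs: bounding-box area divided by the NDC square's area (2 * 2 = 4).
float coverage(const std::array<float, 12>& v) {
    float minX = v[0], maxX = v[0], minY = v[1], maxY = v[1];
    for (std::size_t i = 0; i < v.size(); i += 2) {
        minX = std::min(minX, v[i]);
        maxX = std::max(maxX, v[i]);
        minY = std::min(minY, v[i + 1]);
        maxY = std::max(maxY, v[i + 1]);
    }
    return (maxX - minX) * (maxY - minY) / 4.0f;
}
```

At s = 0.5 the quad covers a quarter of the viewport. If the test is purely fill-bound, that predicts roughly 4× fewer fragments per frame, which is consistent with the proportional speed-up described above.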
glClear calls are affected as well, and their performance under various conditions matches the observations above.
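My teammate ran the tests on a Radeon HD 3000, both under native Linux and through Wine. Both runs showed a huge difference between the attachment0 and attachment1 tests. I don't know his exact driver version, but it came from the Ubuntu 19.04 repos.
Another teammate tested on a Radeon RX590 and got the same 2× difference.
Finally, let me paste two almost identical test programs here. This one runs fast: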
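#include <iostream>
#include <cassert>
#include <cstdint>   // uint32_t
#include <iterator>  // std::size
#include <string>
#include <sstream>
#include <chrono>
#include <vector>
#include "GL/glew.h"  // GLEW must be included before GLFW
#include "GLFW/glfw3.h"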
static std::string getErrorDescr(const GLenum errCode)
{
    // English descriptions are from
    // https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
    switch (errCode)
    {
        case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
        case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
        case GL_INVALID_VALUE: return "A numeric argument is out of range.";
        case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
        case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
        case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
        case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
        case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
        default:;
    }
    return "No description available.";
}
static std::string getErrorMessage()
{
    const GLenum error = glGetError();
    if (GL_NO_ERROR == error) return "";
    std::stringstream ss;
    ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
    ss << "Error string: ";
    ss << getErrorDescr(error);
    ss << std::endl;
    return ss.str();
}
[[maybe_unused]] static bool error()
{
    const auto message = getErrorMessage();
    if (message.length() == 0) return false;
    std::cerr << message;
    return true;
}
static bool compileShader(const GLuint shader, const std::string& source)
{
    // Split the source into lines; each line keeps its trailing '\n'.
    // Note: text after the last '\n' is not passed to the compiler,
    // so the shader sources below must end with a newline.
    unsigned int linesCount = 0;
    for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
    const char** sourceLines = new const char*[linesCount];
    int* lengths = new int[linesCount];
    int idx = 0;
    const char* lineStart = source.data();
    int lineLength = 1;
    const auto len = source.length();
    for (unsigned int i = 0; i < len; ++i)
    {
        if (source[i] == '\n')
        {
            sourceLines[idx] = lineStart;
            lengths[idx] = lineLength;
            lineLength = 1;
            lineStart = source.data() + i + 1;
            ++idx;
        }
        else ++lineLength;
    }
    glShaderSource(shader, linesCount, sourceLines, lengths);
    glCompileShader(shader);
    // Print the compiler log, if any
    GLint logLength;
    glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0)
    {
        auto* const log = new GLchar[logLength + 1];
        glGetShaderInfoLog(shader, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }
    GLint compileStatus;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
    delete[] sourceLines;
    delete[] lengths;
    return bool(compileStatus);
}
// Returns the linked program, or 0 on failure (0 is never a valid program name).
static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
    const auto vs = glCreateShader(GL_VERTEX_SHADER);
    if (vs == 0)
    {
        std::cerr << "Error: vertex shader is 0." << std::endl;
        return 0;
    }
    const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
    if (fs == 0)
    {
        std::cerr << "Error: fragment shader is 0." << std::endl;
        return 0;
    }
    // Compile shaders
    if (!compileShader(vs, vertSource))
    {
        std::cerr << "Error: could not compile vertex shader." << std::endl;
        return 0;
    }
    if (!compileShader(fs, fragSource))
    {
        std::cerr << "Error: could not compile fragment shader." << std::endl;
        return 0;
    }
    // Link program
    const auto program = glCreateProgram();
    if (program == 0)
    {
        std::cerr << "Error: program is 0." << std::endl;
        return 0;
    }
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);
    // Get log
    GLint logLength = 0;
    glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0)
    {
        auto* const log = new GLchar[logLength + 1];
        glGetProgramInfoLog(program, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }
    GLint linkStatus = 0;
    glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
    if (!linkStatus)
    {
        std::cerr << "Error: could not link." << std::endl;
        return 0;
    }
    // The linked program keeps its own copy; the shader objects are no longer needed
    glDeleteShader(vs);
    glDeleteShader(fs);
    return program;
}
static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
    gl_Position = vec4(v, 0.0, 1.0);
}
)";
static const std::string fragSource = R"(
#version 330
layout(location = 0) out vec4 outColor0;
void main()
{
    outColor0 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";
int main()
{
    // Init
    if (!glfwInit())
    {
        std::cerr << "Error: glfw init failed." << std::endl;
        return 3;
    }
    static const int width = 800;
    static const int height = 600;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow* window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
    if (window == nullptr)
    {
        std::cerr << "Error: window is null." << std::endl;
        glfwTerminate();
        return 1;
    }
    glfwMakeContextCurrent(window);
    if (glewInit() != GLEW_OK)
    {
        std::cerr << "Error: glew not OK." << std::endl;
        glfwTerminate();
        return 2;
    }

    // Shader program
    const auto shaderProgram = createProgram(vertSource, fragSource);
    if (shaderProgram == 0)
    {
        glfwTerminate();
        return 2;
    }
    glUseProgram(shaderProgram);

    // Vertex buffer: two triangles covering the whole viewport
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);
    GLuint buffer;
    glGenBuffers(1, &buffer);
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    float bufferData[] =
    {
        -1.0f, -1.0f,
         1.0f, -1.0f,
         1.0f,  1.0f,
        -1.0f, -1.0f,
         1.0f,  1.0f,
        -1.0f,  1.0f
    };
    glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);

    // Framebuffer with six RGBA8 color attachments
    GLuint fb, att[6];
    glGenTextures(6, att);
    glGenFramebuffers(1, &fb);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
    for (int i = 0; i < 6; ++i)
    {
        glBindTexture(GL_TEXTURE_2D, att[i]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, att[i], 0);
    }
    // Route fragment output 0 to GL_COLOR_ATTACHMENT0 only
    const GLenum dbs[] =
    {
        GL_COLOR_ATTACHMENT0,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE
    };
    glDrawBuffers(6, dbs);
    if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER))
    {
        std::cerr << "Error: framebuffer is incomplete." << std::endl;
        return 1;
    }
    if (error())
    {
        std::cerr << "OpenGL error occurred." << std::endl;
        return 2;
    }

    // FPS meter: prints the average over every 50 frames
    static const uint32_t framesMax = 50;
    uint32_t framesCount = 0;
    auto start = std::chrono::steady_clock::now();

    // Main loop: one clear and 100 full-screen quads per frame
    while (!glfwWindowShouldClose(window))
    {
        if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);
        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();
        if (++framesCount == framesMax)
        {
            framesCount = 0;
            const auto now = std::chrono::steady_clock::now();
            const auto duration = now - start;
            start = now;
            const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
            std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
        }
    }

    // Shutdown
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindVertexArray(0);
    glUseProgram(0);
    glDeleteProgram(shaderProgram);
    glDeleteBuffers(1, &buffer);
    glDeleteVertexArrays(1, &vao);
    glDeleteFramebuffers(1, &fb);
    glDeleteTextures(6, att);
    glfwMakeContextCurrent(nullptr);
    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}
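The compileShader helper above splits the source into per-line pointers with manually managed arrays, and silently drops any text after the last '\n'. A sketch of the same splitting with std::vector, which also keeps a final unterminated line; SourceLines and splitLines are hypothetical names, and the result is meant for glShaderSource(shader, count, strings, lengths):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Per-line pointers and lengths in the layout glShaderSource expects.
struct SourceLines {
    std::vector<const char*> strings;
    std::vector<int> lengths;  // GLint in real GL code
};

// Splits `source` into lines, keeping each '\n' with its line. A final line
// without a trailing '\n' is kept as well. The pointers refer into `source`,
// so `source` must outlive the returned value.
SourceLines splitLines(const std::string& source) {
    SourceLines out;
    std::size_t start = 0;
    while (start < source.size()) {
        std::size_t end = source.find('\n', start);
        end = (end == std::string::npos) ? source.size() : end + 1;
        out.strings.push_back(source.data() + start);
        out.lengths.push_back(static_cast<int>(end - start));
        start = end;
    }
    return out;
}
```

Inside compileShader this would replace the new[]/delete[] pair with `const auto lines = splitLines(source);` followed by `glShaderSource(shader, static_cast<GLsizei>(lines.strings.size()), lines.strings.data(), lines.lengths.data());`. Passing the whole source as a single string with a null length array would work just as well.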
This one runs equally fast on Nvidia and Intel GPUs, but on AMD GPUs it is 2-3 times slower than the first example:
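#include <iostream>
#include <cassert>
#include <cstdint>   // uint32_t
#include <iterator>  // std::size
#include <string>
#include <sstream>
#include <chrono>
#include <vector>
#include "GL/glew.h"  // GLEW must be included before GLFW
#include "GLFW/glfw3.h"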
static std::string getErrorDescr(const GLenum errCode)
{
    // English descriptions are from
    // https://www.opengl.org/sdk/docs/man/docbook4/xhtml/glGetError.xml
    switch (errCode)
    {
        case GL_NO_ERROR: return "No error has been recorded. THIS message is the error itself.";
        case GL_INVALID_ENUM: return "An unacceptable value is specified for an enumerated argument.";
        case GL_INVALID_VALUE: return "A numeric argument is out of range.";
        case GL_INVALID_OPERATION: return "The specified operation is not allowed in the current state.";
        case GL_INVALID_FRAMEBUFFER_OPERATION: return "The framebuffer object is not complete.";
        case GL_OUT_OF_MEMORY: return "There is not enough memory left to execute the command.";
        case GL_STACK_UNDERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to underflow.";
        case GL_STACK_OVERFLOW: return "An attempt has been made to perform an operation that would cause an internal stack to overflow.";
        default:;
    }
    return "No description available.";
}
static std::string getErrorMessage()
{
    const GLenum error = glGetError();
    if (GL_NO_ERROR == error) return "";
    std::stringstream ss;
    ss << "OpenGL error: " << static_cast<int>(error) << std::endl;
    ss << "Error string: ";
    ss << getErrorDescr(error);
    ss << std::endl;
    return ss.str();
}
[[maybe_unused]] static bool error()
{
    const auto message = getErrorMessage();
    if (message.length() == 0) return false;
    std::cerr << message;
    return true;
}
static bool compileShader(const GLuint shader, const std::string& source)
{
    // Split the source into lines; each line keeps its trailing '\n'.
    // Note: text after the last '\n' is not passed to the compiler,
    // so the shader sources below must end with a newline.
    unsigned int linesCount = 0;
    for (const auto c: source) linesCount += static_cast<unsigned int>(c == '\n');
    const char** sourceLines = new const char*[linesCount];
    int* lengths = new int[linesCount];
    int idx = 0;
    const char* lineStart = source.data();
    int lineLength = 1;
    const auto len = source.length();
    for (unsigned int i = 0; i < len; ++i)
    {
        if (source[i] == '\n')
        {
            sourceLines[idx] = lineStart;
            lengths[idx] = lineLength;
            lineLength = 1;
            lineStart = source.data() + i + 1;
            ++idx;
        }
        else ++lineLength;
    }
    glShaderSource(shader, linesCount, sourceLines, lengths);
    glCompileShader(shader);
    // Print the compiler log, if any
    GLint logLength;
    glGetShaderiv(shader, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0)
    {
        auto* const log = new GLchar[logLength + 1];
        glGetShaderInfoLog(shader, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }
    GLint compileStatus;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &compileStatus);
    delete[] sourceLines;
    delete[] lengths;
    return bool(compileStatus);
}
// Returns the linked program, or 0 on failure (0 is never a valid program name).
static GLuint createProgram(const std::string& vertSource, const std::string& fragSource)
{
    const auto vs = glCreateShader(GL_VERTEX_SHADER);
    if (vs == 0)
    {
        std::cerr << "Error: vertex shader is 0." << std::endl;
        return 0;
    }
    const auto fs = glCreateShader(GL_FRAGMENT_SHADER);
    if (fs == 0)
    {
        std::cerr << "Error: fragment shader is 0." << std::endl;
        return 0;
    }
    // Compile shaders
    if (!compileShader(vs, vertSource))
    {
        std::cerr << "Error: could not compile vertex shader." << std::endl;
        return 0;
    }
    if (!compileShader(fs, fragSource))
    {
        std::cerr << "Error: could not compile fragment shader." << std::endl;
        return 0;
    }
    // Link program
    const auto program = glCreateProgram();
    if (program == 0)
    {
        std::cerr << "Error: program is 0." << std::endl;
        return 0;
    }
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);
    // Get log
    GLint logLength = 0;
    glGetProgramiv(program, GL_INFO_LOG_LENGTH, &logLength);
    if (logLength > 0)
    {
        auto* const log = new GLchar[logLength + 1];
        glGetProgramInfoLog(program, logLength, nullptr, log);
        std::cout << "Log: " << std::endl;
        std::cout << log;
        delete[] log;
    }
    GLint linkStatus = 0;
    glGetProgramiv(program, GL_LINK_STATUS, &linkStatus);
    if (!linkStatus)
    {
        std::cerr << "Error: could not link." << std::endl;
        return 0;
    }
    // The linked program keeps its own copy; the shader objects are no longer needed
    glDeleteShader(vs);
    glDeleteShader(fs);
    return program;
}
static const std::string vertSource = R"(
#version 330
layout(location = 0) in vec2 v;
void main()
{
    gl_Position = vec4(v, 0.0, 1.0);
}
)";
static const std::string fragSource = R"(
#version 330
layout(location = 1) out vec4 outColor1;
void main()
{
    outColor1 = vec4(0.5, 0.5, 0.5, 1.0);
}
)";
int main()
{
    // Init
    if (!glfwInit())
    {
        std::cerr << "Error: glfw init failed." << std::endl;
        return 3;
    }
    static const int width = 800;
    static const int height = 600;
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    GLFWwindow* window = glfwCreateWindow(width, height, "Shader test", nullptr, nullptr);
    if (window == nullptr)
    {
        std::cerr << "Error: window is null." << std::endl;
        glfwTerminate();
        return 1;
    }
    glfwMakeContextCurrent(window);
    if (glewInit() != GLEW_OK)
    {
        std::cerr << "Error: glew not OK." << std::endl;
        glfwTerminate();
        return 2;
    }

    // Shader program
    const auto shaderProgram = createProgram(vertSource, fragSource);
    if (shaderProgram == 0)
    {
        glfwTerminate();
        return 2;
    }
    glUseProgram(shaderProgram);

    // Vertex buffer: two triangles covering the whole viewport
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);
    GLuint buffer;
    glGenBuffers(1, &buffer);
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    float bufferData[] =
    {
        -1.0f, -1.0f,
         1.0f, -1.0f,
         1.0f,  1.0f,
        -1.0f, -1.0f,
         1.0f,  1.0f,
        -1.0f,  1.0f
    };
    glBufferData(GL_ARRAY_BUFFER, std::size(bufferData) * sizeof(float), bufferData, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (GLvoid*)(0));
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);

    // Framebuffer with six RGBA8 color attachments
    GLuint fb, att[6];
    glGenTextures(6, att);
    glGenFramebuffers(1, &fb);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fb);
    for (int i = 0; i < 6; ++i)
    {
        glBindTexture(GL_TEXTURE_2D, att[i]);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, att[i], 0);
    }
    // Route fragment output 1 to GL_COLOR_ATTACHMENT1 only
    const GLenum dbs[] =
    {
        GL_NONE,
        GL_COLOR_ATTACHMENT1,
        GL_NONE,
        GL_NONE,
        GL_NONE,
        GL_NONE
    };
    glDrawBuffers(6, dbs);
    if (GL_FRAMEBUFFER_COMPLETE != glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER))
    {
        std::cerr << "Error: framebuffer is incomplete." << std::endl;
        return 1;
    }
    if (error())
    {
        std::cerr << "OpenGL error occurred." << std::endl;
        return 2;
    }

    // FPS meter: prints the average over every 50 frames
    static const uint32_t framesMax = 50;
    uint32_t framesCount = 0;
    auto start = std::chrono::steady_clock::now();

    // Main loop: one clear and 100 full-screen quads per frame
    while (!glfwWindowShouldClose(window))
    {
        if (glfwGetKey(window, GLFW_KEY_ESCAPE) == GLFW_PRESS) glfwSetWindowShouldClose(window, GLFW_TRUE);
        glClear(GL_COLOR_BUFFER_BIT);
        for (int i = 0; i < 100; ++i) glDrawArrays(GL_TRIANGLES, 0, 6);
        glfwSwapBuffers(window);
        glfwPollEvents();
        if (++framesCount == framesMax)
        {
            framesCount = 0;
            const auto now = std::chrono::steady_clock::now();
            const auto duration = now - start;
            start = now;
            const float secsPerFrame = (std::chrono::duration_cast<std::chrono::microseconds>(duration).count() / 1000000.0f) / framesMax;
            std::cout << "FPS: " << 1.0f / secsPerFrame << std::endl;
        }
    }

    // Shutdown
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindVertexArray(0);
    glUseProgram(0);
    glDeleteProgram(shaderProgram);
    glDeleteBuffers(1, &buffer);
    glDeleteVertexArrays(1, &vao);
    glDeleteFramebuffers(1, &fb);
    glDeleteTextures(6, att);
    glfwMakeContextCurrent(nullptr);
    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}
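The only difference between these examples is which color attachment the fragment shader writes to.
I deliberately wrote two almost identical, copy-pasted programs to avoid any side effects from deleting and recreating the framebuffer.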
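If maintaining two copies becomes a burden, the differing fragment shader could be generated from the attachment index while each run still stays a separate process that creates its framebuffer exactly once, which preserves the reason for keeping the programs apart. A sketch under that assumption; makeFragSource is a hypothetical helper that reproduces the two fragSource strings above for locations 0 and 1:

```cpp
#include <string>

// Hypothetical helper: builds the test fragment shader that writes a constant
// grey to output `location`. Location 0 matches the fast program's fragSource,
// location 1 the slow one's. The source ends with '\n', as compileShader requires.
std::string makeFragSource(int location) {
    const std::string n = std::to_string(location);
    return "#version 330\n"
           "layout(location = " + n + ") out vec4 outColor" + n + ";\n"
           "void main()\n"
           "{\n"
           "    outColor" + n + " = vec4(0.5, 0.5, 0.5, 1.0);\n"
           "}\n";
}
```

The matching glDrawBuffers array would then route only GL_COLOR_ATTACHMENT0 + location and leave every other entry as GL_NONE. The index could come from a command-line argument, so each binary launch still exercises exactly one configuration.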
UPD2: I also tried an OpenGL 4.6 debug context with my test examples on Nvidia and AMD. There were no performance warnings.
UPD3: RX470 results:
attachment0: 775 FPS
attachment1: 396 FPS
UPD4: I built the attachment0 and attachment1 tests for WebGL with Emscripten and ran them on the Radeon RX550. The full source code is in the question's GitHub repo; the build command lines are:
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment0_webgl.cpp -o attachment0.html
emcc --std=c++17 -O3 -s WASM=1 -s USE_GLFW=3 -s USE_WEBGL2=1 ./FillRate_attachment1_webgl.cpp -o attachment1.html
Both test programs issue a single draw call: glDrawArraysInstanced(GL_TRIANGLES, 0, 6, 1000);
First test: Firefox with the default configuration, i.e. ANGLE on top of DirectX.
Unmasked Vendor: Google Inc.
Unmasked Renderer: ANGLE (Radeon RX550/550 Series Direct3D11 vs_5_0 ps_5_0)
attachment0: 38 FPS
attachment1: 38 FPS
Second test: Firefox with ANGLE disabled (about:config -> webgl.disable-angle = true), using native OpenGL:
Unmasked Vendor: ATI Technologies Inc.
Unmasked Renderer: Radeon RX550/550 Series
attachment0: 38 FPS
attachment1: 19 FPS
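We see that DirectX is not affected by this problem, and that the OpenGL problem is reproducible in WebGL. This is the expected result, since gamers and developers complain only about OpenGL performance.
P.S. My problem may be the root cause of the performance drops reported in this and this.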
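Comments:
Can you try profiling your application with the Radeon Graphics Profiler? Please also provide details such as which GPUs you used to compare these performance results.
@ParitoshKulkarni Added the GPU info. To sum up, we reproduced the problem with my test pack on HD3000, RX470, RX550 and RX590 GPUs.
@ParitoshKulkarni I found that the problem is the fill rate of nonzero attachments, and then wrote the test examples. Each frame they do only one clear and 100 identical full-screen rectangle draw calls (which could be replaced with one instanced draw call). The slowdown scales proportionally with the number of draw calls, the number of color attachments, and the fragment coverage of the rectangles.
@ParitoshKulkarni The Radeon Graphics Profiler seems to support only DirectX 12, Vulkan and OpenCL. I profiled the test application with RenderDoc but found no issues. Can you suggest any other tool? Thanks in advance.
@ParitoshKulkarni Added an important note about DirectX; see UPD4 in the post.

Answer 1:
AMD has fixed the issue since (at least) the December 2019 driver. The fix was confirmed with the test programs above and with our game engine's FPS. See also this thread.
Dear AMD OpenGL driver team, thank you very much!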