GDI+ DrawImage notably slower in C++ (Win32) than in C# (WinForms)

Posted 2023-03-29


Question posted: 2020-06-14 06:05:59

Question:

I'm porting an application from C# (WinForms) to C++ and noticed that drawing an image with GDI+ is much slower in C++, even though it uses the same API.

The image is loaded once at application startup into a System.Drawing.Image or a Gdiplus::Image, respectively.

The C# drawing code is (directly in the main form):

public Form1()
{
    this.SetStyle(ControlStyles.UserPaint | ControlStyles.AllPaintingInWmPaint | ControlStyles.OptimizedDoubleBuffer, true);
    this.image = Image.FromFile(...);
}

private readonly Image image;
private PointF translation; // pan offset, updated on mouse move (declaration not shown in the original; type assumed)

protected override void OnPaint(PaintEventArgs e)
{
    base.OnPaint(e);
    var sw = Stopwatch.StartNew();
    e.Graphics.TranslateTransform(this.translation.X, this.translation.Y); /* NOTE0 */
    e.Graphics.DrawImage(this.image, 0, 0, this.image.Width, this.image.Height);
    Debug.WriteLine(sw.Elapsed.TotalMilliseconds.ToString()); // ~3ms
}

About SetStyle: AFAIK, these flags (1) make WndProc ignore WM_ERASEBKGND, and (2) allocate a temporary HDC and Graphics for double-buffered drawing.

The C++ drawing code is more verbose. I browsed the reference source of System.Windows.Forms.Control to see how it handles the HDC and how it implements double buffering.

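As far as I can tell, my implementation matches it pretty closely (see NOTE1). Note that I implemented it in C++ first and only then looked at how the .NET source does it, so I may have overlooked something. The rest of the program is more or less what you get when you create a new Win32 project in VS2019. All error handling is omitted for readability.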

// Globals (declarations omitted in the original):
ULONG_PTR gdiplusToken;
Gdiplus::Image *gdip_bitmap = NULL;
INT bitmap_width = 0, bitmap_height = 0;

// In wWinMain:
    Gdiplus::GdiplusStartupInput gdiplusStartupInput;
    Gdiplus::GdiplusStartup(&gdiplusToken, &gdiplusStartupInput, NULL);
    gdip_bitmap = Gdiplus::Image::FromFile(...);
    bitmap_width = (INT)gdip_bitmap->GetWidth();
    bitmap_height = (INT)gdip_bitmap->GetHeight();

// In the WndProc callback:
case WM_PAINT:
{
    // Braces are required: the locals below must not cross other case labels.
    PAINTSTRUCT ps;

    // Need this for the back buffer bitmap
    RECT client_rect;
    GetClientRect(hWnd, &client_rect);
    int client_width = client_rect.right - client_rect.left;
    int client_height = client_rect.bottom - client_rect.top;

    // Double buffering
    HDC hdc0 = BeginPaint(hWnd, &ps);
    HDC hdc = CreateCompatibleDC(hdc0);
    HBITMAP back_buffer = CreateCompatibleBitmap(hdc0, client_width, client_height); /* NOTE1 */
    HBITMAP dummy_buffer = (HBITMAP)SelectObject(hdc, back_buffer);

    // Create GDI+ stuff on top of HDC
    Gdiplus::Graphics *graphics = Gdiplus::Graphics::FromHDC(hdc);

    QueryPerformanceCounter(...);
    graphics->DrawImage(gdip_bitmap, 0, 0, bitmap_width, bitmap_height);
    /* print performance counter diff */ // -> ~27 ms typically

    delete graphics;

    // Double buffering
    BitBlt(hdc0, 0, 0, client_width, client_height, hdc, 0, 0, SRCCOPY);
    SelectObject(hdc, dummy_buffer);
    DeleteObject(back_buffer);
    DeleteDC(hdc); // This is the temporary double buffer HDC

    EndPaint(hWnd, &ps);
}
break;

/* NOTE1 */: In the .NET source they don't use CreateCompatibleBitmap but CreateDIBSection instead. That improves performance from 27 ms to 21 ms and is quite cumbersome (see below).

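In both cases, when the mouse moves (OnMouseMove, WM_MOUSEMOVE), I call Control.Invalidate or InvalidateRect, respectively. The goal is to implement mouse panning using SetTransform, but that doesn't matter yet as long as the drawing performance is this bad.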
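Both snippets time one DrawImage call per paint (Stopwatch in C#, an elided QueryPerformanceCounter(...) call in C++). For a more systematic comparison outside the paint handler, here is a minimal, portable sketch of a benchmark loop that runs a callable many times, discards the first few (warm-up) timings and reports the median. It is only an illustration: the helper name median_ms, its parameters and the default warm-up count are assumptions, and it uses std::chrono::steady_clock in place of QueryPerformanceCounter (as far as I know, MSVC's steady_clock is built on QueryPerformanceCounter anyway).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical benchmark helper: calls `work` (e.g. a lambda wrapping a single
// DrawImage call) warmup + samples times, discards the first `warmup` timings
// and returns the median of the remaining ones in milliseconds.
double median_ms(const std::function<void()>& work,
                 std::size_t samples, std::size_t warmup = 3)
{
    assert(samples > 0);
    using clock = std::chrono::steady_clock;
    std::vector<double> timings;
    timings.reserve(samples);

    for (std::size_t i = 0; i < warmup + samples; ++i) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        if (i >= warmup) { // ignore the first calls (cold caches, lazy init, ...)
            timings.push_back(
                std::chrono::duration<double, std::milli>(stop - start).count());
        }
    }

    // Median (upper median for even counts): less sensitive than the mean to
    // occasional outliers caused by scheduling or page faults.
    const auto mid = timings.begin() + static_cast<std::ptrdiff_t>(timings.size() / 2);
    std::nth_element(timings.begin(), mid, timings.end());
    return *mid;
}
```

Usage would be something like `median_ms([&] { graphics->DrawImage(gdip_bitmap, 0, 0, bitmap_width, bitmap_height); }, 50);` in a small test harness rather than inside WM_PAINT.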

NOTE2: https://***.com/a/1617930/653473

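This answer suggests that using Gdiplus::CachedBitmap is the trick. However, I can't find any evidence in the C# WinForms source that it uses cached bitmaps in any way: the C# code calls GdipDrawImageRectI, which is the same flat-API function that Graphics::DrawImage(IN Image* image, IN INT x, IN INT y, IN INT width, IN INT height) calls in C++.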

Regarding /* NOTE1 */, here is the replacement for CreateCompatibleBitmap (simply replace the call with CreateVeryCompatibleBitmap):

bool bFillBitmapInfo(HDC hdc, BITMAPINFO *pbmi)
{
    HBITMAP hbm = NULL;
    bool bRet = false;

    // Create a dummy bitmap from which we can query color format info about the device surface.
    hbm = CreateCompatibleBitmap(hdc, 1, 1);

    pbmi->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);

    // Call first time to fill in BITMAPINFO header.
    GetDIBits(hdc, hbm, 0, 0, NULL, pbmi, DIB_RGB_COLORS);

    if ( pbmi->bmiHeader.biBitCount <= 8 ) {
        // UNSUPPORTED
    } else {
        if ( pbmi->bmiHeader.biCompression == BI_BITFIELDS ) {
            // Call a second time to get the color masks.
            // It's a GetDIBits Win32 "feature".
            GetDIBits(hdc, hbm, 0, pbmi->bmiHeader.biHeight, NULL, pbmi, DIB_RGB_COLORS);
        }
        bRet = true;
    }

    if (hbm != NULL) {
        DeleteObject(hbm);
        hbm = NULL;
    }
    return bRet;
}


HBITMAP CreateVeryCompatibleBitmap(HDC hdc, int width, int height)
{
    BITMAPINFO *pbmi = (BITMAPINFO *)LocalAlloc(LMEM_ZEROINIT, 4096); // Because otherwise I would have to figure out the actual size of the color table at the end; whatever...
    bFillBitmapInfo(hdc, pbmi);
    pbmi->bmiHeader.biWidth = width;
    pbmi->bmiHeader.biHeight = height;
    if (pbmi->bmiHeader.biCompression == BI_RGB) {
        pbmi->bmiHeader.biSizeImage = 0;
    } else {
        if ( pbmi->bmiHeader.biBitCount == 16 )
            pbmi->bmiHeader.biSizeImage = width * height * 2;
        else if ( pbmi->bmiHeader.biBitCount == 32 )
            pbmi->bmiHeader.biSizeImage = width * height * 4;
        else
            pbmi->bmiHeader.biSizeImage = 0;
    }
    pbmi->bmiHeader.biClrUsed = 0;
    pbmi->bmiHeader.biClrImportant = 0;

    void *dummy;
    HBITMAP back_buffer = CreateDIBSection(hdc, pbmi, DIB_RGB_COLORS, &dummy, NULL, 0);
    LocalFree(pbmi);
    return back_buffer;
}

Using the "very compatible" bitmap as the back buffer improves performance from 27 ms to 21 ms.
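One detail worth knowing about the non-BI_RGB branch of CreateVeryCompatibleBitmap: it sets biSizeImage to width * height * bytes-per-pixel, but DIB scan lines are DWORD-aligned (padded to a multiple of 4 bytes). For 32 bpp the two agree; for 16 bpp with an odd width the product is smaller than the real buffer. Whether CreateDIBSection actually relies on that value is not something the post measures, so treat this as a hedged side note. A minimal sketch of the usual stride formula (the function names dib_stride and dib_size_image are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Bytes per DIB scan line: each row is padded up to a multiple of 4 bytes.
std::uint32_t dib_stride(std::int32_t width, std::uint16_t bits_per_pixel)
{
    assert(width > 0 && bits_per_pixel > 0);
    // Round the row size in bits up to a multiple of 32, then convert to bytes.
    return ((static_cast<std::uint32_t>(width) * bits_per_pixel + 31u) / 32u) * 4u;
}

// Total image size suitable for biSizeImage. biHeight may be negative
// (top-down DIB), so only its magnitude matters here.
std::uint32_t dib_size_image(std::int32_t width, std::int32_t height,
                             std::uint16_t bits_per_pixel)
{
    return dib_stride(width, bits_per_pixel) *
           static_cast<std::uint32_t>(std::abs(height));
}
```

In CreateVeryCompatibleBitmap this would replace the two multiplications, e.g. `pbmi->bmiHeader.biSizeImage = dib_size_image(width, height, pbmi->bmiHeader.biBitCount);`. For a 3-pixel-wide 16 bpp row, the padded stride is 8 bytes rather than 6.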

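Regarding /* NOTE0 */ in the C# code: the code is only that fast if the transformation matrix doesn't scale. C# performance drops slightly when upscaling (~9 ms) and significantly when downsampling (~22 ms).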

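This hints that DrawImage probably wants to BitBlt when it can. But it can't in my C++ case, because the Bitmap format (as loaded from disk) differs from the back buffer format, or something along those lines. If I create a new, more compatible bitmap (this time there is no noticeable difference between CreateCompatibleBitmap and CreateVeryCompatibleBitmap), draw the original bitmap onto it, and then use only the more compatible bitmap in the DrawImage call, performance improves to about 4.5 ms. It now also has the same performance characteristics as the C# code when scaling.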

if (better_bitmap == NULL)
{
    HBITMAP tmp_bitmap = CreateVeryCompatibleBitmap(hdc0, gdip_bitmap->GetWidth(), gdip_bitmap->GetHeight());
    HDC copy_hdc = CreateCompatibleDC(hdc0);
    HGDIOBJ old = SelectObject(copy_hdc, tmp_bitmap);
    Gdiplus::Graphics *copy_graphics = Gdiplus::Graphics::FromHDC(copy_hdc);
    copy_graphics->DrawImage(gdip_bitmap, 0, 0, gdip_bitmap->GetWidth(), gdip_bitmap->GetHeight());
    // Now tmp_bitmap contains the image, hopefully in the device's preferred format
    delete copy_graphics;
    SelectObject(copy_hdc, old);
    DeleteDC(copy_hdc);
    better_bitmap = Gdiplus::Bitmap::FromHBITMAP(tmp_bitmap, NULL);
}
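The workaround above pays the pixel-format conversion once, up front, instead of on every DrawImage call. As a portable illustration of that idea only (this is not what GDI+ does internally), here is a sketch that converts a 24 bpp BGR buffer with padded rows into a tightly packed 32 bpp BGRA buffer with opaque alpha; the function name and the choice of opaque alpha are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a 24 bpp BGR image whose rows are `src_stride` bytes apart
// (DIB rows are padded to 4 bytes) into a tightly packed 32 bpp BGRA buffer
// with opaque alpha. Doing this once means later draws can copy pixels
// without a per-pixel format conversion.
std::vector<std::uint8_t> bgr24_to_bgra32(const std::uint8_t* src,
                                          std::size_t width, std::size_t height,
                                          std::size_t src_stride)
{
    assert(src_stride >= width * 3);
    std::vector<std::uint8_t> dst(width * height * 4);
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* in = src + y * src_stride;
        std::uint8_t* out = dst.data() + y * width * 4;
        for (std::size_t x = 0; x < width; ++x) {
            out[4 * x + 0] = in[3 * x + 0]; // B
            out[4 * x + 1] = in[3 * x + 1]; // G
            out[4 * x + 2] = in[3 * x + 2]; // R
            out[4 * x + 3] = 0xFF;          // A: fully opaque
        }
    }
    return dst;
}
```

The padding bytes at the end of each source row are skipped because each row starts at `y * src_stride`, not at `y * width * 3`.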

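But it's still consistently slower; something else must be missing. It also raises a new question: why is this *not* necessary in C# (same image, same machine)? As far as I can tell, Image.FromFile does not convert the bitmap format on load.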

Why is the DrawImage call in the C++ code still slower, and what do I need to do to make it as fast as in C#?

Comments:

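Are you timing a Release build for C++?

@drescherjm Yes, the C++ timings were recorded in Release with /O2.

Are you timing a single DrawImage call?

@TaW One DrawImage call per measurement, measured many times, ignoring the first few calls. If you're implying the measurements are inaccurate: no, the variation is small.

Answer 1: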

I ended up copying more of the crazy .NET code.

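The magic call that makes it fast is GdipImageForceValidation in System.Drawing.Image.FromFile. This function is basically undocumented and can't even [officially] be called from C++. It is merely mentioned here: https://docs.microsoft.com/en-us/windows/win32/gdiplus/-gdiplus-image-flat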

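Gdiplus::Image::FromFile and GdipLoadImageFromFile don't actually load the full image into memory. Effectively, it is copied from disk every time it is drawn. GdipImageForceValidation forces the image to be loaded into memory, or so it seems...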

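My initial idea of copying the image into a more compatible bitmap was right, but the way I did it didn't yield the best performance for GDI+ (because I used a GDI bitmap created from the original HDC). Loading the image directly into a new GDI+ bitmap, regardless of the pixel format, yields the same performance characteristics as the C# implementation: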

better_bitmap = new Gdiplus::Bitmap(gdip_bitmap->GetWidth(), gdip_bitmap->GetHeight(), PixelFormat24bppRGB);
Gdiplus::Graphics *graphics = Gdiplus::Graphics::FromImage(better_bitmap);
graphics->DrawImage(gdip_bitmap, 0, 0, gdip_bitmap->GetWidth(), gdip_bitmap->GetHeight());
delete graphics;

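Even better, using PixelFormat32bppPARGB improves performance significantly further: premultiplied alpha pays off when the image is drawn repeatedly (regardless of whether the source image has an alpha channel).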
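The usual reason premultiplied formats are cheaper to composite: the color channels already carry the alpha factor, so a source-over blend needs one multiplication per channel (on the destination) instead of two. Whether that fully explains the GDI+ speedup is not established here. A minimal sketch of the one-time conversion and of the per-pixel blend on 8-bit channels; the names Pixel, premultiply and blend_premultiplied are hypothetical, and the rounding `(x + 127) / 255` is one common choice, not necessarily what GDI+ uses.

```cpp
#include <cassert>
#include <cstdint>

struct Pixel { std::uint8_t b, g, r, a; }; // BGRA byte order, as in 32 bpp DIBs

// c * a / 255, rounded to nearest.
inline std::uint8_t mul_div255(std::uint8_t c, std::uint8_t a)
{
    return static_cast<std::uint8_t>((c * a + 127) / 255);
}

// Straight (non-premultiplied) ARGB -> premultiplied ARGB. Done once, up front.
Pixel premultiply(Pixel p)
{
    return { mul_div255(p.b, p.a), mul_div255(p.g, p.a), mul_div255(p.r, p.a), p.a };
}

// Source-over blend; both pixels must already be premultiplied:
// result = src + dst * (1 - src_alpha). Only the destination is scaled.
Pixel blend_premultiplied(Pixel src, Pixel dst)
{
    const std::uint8_t inv = static_cast<std::uint8_t>(255 - src.a);
    return { static_cast<std::uint8_t>(src.b + mul_div255(dst.b, inv)),
             static_cast<std::uint8_t>(src.g + mul_div255(dst.g, inv)),
             static_cast<std::uint8_t>(src.r + mul_div255(dst.r, inv)),
             static_cast<std::uint8_t>(src.a + mul_div255(dst.a, inv)) };
}
```

With straight alpha, the same blend would compute `(src * a + dst * (255 - a)) / 255` per channel, i.e. one extra multiplication per channel on every draw. Premultiplying once moves that cost out of the repeated draw path.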

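It seems that calling GdipImageForceValidation effectively does something similar internally, although I don't know exactly what it does. Because Microsoft made calling the GDI+ flat API from C++ user code as close to impossible as they could, I simply modified Gdiplus::Image in my Windows SDK headers to add an appropriate method. Explicitly copying the bitmap to PARGB seems cleaner to me (and yields better performance).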

Of course, once you know which undocumented function to look for, Google also turns up some additional information: https://photosauce.net/blog/post/image-scaling-with-gdi-part-5-push-vs-pull-and-image-validation

GDI+ is not my favorite API.

