Tutorial: Reading, Displaying, Scaling, and Cropping YUV Files

Posted 2023-02-03 芥末的无奈

Contents

Preface
1. Chroma subsampling
2. Reading YUV files
3. Displaying YUV files with SDL
- 3.1 YUV formats supported by SDL textures
- 3.2 Displaying a YUV image
4. Operating on YUV images with libyuv
- 4.1 Scaling
- 4.2 Cropping and rotation
Summary

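Preface

Plenty of good articles online already explain the YUV format itself, so I will not repeat that material here.

The code in this article can be found in the simple_yuv_viewer project. simple_yuv_viewer is a YUV file viewer built on Dear ImGUI and SDL. It shows how to read a YUV file, how to display it with SDL, and how to scale, crop, and otherwise process YUV data with libyuv.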

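1. Chroma subsampling

When I was learning the YUV formats, what confused me most was what the numbers 422, 444, and 420 actually mean. Many articles explain them, but none seemed to get at the core idea until I found the Wikipedia article "Chroma subsampling". The rest of this section summarizes that article.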

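Digital signals are usually compressed to reduce file size and save transmission time. Because the human visual system is much more sensitive to changes in brightness than to changes in color, a video system can be optimized by giving more bandwidth to the luma component (usually written Y’) than to the color-difference components Cb and Cr. For example, in compressed images the 4:2:2 Y’CbCr scheme needs two thirds of the bandwidth of non-subsampled "4:4:4" R’G’B’, and the reduction makes almost no visible difference to the viewer.

In other words, to lower bandwidth a video coding system keeps more luma information and less chroma information. This is called chroma subsampling.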

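A subsampling scheme is usually written as a three-part ratio J:a:b (for example 4:2:2), or as four parts if an alpha channel is present (for example 4:2:2:4). The ratio describes the number of luma and chroma samples in a conceptual region J pixels wide and 2 pixels high. The parts are, in order:

J: the horizontal sampling reference (the width of the conceptual region). Usually 4.
a: the number of chroma samples (Cr, Cb) in the first row of J pixels.
b: the number of changes of chroma samples (Cr, Cb) between the first and second rows of J pixels. Note that b must be either zero or equal to a (except for rare irregular cases such as 4:4:1 and 4:2:1, which do not follow this convention).
Alpha: the horizontal factor (relative to the first digit). It can be omitted if there is no alpha component; when present it equals J.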

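The sampling-pattern diagram for these schemes makes the meaning of the numbers in each YUV format clear, and once you understand it you can easily work out the file size of any YUV format. Taking an 8-pixel (4 x 2) image as in that diagram, with 8 bits per sample, the sizes are:

Format	Y samples	Cr samples	Cb samples	Size (bytes)	Size relative to 4:4:4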
4:4:4	8	8	8	24	1/1
4:4:0	8	4	4	16	2/3
4:2:2	8	4	4	16	2/3
4:2:0	8	2	2	12	1/2
4:1:1	8	2	2	12	1/2

Once you understand subsampling, you can also see why a YUV420 image is said to be half the size of the same image stored as 24-bit RGB.
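The table can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not code from simple_yuv_viewer: `frameSizeBytes` is a hypothetical helper, and it assumes 8 bits per sample, a width that is a multiple of J, and an even height so that no rounding is involved.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one 8-bit frame stored with J:a:b chroma subsampling.
// Assumption: width is a multiple of j and height is even (no rounding).
std::size_t frameSizeBytes(int width, int height, int j, int a, int b) {
    assert(j > 0 && width > 0 && height > 0);
    assert(width % j == 0 && height % 2 == 0);
    // The image is tiled with conceptual regions J pixels wide and 2 high.
    std::size_t regions = static_cast<std::size_t>(width / j) * (height / 2);
    // Each region holds 2 * J luma samples ...
    std::size_t luma = regions * 2 * j;
    // ... and (a + b) samples in each of the two chroma planes.
    std::size_t chroma_per_plane = regions * (a + b);
    return luma + 2 * chroma_per_plane;
}
```

For the 700x700 test image used later, `frameSizeBytes(700, 700, 4, 2, 0)` gives 735000 bytes, which is 700 * 700 * 3 / 2.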

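2. Reading YUV files

YUV data is generally stored in one of two ways: packed or planar. In packed mode the Y, U, and V samples are interleaved. In planar mode the Y samples are stored separately from the U and V samples; whether U and V are themselves stored in separate planes or interleaved with each other depends on the specific format. For which mode a given YUV format uses and how its components are arranged, see the article 音视频入门-07-认识YUV.

The key to reading a YUV file is working out where each of the Y/U/V components starts in memory and what the pitch of each component is (see the articles YUV格式解释 and 步长（间距）解释). These plane addresses and pitches are the prerequisite both for using libyuv and for displaying the file.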

2.1 Preparation

Prepare an image, for example libyuv-rainbow-700x700.bmp, whose resolution is 700x700.
Use ffmpeg to convert it to the YUV format you want to test, for example yuv420p:

ffmpeg -i libyuv-rainbow-700x700.bmp -video_size 700x700 -pix_fmt yuv420p rainbow-yuv420p.yuv

ffmpeg can convert to many YUV formats; the following command lists the pixel formats it supports:

ffmpeg -pix_fmts

Define a simple function that reads a whole file into memory:

// Needs <cstdint>, <fstream>, <string>, and <vector>.
// Returns the file contents, or an empty vector if the file cannot be opened.
std::vector<uint8_t> loadFile(const std::string& file_path)
{
    std::ifstream in(file_path, std::ios::in | std::ios::binary);
    std::vector<uint8_t> file_data;
    if (in) {
        in.seekg(0, std::ios::end);
        file_data.resize(in.tellg());
        in.seekg(0, std::ios::beg);
        in.read((char *) (file_data.data()), file_data.size());
        in.close();
    }
    return file_data;
}

2.2 Planar formats

2.2.1 Reading YUV420P

void loadYUV420P(const std::string& file_path,
                 int width, int height)
{
    auto file_data = loadFile(file_path);
    auto* yuv_data = file_data.data();

    // Y plane: starts at offset 0, one byte per pixel.
    auto* y_plane = yuv_data;
    size_t y_stride = width;

    // U plane: follows the width * height luma bytes.
    auto* u_plane = yuv_data + (width * height);
    size_t u_stride = width/2;

    // V plane: follows the (width * height) / 4 bytes of U.
    auto* v_plane = u_plane + (width * height)/4;
    size_t v_stride = width/2;

    // operations on yuv plane
    // ....
}

YUV420P is a planar format, so it stores all the Y samples first, then U, and finally V. For a 4x2 YUV420P image there are 8 Y samples, 2 U samples, and 2 V samples, stored in the file in this order:
```
Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7 U0 U1 V0 V1
```
The Y plane starts at offset 0. Its pitch (also called stride) is the image width, 4 in this example: one row of pixels needs 4 Y samples.
The U plane starts at offset width * height, because the first width * height bytes are all Y. Its pitch is width/2; with width = 4, one row of pixels needs 2 U samples.
The V plane starts at offset width*height + (width*height/4), because the first width * height bytes are Y and the next width*height/4 bytes are U. Equivalently, it is the address of the U plane plus width*height/4. Its pitch is width/2; with width = 4, one row of pixels needs 2 V samples.
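The width/2 and width*height/4 arithmetic above is exact only for even dimensions. As a hedged sketch, the layout can be computed once and checked against the file size before any pointer is used. `I420Layout` and `computeI420Layout` are hypothetical names, and rounding the chroma dimensions up for odd sizes is an assumption about how the file was written rather than something this article's code does.

```cpp
#include <cassert>
#include <cstddef>

// Byte offsets and strides of the three planes in a tightly packed I420 buffer.
struct I420Layout {
    std::size_t y_offset;
    std::size_t u_offset;
    std::size_t v_offset;
    int y_stride;
    int u_stride;
    int v_stride;
    std::size_t total_size;
};

// Assumption: chroma width and height are rounded up for odd dimensions.
I420Layout computeI420Layout(int width, int height) {
    assert(width > 0 && height > 0);
    int chroma_width = (width + 1) / 2;
    int chroma_height = (height + 1) / 2;
    std::size_t y_size = static_cast<std::size_t>(width) * height;
    std::size_t chroma_size =
        static_cast<std::size_t>(chroma_width) * chroma_height;

    I420Layout layout;
    layout.y_offset = 0;
    layout.u_offset = y_size;                // U follows all of Y
    layout.v_offset = y_size + chroma_size;  // V follows all of U
    layout.y_stride = width;
    layout.u_stride = chroma_width;
    layout.v_stride = chroma_width;
    layout.total_size = y_size + 2 * chroma_size;
    return layout;
}
```

Comparing `total_size` with the size of the loaded file is a cheap way to catch a wrong width or height before the plane pointers are dereferenced.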

2.2.2 Reading YUV422P

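void loadYUV422P(const std::string& file_path,
                 int width, int height)
{
    auto file_data = loadFile(file_path);
    auto* yuv_data = file_data.data();

    // Y plane: starts at offset 0, one byte per pixel.
    auto* y_plane = yuv_data;
    size_t y_stride = width;

    // U plane: half the width but the full height, so (width * height) / 2 bytes.
    auto* u_plane = yuv_data + (width * height);
    size_t u_stride = width/2;

    // V plane: follows the (width * height) / 2 bytes of U.
    auto* v_plane = u_plane + (width * height)/2;
    size_t v_stride = width/2;

    // operations on yuv plane
    // ....
}
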
YUV422P is a planar format, so it stores all the Y samples first, then U, and finally V. For a 4x2 YUV422P image there are 8 Y samples, 4 U samples, and 4 V samples, stored in the file in this order:
```
Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7 U0 U1 U2 U3 V0 V1 V2 V3
```
The Y plane starts at offset 0. Its pitch is the image width, 4 in this example: one row of pixels needs 4 Y samples.
The U plane starts at offset width * height, because the first width * height bytes are all Y. Its pitch is width/2; with width = 4, one row of pixels needs 2 U samples.
The V plane starts at offset width*height + (width*height/2), because the first width * height bytes are Y and the next width*height/2 bytes are U. Equivalently, it is the address of the U plane plus width*height/2. Its pitch is width/2; with width = 4, one row of pixels needs 2 V samples.

2.2.3 Reading NV21

NV21 is a format found on Android. It stores the Y plane first, followed by interleaved V and U samples.

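void loadNV21(const std::string& file_path,
              int width, int height)
{
    auto file_data = loadFile(file_path);
    auto* yuv_data = file_data.data();

    // Y plane: starts at offset 0, one byte per pixel.
    auto* y_plane = yuv_data;
    size_t y_stride = width;

    // Interleaved VU plane: each row holds width/2 (V, U) pairs,
    // which is width bytes, so the stride is width rather than width/2.
    auto* vu_plane = yuv_data + (width * height);
    size_t vu_stride = width;

    // operations on yuv plane
    // ....
}
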
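NV21 is a planar format of the YUV420SP (semi-planar) type: it stores the Y plane first, followed by interleaved V and U samples. For a 4x2 NV21 image there are 8 Y samples, 2 U samples, and 2 V samples, stored in the file in this order: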
```
Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7 V0 U0 V1 U1
```
The Y plane starts at offset 0. Its pitch is the image width, 4 in this example: one row of pixels needs 4 Y samples.
The VU plane starts at offset width * height, because the first width * height bytes are all Y. Its pitch is width; with width = 4, one chroma row holds 4 bytes (2 V and 2 U).

2.2.4 Reading NV12

NV12 is a format found on iOS. It stores the Y plane first, followed by interleaved U and V samples.

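void loadNV12(const std::string& file_path,
              int width, int height)
{
    auto file_data = loadFile(file_path);
    auto* yuv_data = file_data.data();

    // Y plane: starts at offset 0, one byte per pixel.
    auto* y_plane = yuv_data;
    size_t y_stride = width;

    // Interleaved UV plane: each row holds width/2 (U, V) pairs,
    // which is width bytes, so the stride is width rather than width/2.
    auto* uv_plane = yuv_data + (width * height);
    size_t uv_stride = width;

    // operations on yuv plane
    // ....
}
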
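As you can see, loadNV21 and loadNV12 are almost identical; only the order of U and V inside the chroma plane differs.

NV12 is a planar format of the YUV420SP (semi-planar) type: it stores the Y plane first, followed by interleaved U and V samples. For a 4x2 NV12 image there are 8 Y samples, 2 U samples, and 2 V samples, stored in the file in this order: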
```
Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7 U0 V0 U1 V1
```
The Y plane starts at offset 0. Its pitch is the image width, 4 in this example: one row of pixels needs 4 Y samples.
The UV plane starts at offset width * height, because the first width * height bytes are all Y. Its pitch is width; with width = 4, one chroma row holds 4 bytes (2 U and 2 V).
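To make the difference between the two semi-planar layouts concrete, here is a small sketch that splits the interleaved chroma plane into separate U and V planes, which is the chroma half of converting NV12 or NV21 to I420. It is plain C++ written for illustration: `ChromaPlanes` and `splitInterleavedChroma` are hypothetical names, and the code assumes even dimensions and a chroma stride equal to the image width. In practice libyuv's `NV12ToI420` and `NV21ToI420` do this job.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ChromaPlanes {
    std::vector<std::uint8_t> u;
    std::vector<std::uint8_t> v;
};

// Splits the interleaved chroma plane of an NV12 (U first) or NV21 (V first)
// image into separate U and V planes.
// Assumptions: even width and height, chroma stride equal to width.
ChromaPlanes splitInterleavedChroma(const std::uint8_t* chroma_plane,
                                    int width, int height, bool v_first) {
    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);
    std::size_t pairs = static_cast<std::size_t>(width / 2) * (height / 2);
    ChromaPlanes out;
    out.u.resize(pairs);
    out.v.resize(pairs);
    for (std::size_t i = 0; i < pairs; ++i) {
        std::uint8_t first = chroma_plane[2 * i];
        std::uint8_t second = chroma_plane[2 * i + 1];
        // NV21 stores V then U; NV12 stores U then V.
        out.u[i] = v_first ? second : first;
        out.v[i] = v_first ? first : second;
    }
    return out;
}
```

Called with `yuv_data + width * height` as the chroma plane, `v_first = true` handles NV21 and `v_first = false` handles NV12.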

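2.2.5 Summary

When loading a planar YUV file, the memory location of each component is found by computing its offset into the file.
A convenient trick is to assume a 4x2 input image, write down how many Y/U/V samples there are and in what order they are stored, and from that infer how the offsets depend on width and height.
The pitch is the number of bytes one row of a plane occupies, that is, how many samples of that component one row of the image needs. It can be derived from the same 4x2 example.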
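One way to check that a set of plane pointers and pitches is right is to read a single pixel back through them. The sketch below does that for any of the planar formats with separate U and V planes. `YuvSample` and `samplePlanar` are hypothetical names, and the `shift_x` / `shift_y` parameters are an illustrative way to express the chroma subsampling: (1, 1) for YUV420P, (1, 0) for YUV422P, (0, 0) for YUV444P.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct YuvSample {
    std::uint8_t y;
    std::uint8_t u;
    std::uint8_t v;
};

// Reads the Y, U, and V values that apply to pixel (x, y) of a planar image.
// shift_x and shift_y are the horizontal and vertical chroma subsampling
// shifts (an illustrative parameterisation, see the lead-in).
YuvSample samplePlanar(const std::uint8_t* y_plane, std::size_t y_stride,
                       const std::uint8_t* u_plane, std::size_t u_stride,
                       const std::uint8_t* v_plane, std::size_t v_stride,
                       int x, int y, int shift_x, int shift_y) {
    assert(x >= 0 && y >= 0 && shift_x >= 0 && shift_y >= 0);
    // Several neighbouring pixels share one chroma sample.
    int cx = x >> shift_x;
    int cy = y >> shift_y;
    return YuvSample{y_plane[y * y_stride + x],
                     u_plane[cy * u_stride + cx],
                     v_plane[cy * v_stride + cx]};
}
```

The row index is always multiplied by the pitch of the plane being read, never by the image width, which is why a wrong pitch shows up as a sheared or color-shifted picture.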

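2.3 Packed formats

Packed images are actually the simplest to work with in libyuv. In packed mode the Y, U, and V samples are interleaved, and the order differs from format to format.

2.3.1 Reading YUYV

YUYV is a packed format based on YUV422 sampling. A 4x2 YUYV image has 8 Y samples, 4 U samples, and 4 V samples, arranged as follows: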

Y0 U0 Y1 V0 Y2 U1 Y3 V1 Y4 U2 Y5 V2 Y6 U3 Y7 V3

void loadYUYV(const std::string& file_path,
              int width, int height)
{
    auto file_data = loadFile(file_path);
    auto* yuv_data = file_data.data();
    // One row holds width Y samples plus width/2 U and width/2 V samples.
    size_t stride = width * 2;

    // operations on yuv plane
    // ....
}

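The pitch of YUYV is width * 2; with width = 4, one row of pixels needs 8 Y/U/V samples in total.

2.3.2 Summary

When loading a packed YUV file there are no per-component offsets to compute. De-interleaving is usually left to a third-party library or to the system.
In packed mode the pitch is the number of Y/U/V samples needed for one row of pixels, and it can again be derived from a 4x2 example. In YUYV one row of 4 pixels needs 8 samples, so the pitch is 8.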
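To show what that de-interleaving amounts to, here is a minimal plain C++ sketch that unpacks YUYV into three separate 4:2:2 planes. It is for illustration only: `Planar422` and `unpackYUYV` are hypothetical names, and the code assumes an even width and a tightly packed buffer whose stride is width * 2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Planar422 {
    std::vector<std::uint8_t> y;
    std::vector<std::uint8_t> u;
    std::vector<std::uint8_t> v;
};

// Unpacks a YUYV (Y0 U0 Y1 V0 ...) buffer into separate Y, U, and V planes.
// Assumptions: even width, stride equal to width * 2.
Planar422 unpackYUYV(const std::uint8_t* packed, int width, int height) {
    assert(width > 0 && height > 0 && width % 2 == 0);
    std::size_t stride = static_cast<std::size_t>(width) * 2;
    std::size_t chroma_width = static_cast<std::size_t>(width) / 2;
    Planar422 out;
    out.y.resize(static_cast<std::size_t>(width) * height);
    out.u.resize(chroma_width * height);
    out.v.resize(chroma_width * height);
    for (int row = 0; row < height; ++row) {
        const std::uint8_t* src = packed + row * stride;
        for (int x = 0; x < width; x += 2) {
            // Every 4 bytes describe 2 pixels: Y0 U Y1 V.
            std::size_t yi = static_cast<std::size_t>(row) * width + x;
            std::size_t ci = row * chroma_width + x / 2;
            out.y[yi] = src[2 * x];
            out.u[ci] = src[2 * x + 1];
            out.y[yi + 1] = src[2 * x + 2];
            out.v[ci] = src[2 * x + 3];
        }
    }
    return out;
}
```

The resulting planes have pitches width, width/2, and width/2, exactly as in the YUV422P case above.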

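3. Displaying YUV files with SDL

For general SDL usage, refer to an SDL tutorial or the SDL documentation.

To display a YUV file with SDL, the steps are:

Create a texture with a YUV pixel format using SDL_CreateTexture.
Load the YUV file and obtain the memory address of each component.
Pass those addresses to SDL_UpdateYUVTexture (or SDL_UpdateTexture for packed formats) to upload the YUV data into the texture.
Render the image with the SDL renderer.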

3.1 YUV formats supported by SDL textures

SDL supports several YUV formats. The table below groups them and lists the corresponding FFmpeg pixel format for each.

SDL YUV Format	Mode	Subsampling	FFmpeg format (ffmpeg -pix_fmts)
SDL_PIXELFORMAT_YV12	Planar	420P	Not supported; see "Does ffmpeg support yv12?"
SDL_PIXELFORMAT_IYUV	Planar	420P	yuv420p
SDL_PIXELFORMAT_YUY2	Packed	422	yuyv422
SDL_PIXELFORMAT_UYVY	Packed	422	uyvy422
SDL_PIXELFORMAT_YVYU	Packed	422	yvyu422
SDL_PIXELFORMAT_NV12	Planar	420SP	nv12
SDL_PIXELFORMAT_NV21	Planar	420SP	nv21

How can SDL display YUV formats it does not support? A simple approach is to use libyuv to convert them to yuv420 or to RGB first.
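As an illustration of what such a conversion involves, take YUV422P, which has no matching texture format in the table. Its Y plane can be reused as-is, and each chroma plane only needs its vertical resolution halved. The sketch below does this in plain C++ by averaging pairs of rows; it is a simplified stand-in for a library routine such as libyuv's `I422ToI420`, whose exact filtering may differ. `halveChromaRows` is a hypothetical name and the code assumes an even height and a tightly packed plane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Halves the vertical resolution of one 4:2:2 chroma plane
// (chroma_width x height) by averaging each pair of rows, giving the
// matching 4:2:0 plane (chroma_width x height / 2).
// Assumptions: even height, stride equal to chroma_width.
std::vector<std::uint8_t> halveChromaRows(const std::uint8_t* plane,
                                          int chroma_width, int height) {
    assert(chroma_width > 0 && height > 0 && height % 2 == 0);
    std::size_t w = static_cast<std::size_t>(chroma_width);
    std::vector<std::uint8_t> out(w * (height / 2));
    for (int row = 0; row < height / 2; ++row) {
        const std::uint8_t* top = plane + 2 * row * w;
        const std::uint8_t* bottom = top + w;
        for (int x = 0; x < chroma_width; ++x) {
            // Rounded average of the two vertically adjacent samples.
            out[row * w + x] =
                static_cast<std::uint8_t>((top[x] + bottom[x] + 1) / 2);
        }
    }
    return out;
}
```

Applying it to both the U and the V plane of a YUV422P image, and keeping the Y plane unchanged, yields data that can be uploaded to an SDL_PIXELFORMAT_IYUV texture.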

3.2 Displaying a YUV image

Here is the code:


#include <SDL.h>  // SDL's headers already declare C linkage for C++ callers

#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

std::vector<uint8_t> loadFile(const std::string &file_path) {
    // ... same as in section 2.1
}

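SDL_Texture *loadYUV420PTexture(const std::string &file_path,
                                SDL_Renderer *renderer,
                                int width, int height) {
    auto file_data = loadFile(file_path);
    if (file_data.size() < static_cast<size_t>(width) * height * 3 / 2) {
        return nullptr;  // file missing or too small for the given size
    }
    auto *yuv_data = file_data.data();

    // Plane addresses and strides, as derived in section 2.2.1.
    auto *y_plane = yuv_data;
    size_t y_stride = width;

    auto *u_plane = yuv_data + (width * height);
    size_t u_stride = width / 2;

    auto *v_plane = u_plane + (width * height) / 4;
    size_t v_stride = width / 2;

    // SDL_PIXELFORMAT_IYUV is planar Y, U, V, i.e. yuv420p.
    SDL_Texture *texture = SDL_CreateTexture(renderer,
                                             SDL_PIXELFORMAT_IYUV,
                                             SDL_TEXTUREACCESS_STATIC,
                                             width,
                                             height);

    // Upload the three planes; nullptr means "update the whole texture".
    SDL_UpdateYUVTexture(texture,
                         nullptr,
                         y_plane, y_stride,
                         u_plane, u_stride,
                         v_plane, v_stride);

    return texture;
}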

int main(int argc, char *argv[]) {
    bool quit = false;
    SDL_Event event;

    SDL_Init(SDL_INIT_VIDEO);

    SDL_Window *window = SDL_CreateWindow("My SDL Empty window",
                                          SDL_WINDOWPOS_UNDEFINED,
                                          SDL_WINDOWPOS_UNDEFINED,
                                          640, 480, 0);
    SDL_Renderer *renderer = SDL_CreateRenderer(window, -1, 0);

    int yuv_width = 700;
    int yuv_height = 700;
    auto yuv_file_path = "/Users/user/Downloads/yuv_viewer_test/rainbow-yuv420p.yuv";

    SDL_Texture *texture = loadYUV420PTexture(yuv_file_path, renderer,
                                              yuv_width, yuv_height);

    // Event loop: block until an event arrives, then redraw.
    while (!quit) {
        SDL_WaitEvent(&event);

        switch (event.type) {
        case SDL_QUIT: {
            quit = true;
            break;
        }
        }

        // Passing nullptr for both rectangles stretches the whole texture
        // over the whole window.
        SDL_RenderCopy(renderer, texture, nullptr, nullptr);
        SDL_RenderPresent(renderer);
    }

    SDL_DestroyTexture(texture);
    SDL_DestroyRenderer(renderer);
    SDL_DestroyWindow(window);
    SDL_Quit();
    return 0;
}
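loadYUV420PTexture first loads the YUV file and works out where each of the Y, U, and V planes is in memory.
It then creates a texture with SDL_CreateTexture, using SDL_PIXELFORMAT_IYUV (that is, yuv420p) in this example, and uploads the YUV data into the texture with SDL_UpdateYUVTexture.
Finally, SDL_RenderCopy copies the texture to the renderer, and SDL_RenderPresent puts the result on screen.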

4. Operating on YUV images with libyuv

4.1 Scaling

libyuv provides a number of scaling functions, including:

I420Scale
I422Scale
I444Scale
NV12Scale
ARGBScale
UVScale
YUVToARGBScaleClip
ARGBScaleClip

As the names suggest, some of these functions can clip or convert the format while scaling. The code below shows how to scale a YUV420 file:


SDL_Texture *loadAndScaleYUV420PTexture(const std::string &file_path,
                                        SDL_Renderer *renderer,
                                        int src_width, int src_height,
                                        int dst_width, int dst_height) {
    // Source planes; assumes even source dimensions and a tightly packed file.
    auto file_data = loadFile(file_path);
    auto *yuv_data = file_data.data();
    auto *y_plane = yuv_data;
    size_t y_stride = src_width;
    auto *u_plane = yuv_data + (src_width * src_height);
    size_t u_stride = src_width / 2;
    auto *v_plane = u_plane + (src_width * src_height) / 4;
    size_t v_stride = src_width / 2;

    SDL_Texture *texture = SDL_CreateTexture(renderer,
                                             SDL_PIXELFORMAT_IYUV,
                                             SDL_TEXTUREACCESS_STATIC,
                                             dst_width,
                                             dst_height);

    // Round the destination width up to an even number so that each chroma
    // row holds a whole number of samples; chroma rows are rounded up too.
    int aligned_dst_width = (dst_width + 1) & ~1;
    int dst_chroma_height = (dst_height + 1) / 2;
    size_t dst_y_size = static_cast<size_t>(aligned_dst_width) * dst_height;
    size_t dst_chroma_size =
        static_cast<size_t>(aligned_dst_width / 2) * dst_chroma_height;
    std::vector<uint8_t> scale_data(dst_y_size + 2 * dst_chroma_size);

    auto *dst_y_plane = scale_data.data();
    size_t dst_y_stride = aligned_dst_width;
    auto *dst_u_plane = dst_y_plane + dst_y_size;
    size_t dst_u_stride = aligned_dst_width / 2;
    auto *dst_v_plane = dst_u_plane + dst_chroma_size;
    size_t dst_v_stride = aligned_dst_width / 2;

    libyuv::I420Scale(y_plane, y_stride, u_plane, u_stride, v_plane, v_stride,
                      src_width, src_height, dst_y_plane, dst_y_stride,
                      dst_u_plane, dst_u_stride, dst_v_plane, dst_v_stride,
                      aligned_dst_width, dst_height,
                      libyuv::FilterMode::kFilterLinear);

    // Upload only the dst_width x dst_height area the texture can hold.
    SDL_Rect rect;
    rect.x = 0;
    rect.y = 0;
    rect.w = dst_width;
    rect.h = dst_height;
    SDL_UpdateYUVTexture(texture,
                         &rect,
                         dst_y_plane, dst_y_stride,
                         dst_u_plane, dst_u_stride,
                         dst_v_plane, dst_v_stride);

    return texture;
}
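To give a feel for what scaling a plane means, the following plain C++ sketch resizes a single 8-bit plane by nearest-neighbor sampling. It is an illustration of the idea only, not how libyuv is implemented, and its sample positions may differ from libyuv's unfiltered mode. `scalePlaneNearest` is a hypothetical name. For an I420 image it would be called three times: once for Y at full size and once each for U and V at half size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Resizes one 8-bit plane by picking, for every destination sample, the
// source sample at the proportionally equivalent position.
// The result is tightly packed: its stride equals dst_width.
std::vector<std::uint8_t> scalePlaneNearest(const std::uint8_t* src,
                                            std::size_t src_stride,
                                            int src_width, int src_height,
                                            int dst_width, int dst_height) {
    assert(src_width > 0 && src_height > 0);
    assert(dst_width > 0 && dst_height > 0);
    std::vector<std::uint8_t> dst(
        static_cast<std::size_t>(dst_width) * dst_height);
    for (int dy = 0; dy < dst_height; ++dy) {
        // 64-bit intermediate avoids overflow for large images.
        int sy = static_cast<int>(
            static_cast<long long>(dy) * src_height / dst_height);
        const std::uint8_t* src_row = src + sy * src_stride;
        for (int dx = 0; dx < dst_width; ++dx) {
            int sx = static_cast<int>(
                static_cast<long long>(dx) * src_width / dst_width);
            dst[static_cast<std::size_t>(dy) * dst_width + dx] = src_row[sx];
        }
    }
    return dst;
}
```

Note that the source rows are addressed through `src_stride` while the destination uses `dst_width`, the same separation of pitch and width that I420Scale exposes through its stride parameters.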

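4.2 Cropping and rotation

libyuv provides rotation functions such as libyuv::I420Rotate and libyuv::I422Rotate. For cropping, some functions take crop parameters. One example is libyuv::ConvertToI420, which can crop and rotate while converting to 420P. The code is as follows: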

SDL_Texture *loadAndCropRotateYUV420PTexture(const std::string &file_path,
                                             SDL_Renderer *renderer,
                                             int src_width, int src_height,
                                             int dst_width, int dst_height,
                                             int crop_x, int crop_y,
                                             int crop_width, int crop_height,
                                             int rotate) {
    auto file_data = loadFile(file_path);
    auto *yuv_data = file_data.data();

    // A 90 or 270 degree rotation swaps the output width and height.
    // dst_width and dst_height are not used by this function.
    bool swap_dims = (rotate == 90 || rotate == 270);
    int out_width = swap_dims ? crop_height : crop_width;
    int out_height = swap_dims ? crop_width : crop_height;

    SDL_Texture *texture = SDL_CreateTexture(renderer,
                                             SDL_PIXELFORMAT_IYUV,
                                             SDL_TEXTUREACCESS_STATIC,
                                             out_width,
                                             out_height);

    // Round the output width up to an even number so that each chroma row
    // holds a whole number of samples; chroma rows are rounded up too.
    int aligned_dst_width = (out_width + 1) & ~1;
    int chroma_height = (out_height + 1) / 2;
    size_t y_size = static_cast<size_t>(aligned_dst_width) * out_height;
    size_t chroma_size =
        static_cast<size_t>(aligned_dst_width / 2) * chroma_height;
    std::vector<uint8_t> output_data(y_size + 2 * chroma_size);

    auto *dst_y_plane = output_data.data();
    size_t dst_y_stride = aligned_dst_width;
    auto *dst_u_plane = dst_y_plane + y_size;
    size_t dst_u_stride = aligned_dst_width / 2;
    auto *dst_v_plane = dst_u_plane + chroma_size;
    size_t dst_v_stride = aligned_dst_width / 2;

    // rotate must be 0, 90, 180, or 270, matching libyuv::RotationMode.
    libyuv::ConvertToI420(yuv_data, file_data.size(),
                          dst_y_plane, dst_y_stride,
                          dst_u_plane, dst_u_stride, dst_v_plane, dst_v_stride,
                          crop_x,
                          crop_y,
                          src_width,
                          src_height,
                          crop_width,
                          crop_height,
                          libyuv::RotationMode(rotate),
                          libyuv::FOURCC_I420);

    // The update rectangle must fit inside the texture.
    SDL_Rect rect;
    rect.x = 0;
    rect.y = 0;
    rect.w = out_width;
    rect.h = out_height;
    SDL_UpdateYUVTexture(texture,
                         &rect,
                         dst_y_plane, dst_y_stride,
                         dst_u_plane, dst_u_stride,
                         dst_v_plane, dst_v_stride);

    return texture;
}
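Cropping is also where the distinction between pitch and width pays off: a cropped plane can be described without copying any data, simply by moving the start pointer and keeping the original pitch. The sketch below shows this for a single plane. `PlaneView` and `cropPlane` are hypothetical names used for illustration, and the caller is assumed to pass a rectangle that lies inside the plane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A rectangular window into an existing plane. No pixel data is copied.
struct PlaneView {
    const std::uint8_t* data;
    std::size_t stride;  // still the stride of the full plane
    int width;
    int height;
};

// Returns a view of the width x height rectangle whose top-left corner is
// at (x, y). Assumption: the rectangle lies entirely inside the plane.
PlaneView cropPlane(const std::uint8_t* plane, std::size_t stride,
                    int x, int y, int width, int height) {
    assert(x >= 0 && y >= 0 && width > 0 && height > 0);
    return PlaneView{plane + y * stride + x, stride, width, height};
}
```

For an I420 image with even crop_x and crop_y, the Y plane would be cropped at (crop_x, crop_y) and the U and V planes at (crop_x / 2, crop_y / 2) with half the width and height. The resulting pointers and strides can be passed to any function that takes a plane pointer together with its stride.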

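Summary

This article first introduced the concept of chroma subsampling, then showed how to load YUV files correctly for several YUV formats. Once the memory location of each component is known, displaying the file with SDL is straightforward. Finally, it gave a brief introduction to libyuv.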
