Image thresholding using NEON instructions

Posted 2023-02-16


【Title】Image thresholding using NEON instructions 【Posted】2012-07-20 22:07:35 【Question】:

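I'm developing an image processing application for iOS, and thresholding is a huge bottleneck, so I'm trying to optimize it with NEON. Here is the C version of the function. Is there a way to rewrite it using NEON (unfortunately I have no experience with it at all)?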

static void thresh_8u( const Image& _src, Image& _dst, uchar thresh, uchar maxval, int type ) {
    int i, j;
    uchar tab[256];
    Size roi = _src.size();
    roi.width *= _src.channels();

    // Lookup table: values below thresh map to 0, values >= thresh map to maxval.
    memset(&tab[0], 0, thresh);
    memset(&tab[thresh], maxval, 256-thresh);

    for( i = 0; i < roi.height; i++ ) {
        const uchar* src = (const uchar*)(_src.data + _src.step*i);
        uchar* dst = (uchar*)(_dst.data + _dst.step*i);
        j = 0;

        // '<' rather than the originally posted '<=', which touched one byte past the row.
        for(; j < roi.width; ++j) {
            dst[j] = tab[src[j]];
        }
    }
}

【Comments on the question】:

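Sure, just do a VCGT against the threshold, then a VAND with maxval. Unfortunately I don't know enough about ARM and NEON to turn that into something complete.

"This guide to NEON on iOS" and "a few things iOS developers ought to know about ARM" might help. They don't address your exact problem, but they may give you a place to start investigating.

Shouldn't the condition in the nested loop be "j …

【Answer 1】: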

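It's actually quite simple if you can guarantee that your rows are always a multiple of 16 bytes wide, because the compiler (clang) has special types that represent NEON vector registers and knows how to apply ordinary C operators to them. Here is my small test function: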

#ifdef __ARM_NEON__

#include <arm_neon.h>

// count is the number of bytes to process and must be a multiple of 16.
void computeThreshold(void *input, void *output, int count, uint8_t threshold, uint8_t highValue) {
    uint8x16_t thresholdVector = vdupq_n_u8(threshold);
    uint8x16_t highValueVector = vdupq_n_u8(highValue);
    uint8x16_t *__restrict inputVector = (uint8x16_t *)input;
    uint8x16_t *__restrict outputVector = (uint8x16_t *)output;
    for ( ; count > 0; count -= 16, ++inputVector, ++outputVector) {
        // Lanes where input > threshold become 0xff, then are masked down to highValue.
        *outputVector = (*inputVector > thresholdVector) & highValueVector;
    }
}

#endif

This operates on 16 bytes at a time. uint8x16_t is a vector register type holding sixteen 8-bit unsigned integers. vdupq_n_u8 returns a uint8x16_t filled with 16 copies of its argument.

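The > operator, applied to two uint8x16_t values, performs 16 comparisons between pairs of 8-bit unsigned integers. In each lane where the left input is greater than the right input, it yields 0xff (unlike the ordinary C > operator, which yields just 0x01). Where the left input is less than or equal to the right input, it yields 0. (It compiles to the VCGT.U8 instruction.)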

The & operator, applied to two uint8x16_t values, computes the bitwise AND of 128 pairs of bits.
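To see the compare-then-mask behaviour without an ARM device, here is a minimal sketch that uses the generic GCC/Clang vector_size extension instead of uint8x16_t. The type name u8x16 and the function threshold16 are made up for illustration and are not part of the answer above; the sketch only assumes a compiler that supports this extension.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-lane byte vector built with the GCC/Clang vector_size
   extension; it stands in for uint8x16_t so the sketch also builds off-ARM. */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* Thresholds exactly 16 bytes: out[i] = in[i] > threshold ? high_value : 0. */
void threshold16(const uint8_t *in, uint8_t *out,
                 uint8_t threshold, uint8_t high_value)
{
    u8x16 v, t, h;
    memcpy(&v, in, sizeof v);          /* load 16 bytes, no alignment needed */
    memset(&t, threshold, sizeof t);   /* broadcast, like vdupq_n_u8 */
    memset(&h, high_value, sizeof h);

    /* The comparison yields all-ones (0xff) in lanes where v > t and 0
       elsewhere; the cast reinterprets that mask as unsigned bytes. */
    u8x16 mask = (u8x16)(v > t);
    u8x16 r = mask & h;                /* keep high_value only where mask is set */

    memcpy(out, &r, sizeof r);
}
```

Because every lane of the mask is either 0xff or 0, the AND passes high_value through unchanged or clears it, which is what makes the two-instruction threshold possible.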

In a release build, the loop in computeThreshold compiles to:

0x6e668:  vldmia r2, {d4, d5}
0x6e66c:  subs   r0, #16
0x6e66e:  vcgt.u8 q10, q10, q8
0x6e672:  adds   r2, #16
0x6e674:  cmp    r0, #0
0x6e676:  vand   q10, q10, q9
0x6e67a:  vstmia r1, {d4, d5}
0x6e67e:  add.w  r1, r1, #16
0x6e682:  bgt    0x6e668
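The function above relies on the byte count being a multiple of 16. When that cannot be guaranteed, one possible approach is sketched below: explicit NEON intrinsics handle the 16-byte blocks and a scalar loop finishes the remainder. The name threshold_row and its parameters are hypothetical, and the preprocessor guard is an assumption added so the same file also builds (scalar only) on non-ARM targets.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: thresholds `count` bytes, for any count.
   out[i] = in[i] > threshold ? high_value : 0 */
void threshold_row(const uint8_t *in, uint8_t *out, size_t count,
                   uint8_t threshold, uint8_t high_value)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const uint8x16_t t = vdupq_n_u8(threshold);
    const uint8x16_t h = vdupq_n_u8(high_value);
    for (; i + 16 <= count; i += 16) {
        uint8x16_t v = vld1q_u8(in + i);       /* load 16 bytes */
        uint8x16_t mask = vcgtq_u8(v, t);      /* 0xff where v > t, else 0 */
        vst1q_u8(out + i, vandq_u8(mask, h));  /* mask & high_value */
    }
#endif

    /* Scalar tail; also the complete fallback when NEON is unavailable. */
    for (; i < count; ++i)
        out[i] = in[i] > threshold ? high_value : 0;
}
```

It would be called once per image row, passing each row's start pointer and width in bytes. Note that the lookup table in the question maps values equal to thresh to maxval, while > is a strict comparison; to reproduce the table exactly, vcgeq_u8 and >= could be used in place of vcgtq_u8 and >.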

