SSE图像算法优化系列二十三: 基于value-and-criterion structure 系列滤波器（如Kuwahara，MLV，MCV滤波器）的优化。

Posted 2021-01-12 imageshop

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了SSE图像算法优化系列二十三: 基于value-and-criterion structure 系列滤波器（如Kuwahara，MLV，MCV滤波器）的优化。相关的知识，希望对你有一定的参考价值。

基于value-and-criterion structure方式的实现的滤波器在原理上其实比较简单，感觉下面论文中得一段话已经描述的比较清晰了，直接贴英文吧，感觉翻译过来反而失去了原始的韵味了。

The value-and-criterion filter structure is based on the geometrical structure of mathematical morphology, but allows the use of a much wider variety of operations (both linear and nonlinear) than standard morphology. Morphological filters usually only use extreme order statistic operations (maximum and minimum). The new filter structure is much more flexible, but still retains the geometric characteristics of mathematical morphology.

The value-and-criterion filter structure is derived from the “compound” structure of the morphological operations opening and closing. Opening and closing are both sequential operations; opening consists of the successive application of the basic morphological operations of erosion and dilation, and closing is dilation followed by erosion. Like opening or closing, a value- and-criterion function has two distinct stages. However, a value-and-criterion filter can have two separate operations acting in parallel in the first stage, as opposed to a single operation (such as erosion in the case of opening). The second stage of a value-and-criterion function uses the output of one of these first stage operations to determine which output from the other first stage operation is selected as the final output。

　　简单的理解，value-and-criterion structure这种结构的滤波器有两个元素，一个是值(Value)函数，一个是评判标准(criterion)函数，普通的过程我们我们只需要计算出值函数，就结束了，而value-and-criterion structure这种结构需要根据评判标准来选择最后的值函数的值。

　　经典的此类函数有Kuwahara滤波器，Mean of Least Variance(MLV)滤波器，Minimum Coefficient of Variation(MCV)滤波器等，我们稍微介绍一下。

　　一、Kuwahara滤波器

很明显，这个是由Kuwahara等人提出来的。其基本原理是：计算图像模板中邻域内的均值和方差，选择图像灰度值较为均匀的区域的均值替代模板中心像素灰度值。而图像模板中较为均匀的区域所对应的方差是最小的。为了获取图像模板中较为均匀区域的均值，滤波区域R被划分为K个重叠的子区域R1，R2,…, Rk。位于图像中位置为（u，v）处的每一个像素，其所对应的每一个模板子区域的均值和方差都将被计算。那么标准的Kuwahara滤波器只计算四块邻域的方差，取方差最小的邻域计算其平均值，得到的结果作为目标像素的新值。当然后来还有Tomita-Tsuji，Nagao-Matsuyama’s等人提出了不同的领域，详见https://blog.csdn.net/lz0499/article/details/54646952一文。

　　在这个滤波器里，value函数对应的是均值，criterion 函数对应的是均方差，从算法实现上核心的是均值和方差的快速实现，这在我的博文SSE图像算法优化系列十四：局部均方差及局部平方差算法的优化里已经有描述，此处不赘述。

那么一个简单的Kuwahara滤波器的主要代码可能如下：

int IM_KuwaharaFilter(unsigned char *Src, unsigned char *Dest, int Width, int Height, int Stride, int Radius)
{
    int Channel = Stride / Width;
    if ((Src == NULL) || (Dest == NULL))                                return IM_STATUS_NULLREFRENCE;
    if ((Width <= 0) || (Height <= 0))                                    return IM_STATUS_INVALIDPARAMETER;
    if ((Channel != 1) && (Channel != 3))                                return IM_STATUS_INVALIDPARAMETER; 
    
    int Status = IM_STATUS_OK;
    unsigned char *Average = (unsigned char *)malloc(Height * Width * sizeof(unsigned char));
    int *Deviation = (int *)malloc(Height * Width * sizeof(int));
    int *RowOffset = (int *)malloc((Width + Radius + Radius) * sizeof(int));
    int *ColOffset = (int *)malloc((Height + Radius + Radius) * sizeof(int));
    unsigned char *Blur = NULL;

    if ((Average == NULL) || (Deviation == NULL) || (RowOffset == NULL) || (ColOffset == NULL))
    {
        Status = IM_STATUS_OUTOFMEMORY;
        goto FreeMemory;
    }
    if (Channel == 1)
    {
        Status = IM_GetLocalMeanAndDeviationFilter(Src, Average, Deviation, Width, Height, Stride, Radius, false);
        if (Status != IM_STATUS_OK)    goto FreeMemory;
    }
    else
    {
        Blur = (unsigned char *)malloc(Height * Stride * sizeof(unsigned char));        //    彩色需要使用明度通道来计算均方差，不然会出现彩色异常块
        if (Blur == NULL)
        {
            Status = IM_STATUS_OUTOFMEMORY;
            goto FreeMemory;
        }

        Status = IM_GetLuminance(Src, Average, Width, Height, Stride, false);            //    得到明度通道
        if (Status != IM_STATUS_OK)    goto FreeMemory;
        Status = IM_GetLocalMeanAndDeviationFilter(Average, NULL, Deviation, Width, Height, Width, Radius, false);    //    无需保存对应的均值，因为没用
        if (Status != IM_STATUS_OK)    goto FreeMemory;
        Status = IM_BoxBlur(Src, Blur, Width, Height, Stride, Radius);                    //    彩色模糊
        if (Status != IM_STATUS_OK)    goto FreeMemory;
    }

    for (int Y = 0; Y < Height + Radius + Radius; Y++)
        ColOffset[Y] = IM_GetMirrorPos(Height, Y - Radius);

    for (int X = 0; X < Width + Radius + Radius; X++)
        RowOffset[X] = IM_GetMirrorPos(Width, X - Radius);

    for (int Y = 0; Y < Height; Y++)
    {
        int *LinePST = Deviation + ColOffset[Y] * Width;            //    均方差
        int *LinePSB = Deviation + ColOffset[Y + Radius + Radius] * Width;
        unsigned char *LinePD = Dest + Y * Stride;
        if (Channel == 1)
        {
            unsigned char *LinePT = Average + ColOffset[Y] * Width;        //    均值
            unsigned char *LinePB = Average + ColOffset[Y + Radius + Radius] * Width;
            for (int X = 0; X < Width; X++)
            {
                int OffsetL = RowOffset[X];                                            //    求4个点的均方差的最小值
                int OffsetR = RowOffset[X + Radius + Radius];
                int Min, Mean;
                if (LinePST[OffsetL] < LinePST[OffsetR])
                {
                    Min = LinePST[OffsetL];
                    Mean = LinePT[OffsetL];
                }
                else
                {
                    Min = LinePST[OffsetR];
                    Mean = LinePT[OffsetR];
                }
                if (Min > LinePSB[OffsetL])
                {
                    Min = LinePSB[OffsetL];
                    Mean = LinePB[OffsetL];
                }
                if (Min > LinePSB[OffsetR])
                {
                    Min = LinePSB[OffsetR];
                    Mean = LinePB[OffsetR];
                }
                LinePD[X] = Mean;
            }
        }
        else
        {
            unsigned char *LinePT = Blur + ColOffset[Y] * Stride;        //    均值
            unsigned char *LinePB = Blur + ColOffset[Y + Radius + Radius] * Stride;
            for (int X = 0; X < Width; X++)
            {
                int OffsetL = RowOffset[X];                                            //    求4个点的均方差的最小值
                int OffsetR = RowOffset[X + Radius + Radius];
                int Min, MeanB, MeanG, MeanR;
                if (LinePST[OffsetL] < LinePST[OffsetR])
                {
                    Min = LinePST[OffsetL];
                    MeanB = LinePT[OffsetL * 3 + 0];
                    MeanG = LinePT[OffsetL * 3 + 1];
                    MeanR = LinePT[OffsetL * 3 + 2];
                }
                else
                {
                    Min = LinePST[OffsetR];
                    MeanB = LinePT[OffsetR * 3 + 0];
                    MeanG = LinePT[OffsetR * 3 + 1];
                    MeanR = LinePT[OffsetR * 3 + 2];
                }

                if (Min > LinePSB[OffsetL])
                {
                    Min = LinePSB[OffsetL];
                    MeanB = LinePB[OffsetL * 3 + 0];
                    MeanG = LinePB[OffsetL * 3 + 1];
                    MeanR = LinePB[OffsetL * 3 + 2];
                }

                if (Min > LinePSB[OffsetR])
                {
                    Min = LinePSB[OffsetR];
                    MeanB = LinePB[OffsetR * 3 + 0];
                    MeanG = LinePB[OffsetR * 3 + 1];
                    MeanR = LinePB[OffsetR * 3 + 2];
                }
                LinePD[0] = MeanB;
                LinePD[1] = MeanG;
                LinePD[2] = MeanR;
                LinePD += 3;
            }
        }
    }
FreeMemory:
    if (Average != NULL)        free(Average);
    if (Deviation != NULL)        free(Deviation);
    if (RowOffset != NULL)        free(RowOffset);
    if (ColOffset != NULL)        free(ColOffset);
    if (Blur != NULL)            free(Blur);
    return Status;
}

　　我们跳过内存分配和均方差等的计算过程，我们来看看灰度版本在宽度方向上是如何根据标准函数来获取值函数的：

            for (int X = 0; X < Width; X++)
            {
                int OffsetL = RowOffset[X];                                            //    求4个点的均方差的最小值
                int OffsetR = RowOffset[X + Radius + Radius];
                int Min, Mean;
                if (LinePST[OffsetL] < LinePST[OffsetR])
                {
                    Min = LinePST[OffsetL];
                    Mean = LinePT[OffsetL];
                }
                else
                {
                    Min = LinePST[OffsetR];
                    Mean = LinePT[OffsetR];
                }
                if (Min > LinePSB[OffsetL])
                {
                    Min = LinePSB[OffsetL];
                    Mean = LinePB[OffsetL];
                }
                if (Min > LinePSB[OffsetR])
                {
                    Min = LinePSB[OffsetR];
                    Mean = LinePB[OffsetR];
                }
                LinePD[X] = Mean;
            }

　　很简单，就是对四个取样点，获取方差的最小值，然后同步获取对应的均值的值，作为结果值。

本质上讲，这个也是个边缘保留滤波器，虽然其保边效果不怎么样，其实Kuwahara主要是作为一个艺术化的滤镜存在，在开源的GPUImage里也有Kuwahara滤波器的实现，参见GPUImageKuwaharaFilter.h文件。

　　我也用别人用的一个美女图，贴个效果吧。

技术分享图片

　　　　　　　　　　　　原图　　　　　　　　　　　　　　　　　　　　　　　　　　　　半径1　　　　　　　　　　　　　　　　　　　　　　　　半径5

　注意，对于彩色图像，我们不易将他们分通道来计算，因为分通道时，同一个像素各通道对应的方差最小值不一定在同一位置，此时就会在最终的结果图像上出现色斑，这不是我们所希望的，在作者的文章中也有类似的说法，如下所示，此时我们可取明度值作为各通道的统一判断依据。

Obviously the Normal filter can‘t be used for color images by applying the filter to each RGB channel separately and then using the three resulting channels to compose the image. The main problem with this is that the sub regions will have different variancesfor each of the channels. For example, a region with the lowest variance in the red channel might have the highest variance in the green channel. This once again causes abiguity which would result in the color of the central pixel to be determined by several regions, which might also result in blurrier edges.

二、MLV滤波器

　　MLV滤波器也是一个典型的value-and-criterion structure结构滤波器，其参考论文见： A morphology-based filter structure for edge-enhanceing smoothing，和Kuwahara不同的是，MLV滤波器并不只取周边四领域的均方差作为标准，而是周边所有包含了中心点店像素的领域的均方差作为判断依据，取均方差最小那个位置的均值来替代模板中心的值。那这个时候计算的耗时因素就必须纳入考虑范围，因为随着半径的增大，这个计算量急剧上升。

　　此时我们想到了前面我的博文SSE图像算法优化系列七：基于SSE实现的极速的矩形核腐蚀和膨胀（最大值和最小值）算法中提到了和半径无关的最值算法的优化，那里巧妙的运用了一些特性，使得算法在半径大之后有可能计算时间还有所下降。

　　但是我们注意到这里的MLV滤波器并不是简单的最值计算，其还带了一个尾巴，在计算最值得时候还需要保留另外一个参数同步，这就有点像C的sort函数功能，比如我要对一个结构体数组进行排序，而排序的规则是根据结构体中某一个成员的值来决定的，sort函数可以轻松搞定这个功能，如果我们用普通的C语言实现上面博文的算法，也是可以容易搞定这个的，但是我这里需要进行SSE处理，那我们先来看看源博文这一块的核心实现语句。

    __m128i Max1 = _mm_set1_epi8(255), Max2 = _mm_set1_epi8(255);                //    这样写能减少一次内存加载
    for (int Y = StartY; Y < EndY; Y++)
    {
        Max1 = _mm_min_epu8(_mm_loadu_si128((__m128i *)(LinePS + 0)), Max1);    //    一条AVX指令
        Max2 = _mm_min_epu8(_mm_loadu_si128((__m128i *)(LinePS + 16)), Max2);
        _mm_store_si128((__m128i *)(LinePD + 0), Max1);
        _mm_store_si128((__m128i *)(LinePD + 16), Max2);
        LinePS += Stride;
        LinePD += 32;
    }

　　很简单，就是_mm_min_epu8的一个简单调用，这里没有任何的比较函数。

如果我们在计算最值得时候还需要利用最值得特性附带上另外一个参量，很明显，如果直接使用上述过程，我们无法做到这一点。必须想办法带上比较特征。

　　实际上，min系列的比较函数，我们也可以使用cmp系列的SSE函数来实现，比如要实现两个32位数的SSE函数的比较，除了使用_mm_min_epi32外，还可以使用下面的代码：

    __m128i S1 = _mm_set_epi32(10, 20, 30, 50);
    __m128i S2 = _mm_set_epi32(15, 8, 34, 234);

    __m128i Min1 = _mm_min_epi32(S1, S2);

    __m128i Cmp = _mm_cmplt_epi32(S1, S2);
    __m128i Min2 = _mm_blendv_epi8(S2, S1, Cmp);

    for (int Y = 0; Y < 4; Y++)
    {
        printf("%d  	", Min1.m128i_i32[Y]);
    }
    printf("
");
    for (int Y = 0; Y < 4; Y++)
    {
        printf("%d  	", Min2.m128i_i32[Y]);
    }

　　结果如下，完全一样：

技术分享图片

　　注意此时我们在过程中获得了一些标志信息，比如上述代码中得Cmp变量，这个非常有用，有了它，我们就相当于有了一枚旗帜，在队伍A里有人举起了一面旗帜，我们只要在队伍B里对应的位置找到对应的人就可以了。这个功能同样是可以借助于_mm_blendv_epi8这个来实现的。

　　具体到本例，一般方差（不是均方差，此例为未除以取样总数的值，即(v0-m)*(v0-m)+(v1-m)*(v1-m)+...+(vn-m)*(vn-m)的值，其中m表示v0到vn的平均值，中间用浮点保存），可以用整形表示（在计算最后舍入），而平均值直接使用unsigned char表示，如上IM_KuwaharaFilter函数所示。此时，按照最值那篇文章的例子，我们可以一次性比较四个SSE变量，这样产生了4个比较标志位，而此时我们需要附带的信息是4*4个字节类型的数据，此时还需将4个比较标志位的SSE变量合并成一个SSE变量，以方便_mm_blendv_epi8使用，这个时候我们就可以借用packs系列的函数了，具体如下所示:

    __m128i Min0 = _mm_set1_epi32(IM_INT_MAX), Min1 = _mm_set1_epi32(IM_INT_MAX);        //    同时处理16个方差数据
    __m128i Min2 = _mm_set1_epi32(IM_INT_MAX), Min3 = _mm_set1_epi32(IM_INT_MAX);
    __m128i Avg = _mm_setzero_si128(), AvgB = _mm_setzero_si128();
    __m128i AvgG = _mm_setzero_si128(), AvgR = _mm_setzero_si128();
    for (int Y = StartY; Y < EndY; Y++)
    {
        __m128i Src0 = _mm_loadu_si128((__m128i *)(LinePS + 0));
        __m128i Src1 = _mm_loadu_si128((__m128i *)(LinePS + 4));
        __m128i Src2 = _mm_loadu_si128((__m128i *)(LinePS + 8));
        __m128i Src3 = _mm_loadu_si128((__m128i *)(LinePS + 12));
        __m128i Cmp0 = _mm_cmplt_epi32(Src0, Min0);
        __m128i Cmp1 = _mm_cmplt_epi32(Src1, Min1);
        __m128i Cmp2 = _mm_cmplt_epi32(Src2, Min2);
        __m128i Cmp3 = _mm_cmplt_epi32(Src3, Min3);

        Min0 = _mm_blendv_epi8(Min0, Src0, Cmp0);            //    不使用_mm_min_epi32，主要是为了得到标识符
        Min1 = _mm_blendv_epi8(Min1, Src1, Cmp1);
        Min2 = _mm_blendv_epi8(Min2, Src2, Cmp2);
        Min3 = _mm_blendv_epi8(Min3, Src3, Cmp3);
                                                                    
        _mm_storeu_si128((__m128i *)(LinePD + 0), Min0);
        _mm_storeu_si128((__m128i *)(LinePD + 4), Min1);
        _mm_storeu_si128((__m128i *)(LinePD + 8), Min2);
        _mm_storeu_si128((__m128i *)(LinePD + 12), Min3);

        if (Channel == 1)                                //    注意要使用_mm_packs_epi16而不是_mm_packus_epi16
        {
            Avg = _mm_blendv_epi8(Avg, _mm_loadu_si128((__m128i *)LinePA), _mm_packs_epi16(_mm_packs_epi32(Cmp0, Cmp1), _mm_packs_epi32(Cmp2, Cmp3)));
            _mm_storeu_si128((__m128i *)LinePM, Avg);            //    最小方差对应的均值
        }
        else
        {
            __m128i Mask0 = _mm_or_si128(_mm_shuffle_epi8(Cmp0, _mm_setr_epi8(0, 0, 0, 4, 4, 4, 8, 8, 8, 12, 12, 12, -1, -1, -1, -1)), _mm_shuffle_epi8(Cmp1, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 4)));
            __m128i Mask1 = _mm_or_si128(_mm_shuffle_epi8(Cmp1, _mm_setr_epi8(4, 4, 8, 8, 8, 12, 12, 12, -1, -1, -1, -1, -1, -1, -1, -1)), _mm_shuffle_epi8(Cmp2, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 4, 4, 4, 8, 8)));
            __m128i Mask2 = _mm_or_si128(_mm_shuffle_epi8(Cmp2, _mm_setr_epi8(8, 12, 12, 12, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1)), _mm_shuffle_epi8(Cmp3, _mm_setr_epi8(-1, -1, -1, -1, 0, 0, 0, 4, 4, 4, 8, 8, 8, 12, 12, 12)));
            AvgB = _mm_blendv_epi8(AvgB, _mm_loadu_si128((__m128i *)LinePA), Mask0);
            AvgG = _mm_blendv_epi8(AvgG, _mm_loadu_si128((__m128i *)(LinePA + 16)), Mask1);
            AvgR = _mm_blendv_epi8(AvgR, _mm_loadu_si128((__m128i *)(LinePA + 32)), Mask2);
            _mm_storeu_si128((__m128i *)LinePM, AvgB);
            _mm_storeu_si128((__m128i *)(LinePM + 16), AvgG);
            _mm_storeu_si128((__m128i *)(LinePM + 32), AvgR);
        }
        LinePS += Width;
        LinePA += Stride;
        LinePD += 16;
        LinePM += 16 * Channel;
    }

　　注意在上述代码中我们使用packs而非packus，请读者自行理解这是为什么。

　　对于彩色图像，根据上面的对Kuwahara滤波器的描述，也是需要计算明度的，此时，一个标志位对应的就是就是3个分量，此时就不能使用packs之类的函数，而可以直接使用上述的_mm_shuffle_epi8类的函数可以轻松搞定。

　　当然具体的过程还涉及到很多其他东西，比如int类型矩阵的转置，内存的有效利用等等，这里不予以多述。

　　三、MCV滤波器

　　同MLV滤波器稍微有点不同的时，MCV滤波器的评判标准有稍作改动，其参考文件见：An edge-enhancing nonlinear filter for reducing multiplicative noise，但是核心的实现没有太大的区别。

　　按照论文的说法，MCV滤波器可以有效地去除乘性噪音，而MLV可以去除加性噪音，如下所示：

The MCV filter therefore uses the sample mean for the value function, the coefficient of variation as the criterion function, and the minimum as the selection function. It is a value-and-criterion filter specifically designed to reduce multiplicative noise.

　　以及

The MCV filter is closely related to a value-and-criterion filter designed to remove additive noise known as the Mean of Least Variance (MLV) filter [7–9]. The MLV filter uses the sample variance as its criterion function rather than the coefficient of variation. In images corrupted by additive noise, the variance is theoretically minimum in structuring elements where the signal is constant.

　　MLV和MCV能产生比Kuwahara更为平滑的图像，在SAR图像以及医疗图像上应该有一定用武之地。而且个人感觉在小半径时其产生的结果对图像的边界分割、边缘查找等也有一定的帮助。

技术分享图片

　　　　　　　　　　　　　　　　　　原图　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　Kuwahara

技术分享图片

　　　　　　　　　　　　　　　　　　　　MLV　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　MCV

　　上述操作半径都为5。

　　处理后，似乎边缘的分界线更为明显。

　　当然我贴的图形其实都不是论文本身强调的应用场景，只是做个例证而已。

　　关于这种类似的滤波器，还可以使用圆形半径，或者是椭圆半径，也有一些人对这方面的需求做了研究，比如https://blog.csdn.net/qq_16013649/article/details/78744910。

　　针对1024*768的彩色图像，此类算法优化后大约耗时20ms，灰度图约10ms，速度也还算比较快了，测试见附件的SSE_Optimization_Demo的stylize菜单。

Demo下载地址：https://files.cnblogs.com/files/Imageshop/SSE_Optimization_Demo.rar

技术分享图片

以上是关于SSE图像算法优化系列二十三: 基于value-and-criterion structure 系列滤波器（如Kuwahara，MLV，MCV滤波器）的优化。的主要内容，如果未能解决你的问题，请参考以下文章

SSE图像算法优化系列二十四: 基于形态学的图像后期抗锯齿算法--MLAA优化研究。

SSE图像算法优化系列二十二：优化龚元浩博士的曲率滤波算法，达到约500 MPixels/Sec的单次迭代速度。

SSE图像算法优化系列十：简单的一个肤色检测算法的SSE优化。

SSE图像算法优化系列十八：三次卷积插值的进一步SSE优化。

SSE图像算法优化系列十一：使用FFT变换实现图像卷积。

SSE图像算法优化系列三十一：Base64编码和解码算法的指令集优化（C#自带函数的3到4倍速度）。

SSE图像算法优化系列二十三: 基于value-and-criterion structure 系列滤波器（如Kuwahara，MLV，MCV滤波器）的优化。

原图 半径1 半径5

　　　　　　　　　　　　原图　　　　　　　　　　　　　　　　　　　　　　　　　　　　半径1　　　　　　　　　　　　　　　　　　　　　　　　半径5