Comparison operation in SSE
【Title】: Comparison operation in SSE 【Posted】: 2014-10-16 10:37:35 【Question】: I am new to SSE coding and want to write an SSE version of my algorithm. I want to convert the C code below into SSE code.
for (int i = 1; i < height - 1; i++) {      // stop one row early: row i+1 is read below
    for (int j = 1; j < width - 1; j++) {   // stop one column early: column j+1 is read below
        int index = 0;                      // one bit per neighbour, set when center <= neighbour
        if (input[width*i + j] <= input[width*(i-1) + (j-1)]) index += 0x80;
        if (input[width*i + j] <= input[width*(i-1) + (j  )]) index += 0x40;
        if (input[width*i + j] <= input[width*(i-1) + (j+1)]) index += 0x20;
        if (input[width*i + j] <= input[width*(i  ) + (j-1)]) index += 0x10;
        if (input[width*i + j] <= input[width*(i  ) + (j+1)]) index += 0x08;
        if (input[width*i + j] <= input[width*(i+1) + (j-1)]) index += 0x04;
        if (input[width*i + j] <= input[width*(i+1) + (j  )]) index += 0x02;
        if (input[width*i + j] <= input[width*(i+1) + (j+1)]) index++;
        output[width*(i-1) + (j-1)] = index;
    }
}
Here is my SSE code:
unsigned char *dst_d = outputbuffer;
float *CT_image_0 = inputbuffer;
float *CT_image_1 = CT_image_0 + width;
float *CT_image_2 = CT_image_1 + width;
for(int i=1;i<height;i++)
{
    for(int j=1;j<width;j+=4)
    {
        __m128 CT_current_00 = _mm_loadu_ps((CT_image_0+j-1));
        __m128 CT_current_10 = _mm_loadu_ps((CT_image_1+j-1));
        __m128 CT_current_20 = _mm_loadu_ps((CT_image_2+j-1));
        __m128 CT_current_01 = _mm_loadu_ps(((CT_image_0+1)+j-1));
        __m128 CT_current_11 = _mm_loadu_ps(((CT_image_1+1)+j-1));
        __m128 CT_current_21 = _mm_loadu_ps(((CT_image_2+1)+j-1));
        __m128 CT_current_02 = _mm_loadu_ps(((CT_image_0+2)+j-1));
        __m128 CT_current_12 = _mm_loadu_ps(((CT_image_1+2)+j-1));
        __m128 CT_current_22 = _mm_loadu_ps(((CT_image_2+2)+j-1));
        __m128 val = CT_current_11;
        //Below I tried to write the SSE instruction but that was wrong :(
        //--How I can do index + ...operation with this _mm_cmple_ss return value ????
        __m128 sample6= _mm_cmple_ss(val,CT_current_00);
        sample6 += _mm_cmple_ss(val,CT_current_01);
        sample6 += _mm_cmple_ss(val,CT_current_02);
        sample6 += _mm_cmple_ss(val,CT_current_10);
        sample6 +=_mm_cmple_ss(val,CT_current_12);
        sample6 +=_mm_cmple_ss(val,CT_current_20);
        sample6 +=_mm_cmple_ss(val,CT_current_21);
        sample6 +=_mm_cmple_ss(val,CT_current_22);
    }
    CT_image_0 +=width;
    CT_image_1 +=width;
    CT_image_2 +=width;
    dst_d += (width-2);
}
I have racked my brain over this and (as a layman) tried using if conditions... Please give me some ideas.
【Answer 1】: The part that needs work is evidently this:
__m128 sample6= _mm_cmple_ss(val,CT_current_00);
sample6 += _mm_cmple_ss(val,CT_current_01);
sample6 += _mm_cmple_ss(val,CT_current_02);
sample6 += _mm_cmple_ss(val,CT_current_10);
sample6 +=_mm_cmple_ss(val,CT_current_12);
sample6 +=_mm_cmple_ss(val,CT_current_20);
sample6 +=_mm_cmple_ss(val,CT_current_21);
sample6 +=_mm_cmple_ss(val,CT_current_22);
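As background on why these `+=` lines cannot yield the index: an SSE comparison does not return 0 or 1. Each compared lane becomes all ones (0xFFFFFFFF) when the condition holds and all zeros otherwise, and the `_ss` variant compares only lane 0 while copying the other three lanes from its first operand. The following is a minimal sketch that makes this visible, assuming SSE2 is available; `cmple_ps_lanes` and `cmple_ss_lanes` are hypothetical helper names that do not appear in the original post.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_castps_si128, _mm_storeu_si128 */
#include <stdint.h>

/* Packed compare: writes the raw 32-bit value of each of the 4 lanes of (a <= b). */
void cmple_ps_lanes(const float a[4], const float b[4], uint32_t lanes[4])
{
    assert(a && b && lanes);
    __m128 mask = _mm_cmple_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_si128((__m128i *)lanes, _mm_castps_si128(mask));
}

/* Scalar compare: only lane 0 holds a comparison result;
   lanes 1..3 are the unchanged bit patterns of a[1..3]. */
void cmple_ss_lanes(const float a[4], const float b[4], uint32_t lanes[4])
{
    assert(a && b && lanes);
    __m128 mask = _mm_cmple_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_si128((__m128i *)lanes, _mm_castps_si128(mask));
}
```

Adding such masks as floats is meaningless (an all-ones lane is a NaN bit pattern), which is why the results have to be treated as bit masks instead.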
You need to combine all of the comparison results into a single set of flags, e.g. like this:
__m128i out = _mm_setzero_si128(); // init output flags to all zeroes
__m128i test;
// _mm_cmple_ps compares all four lanes (_mm_cmple_ss would compare lane 0 only);
// _mm_castps_si128 only reinterprets the float mask as an integer vector
test = _mm_castps_si128(_mm_cmple_ps(val, CT_current_00)); // compare
test = _mm_and_si128(test, _mm_set1_epi32(0x80)); // mask all but required flag
out = _mm_or_si128(out, test); // merge flags to output mask
test = _mm_castps_si128(_mm_cmple_ps(val, CT_current_01));
test = _mm_and_si128(test, _mm_set1_epi32(0x40));
out = _mm_or_si128(out, test);
// ... repeat for each offset and flag value
// ... then finally extract 4 bytes from `out`
// ... and store at output[width*(i-1)+(j-1)]
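Putting the compare, mask, and merge steps together with the final extraction could look like the sketch below. It is one possible arrangement under stated assumptions, not the answerer's actual code: the function names `compare_index_scalar`, `flag_if_le` and `compare_index_sse` are hypothetical, the output is assumed to use the same row stride `width` as the question's C loop, and both buffers are assumed to hold `width * height` elements. The four 32-bit flag lanes are narrowed to bytes with `_mm_packs_epi32` followed by `_mm_packus_epi16`, which is safe here because every lane is in the range 0..255. Columns that do not fill a whole group of four fall back to the scalar routine.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Scalar reference for one interior pixel (i, j); also used for row tails. */
uint8_t compare_index_scalar(const float *in, int width, int i, int j)
{
    const float c = in[width * i + j];
    unsigned index = 0, bit = 0x80;
    for (int di = -1; di <= 1; di++) {
        for (int dj = -1; dj <= 1; dj++) {
            if (di == 0 && dj == 0) continue;       /* skip the center itself */
            if (c <= in[width * (i + di) + (j + dj)]) index |= bit;
            bit >>= 1;                              /* 0x80, 0x40, ..., 0x01 */
        }
    }
    return (uint8_t)index;
}

/* Returns `flag` in every 32-bit lane where center <= neighbour, else 0. */
static inline __m128i flag_if_le(__m128 center, const float *neighbour, int flag)
{
    __m128 mask = _mm_cmple_ps(center, _mm_loadu_ps(neighbour));
    return _mm_and_si128(_mm_castps_si128(mask), _mm_set1_epi32(flag));
}

/* Computes the 8-bit index for all interior pixels, four columns at a time. */
void compare_index_sse(const float *in, uint8_t *out, int width, int height)
{
    assert(in && out && width >= 3 && height >= 3);
    for (int i = 1; i < height - 1; i++) {
        const float *r0 = in + width * (i - 1);     /* row above   */
        const float *r1 = r0 + width;               /* current row */
        const float *r2 = r1 + width;               /* row below   */
        uint8_t *dst = out + width * (i - 1);
        int j = 1;
        for (; j + 4 <= width - 1; j += 4) {        /* lanes j..j+3 are interior */
            const __m128 c = _mm_loadu_ps(r1 + j);
            __m128i f = flag_if_le(c, r0 + j - 1, 0x80);
            f = _mm_or_si128(f, flag_if_le(c, r0 + j,     0x40));
            f = _mm_or_si128(f, flag_if_le(c, r0 + j + 1, 0x20));
            f = _mm_or_si128(f, flag_if_le(c, r1 + j - 1, 0x10));
            f = _mm_or_si128(f, flag_if_le(c, r1 + j + 1, 0x08));
            f = _mm_or_si128(f, flag_if_le(c, r2 + j - 1, 0x04));
            f = _mm_or_si128(f, flag_if_le(c, r2 + j,     0x02));
            f = _mm_or_si128(f, flag_if_le(c, r2 + j + 1, 0x01));
            f = _mm_packs_epi32(f, f);              /* 4 x int32 -> int16 */
            f = _mm_packus_epi16(f, f);             /* int16 -> uint8     */
            const int packed = _mm_cvtsi128_si32(f);/* low 4 bytes        */
            memcpy(dst + j - 1, &packed, 4);
        }
        for (; j < width - 1; j++)                  /* leftover columns */
            dst[j - 1] = compare_index_scalar(in, width, i, j);
    }
}
```

A caller would invoke `compare_index_sse(input, output, width, height)` once per image. Unaligned loads are used throughout because the neighbour addresses are offset by one float from each other, so at most one of them could be 16-byte aligned anyway.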
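【Answer 2】: I don't know what SSE code is, but you most likely want to run one, or some combination, of the CT_current variables into an array of strings, and then concatenate them into a list using the CT=** specification mentioned earlier (by your code), where CT** is whatever you place after it; this lets you iterate back to the _m128 you print to, and then, as you know, you can do two iterations once that is done.
Good luck.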