Using the fast Intel random generator (SSE2) fails with "stack around ... was corrupted"
Posted: 2013-11-19 15:36:55

Question:
I need a very fast (ideally the fastest) random number generator. I found this one from Intel: Fast Intel Random Number Generator
It looks good, so I created a project in MS Visual Studio 2013:
```cpp
//FastRandom.h:
#pragma once
#include "emmintrin.h"
#include <time.h>
//define this if you wish to return values similar to the standard rand();
#define COMPATABILITY
namespace Brans
{
    __declspec(align(16)) static __m128i cur_seed;
    // uncomment this if you are using intel compiler
    // for MS CL the vectorizer is on by default and jumps in if you
    // compile with /O2 ...
    //#pragma intel optimization_parameter target_arch=avx
    //__declspec(cpu_dispatch(core_2nd_gen_avx, core_i7_sse4_2, core_2_duo_ssse3, generic )
    inline void rand_sse(unsigned int* result)
    {
        __declspec(align(16)) __m128i cur_seed_split;
        __declspec(align(16)) __m128i multiplier;
        __declspec(align(16)) __m128i adder;
        __declspec(align(16)) __m128i mod_mask;
        __declspec(align(16)) __m128i sra_mask;
        __declspec(align(16)) __m128i sseresult;
        __declspec(align(16)) static const unsigned int mult[4] =
        { 214013, 17405, 214013, 69069 };
        __declspec(align(16)) static const unsigned int gadd[4] =
        { 2531011, 10395331, 13737667, 1 };
        __declspec(align(16)) static const unsigned int mask[4] =
        { 0xFFFFFFFF, 0, 0xFFFFFFFF, 0 };
        __declspec(align(16)) static const unsigned int masklo[4] =
        { 0x00007FFF, 0x00007FFF, 0x00007FFF, 0x00007FFF };

        adder = _mm_load_si128((__m128i*) gadd);
        multiplier = _mm_load_si128((__m128i*) mult);
        mod_mask = _mm_load_si128((__m128i*) mask);
        sra_mask = _mm_load_si128((__m128i*) masklo);
        cur_seed_split = _mm_shuffle_epi32(cur_seed, _MM_SHUFFLE(2, 3, 0, 1));
        cur_seed = _mm_mul_epu32(cur_seed, multiplier);
        multiplier = _mm_shuffle_epi32(multiplier, _MM_SHUFFLE(2, 3, 0, 1));
        cur_seed_split = _mm_mul_epu32(cur_seed_split, multiplier);
        cur_seed = _mm_and_si128(cur_seed, mod_mask);
        cur_seed_split = _mm_and_si128(cur_seed_split, mod_mask);
        cur_seed_split = _mm_shuffle_epi32(cur_seed_split, _MM_SHUFFLE(2, 3, 0, 1));
        cur_seed = _mm_or_si128(cur_seed, cur_seed_split);
        cur_seed = _mm_add_epi32(cur_seed, adder);
#ifdef COMPATABILITY
        // Add the lines below if you wish to reduce your results to 16-bit vals...
        sseresult = _mm_srai_epi32(cur_seed, 16);
        sseresult = _mm_and_si128(sseresult, sra_mask);
        _mm_storeu_si128((__m128i*) result, sseresult);
        return;
#endif
        _mm_storeu_si128((__m128i*) result, cur_seed);
        return;
    }
    inline void srand_sse(unsigned int seed)
    {
        cur_seed = _mm_set_epi32(seed, seed + 1, seed, seed + 1);
    }
    inline void srand_sse()
    {
        unsigned int seed = (unsigned int)time(0);
        cur_seed = _mm_set_epi32(seed, seed + 1, seed, seed + 1);
    }
    inline unsigned int GetRandom(unsigned int low, unsigned int high)
    {
        unsigned int ret = 0;
        rand_sse(&ret);
        return ret % (high - low + 1) + low;
    }
};
```
```cpp
// Test.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include "FastRandom.h"
#include <iostream>
using namespace Brans;
int _tmain(int argc, _TCHAR* argv[])
{
    srand_sse();
    unsigned int result = 0;
    for (size_t i = 0; i < 10000; i++)
    {
        result += GetRandom(1, 50);
        result -= GetRandom(1, 50);
    }
    std::cout << result << std::endl;
    return 0;
}
```
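I expected a result of 0 ± 50. But when I run the program in Debug, I get "Run-Time Check Failure #2 - Stack around the variable 'ret' was corrupted." in GetRandom(...). When I run it in Release, I get an undefined result, up to the maximum unsigned int. (I'm using an Intel i5 processor.)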
What's wrong?
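=========
EDIT: In addition to the accepted answer, I had a second bug: I should use long instead of unsigned int, because with unsigned int a negative result wraps around to a huge value.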
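A minimal sketch of that second bug, assuming nothing beyond standard C++. The helper names net_unsigned and net_signed are hypothetical; they mirror the question's loop (add one value, subtract another) with an unsigned and a signed accumulator. It uses long long, while the asker used long, which is a 32-bit signed type on MSVC and also large enough here, since 10000 steps of at most ±49 stay within ±490000.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the question's loop with an unsigned accumulator.
// A negative running total cannot be represented, so it wraps modulo 2^N
// (N = number of bits in unsigned int).
unsigned int net_unsigned(const std::vector<unsigned int>& a,
                          const std::vector<unsigned int>& b) {
    unsigned int result = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        result += a[i];
        result -= b[i];  // may wrap around instead of going below zero
    }
    return result;
}

// Same loop with a signed accumulator: negative totals stay negative.
long long net_signed(const std::vector<unsigned int>& a,
                     const std::vector<unsigned int>& b) {
    long long result = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        result += static_cast<long long>(a[i]);
        result -= static_cast<long long>(b[i]);
    }
    return result;
}
```

For a = {1, 2} and b = {50, 3}, the true net value is -50: net_signed returns -50, while net_unsigned returns 0u - 50u, a value close to the maximum unsigned int, which matches the huge numbers the asker saw.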
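Comments on the question:
Did you add the appropriate compiler flags to enable SSE2 compilation?
Did you run it under the debugger and determine exactly where the stack gets corrupted?
pyCthon: I enabled Streaming SIMD Extensions 2 (/arch:SSE2). Is that all I need?
Timo Geusch: At the end of inline unsigned int GetRandom(unsigned int low, unsigned int high).

Answer 1:
From the documentation of the Intel Fast Random Generator: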
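"The rand_sse() function implements a vectorized version of this fast_rand() function, where the integer math operations are done four at a time using the SIMD architecture."
This means that rand_sse uses SSE2 to generate 4 random numbers for you at once, so you need to pass it an array of unsigned int: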
```cpp
unsigned int result[4];
rand_sse( result );
```
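A minimal sketch of this fix, assuming an x86/x86-64 compiler with SSE2 (baseline on x86-64, /arch:SSE2 on 32-bit MSVC). It ports the question's rand_sse (COMPATABILITY path only) to portable intrinsics: _mm_setr_epi32 replaces the __declspec(align(16)) constant arrays, and the parameter becomes a reference to unsigned int[4], so rand_sse(&ret) with a single unsigned int no longer compiles. GetRandom then hands out the four values one at a time from a buffer. The namespace BransFixed and the names rand_buf and rand_pos are hypothetical, not part of Intel's code.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics; x86/x86-64 only
#include <cassert>

namespace BransFixed {  // hypothetical name, mirrors Brans from the question

static __m128i cur_seed;          // four independent 32-bit LCG states
static unsigned int rand_buf[4];  // last four outputs of rand_sse
static int rand_pos = 4;          // 4 = buffer used up, refill on next call

inline void srand_sse(unsigned int seed) {
    const int s0 = static_cast<int>(seed);
    const int s1 = static_cast<int>(seed + 1u);  // unsigned add wraps safely
    cur_seed = _mm_set_epi32(s0, s1, s0, s1);    // same lane layout as the question
    rand_pos = 4;                                // drop values from the old seed
}

// Writes FOUR 15-bit values per call. The reference-to-array parameter
// guarantees 16 bytes of destination storage.
inline void rand_sse(unsigned int (&result)[4]) {
    const __m128i multiplier = _mm_setr_epi32(214013, 17405, 214013, 69069);
    const __m128i adder      = _mm_setr_epi32(2531011, 10395331, 13737667, 1);
    const __m128i mod_mask   = _mm_setr_epi32(-1, 0, -1, 0);
    const __m128i low15      = _mm_set1_epi32(0x7FFF);

    // _mm_mul_epu32 only multiplies lanes 0 and 2, so lanes 1 and 3 are
    // swapped down, multiplied separately, and swapped back.
    __m128i split = _mm_shuffle_epi32(cur_seed, _MM_SHUFFLE(2, 3, 0, 1));
    __m128i even  = _mm_mul_epu32(cur_seed, multiplier);
    __m128i odd   = _mm_mul_epu32(split,
                        _mm_shuffle_epi32(multiplier, _MM_SHUFFLE(2, 3, 0, 1)));
    even = _mm_and_si128(even, mod_mask);  // keep low 32 bits of each product
    odd  = _mm_and_si128(odd, mod_mask);
    odd  = _mm_shuffle_epi32(odd, _MM_SHUFFLE(2, 3, 0, 1));
    // seed[i] = seed[i] * mult[i] + add[i]  (mod 2^32), independently per lane
    cur_seed = _mm_add_epi32(_mm_or_si128(even, odd), adder);

    const __m128i out = _mm_and_si128(_mm_srai_epi32(cur_seed, 16), low15);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(result), out);  // 16 bytes into 16 bytes
}

// Returns one value per call and refills the buffer only when it is used up.
inline unsigned int GetRandom(unsigned int low, unsigned int high) {
    if (rand_pos == 4) {
        rand_sse(rand_buf);
        rand_pos = 0;
    }
    return rand_buf[rand_pos++] % (high - low + 1) + low;
}

}  // namespace BransFixed
```

Buffering keeps the speed advantage, because one SSE2 step yields four numbers. Calling rand_sse into a local unsigned int[4] on every GetRandom call and using only one element would also be memory-safe, but it throws away three quarters of the work.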
Answer 2:
This instruction:
```cpp
_mm_storeu_si128((__m128i*) result, cur_seed);
```
casts result, an unsigned int*, to __m128i* and then writes a 128-bit value there. An unsigned int cannot hold a 128-bit value, so you end up corrupting the stack around the call site in GetRandom:
```cpp
unsigned int ret = 0;
rand_sse(&ret);
```
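To make the size argument concrete, here is a small sketch assuming x86 with SSE2 and a 4-byte unsigned int (true for MSVC and common x86 ABIs). The static_asserts spell out that a 128-bit store needs four unsigned ints. If only one number is wanted, the SSE2 intrinsic _mm_cvtsi128_si32 reads lane 0 into a plain int without storing the vector to memory. The helper names store_all_lanes and first_lane are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics; x86/x86-64 only
#include <cassert>

// _mm_storeu_si128 always writes sizeof(__m128i) == 16 bytes.
static_assert(sizeof(__m128i) == 16, "__m128i is a 128-bit vector");
// A single unsigned int is only a quarter of that, so &ret is too small a target.
static_assert(4 * sizeof(unsigned int) == sizeof(__m128i),
              "four 32-bit lanes fit exactly in unsigned int[4]");

// Hypothetical helper: the parameter type guarantees 16 bytes of storage.
inline void store_all_lanes(__m128i v, unsigned int (&out)[4]) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}

// Hypothetical helper: copy only lane 0 out of the register when one value is enough.
inline unsigned int first_lane(__m128i v) {
    return static_cast<unsigned int>(_mm_cvtsi128_si32(v));
}
```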