Thread synchronization with interlocked ops hangs in Visual Studio 2013 C++ native code
Posted: 2014-05-04 18:13:12
Question:
I'm observing strange thread-synchronization behavior in C++ (64-bit Windows 8.1, Visual Studio 2013, native C++).
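The goal is to obtain read access to an in-memory data structure (the "table"). The counter tableRIP tracks how many threads currently hold read access (there are 32 threads). A single thread can also hold write access to the table, and while it does, no thread may obtain read access. The CacheLock_WriterWaiting bit (enum value 2, i.e. mask 4) in cacheLocks is set while a thread has write access.
The code is as follows: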
volatile long cacheLocks; // bits below
enum CacheLockBit {
    CacheLock_Table,
    CacheLock_LRUQ,
    CacheLock_WriterWaiting,
    CacheLock_Part
};
volatile short tableRIP; // # of readers now in process
Restart:
// Get read access to the table. If we need to write it, it will be changed to write access later.
InterlockedIncrement16(&tableRIP); // assume we will get read access
if(cacheLocks & (1<<CacheLock_WriterWaiting)) { // non-zero if a writer is waiting or active
    InterlockedDecrement16(&tableRIP); // oops, a writer got in, so we're forbidden
    InterlockedIncrement64(&fc_Wait[0]); // counter for diagnostic purposes
    Wait(waitMs); // waitMs is a constant 1 (msec)
    goto Restart;
}
// Now we're a valid reader, and writer can't proceed till we've finished
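As a way to reason about the scheme independently of the Windows-specific calls, here is a minimal portable sketch of the same reader/writer handshake. It is not the poster's code: std::atomic with its default sequentially consistent ordering stands in for the Interlocked* functions, and the names TableGate and RunGateDemo are hypothetical. The demo counts how often a reader and the writer are seen inside the table at the same time.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical portable model of the scheme; std::atomic replaces Interlocked*.
class TableGate {
public:
    // Reader side: announce ourselves first, then check for a writer.
    bool TryEnterRead() {
        readers_.fetch_add(1);          // assume we will get read access
        if (writer_.load()) {           // a writer is waiting or active
            readers_.fetch_sub(1);      // back out; the caller waits and retries
            return false;
        }
        return true;
    }
    void LeaveRead() { readers_.fetch_sub(1); }

    // Writer side: claim the flag, then wait for readers in progress to drain.
    bool TryBeginWrite() {
        if (writer_.exchange(true)) return false;   // another writer holds it
        while (readers_.load() > 0) std::this_thread::yield();
        return true;
    }
    void EndWrite() { writer_.store(false); }

private:
    std::atomic<int> readers_{0};
    std::atomic<bool> writer_{false};
};

// Stress run: returns how many times a reader and the writer overlapped.
int RunGateDemo(int numReaders, int itersPerThread) {
    TableGate gate;
    std::atomic<int> inRead{0};
    std::atomic<bool> inWrite{false};
    std::atomic<int> violations{0};

    auto reader = [&] {
        for (int i = 0; i < itersPerThread; ++i) {
            while (!gate.TryEnterRead())   // same role as Wait(waitMs); goto Restart
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            inRead.fetch_add(1);
            if (inWrite.load()) violations.fetch_add(1);
            inRead.fetch_sub(1);
            gate.LeaveRead();
        }
    };
    auto writer = [&] {
        for (int i = 0; i < itersPerThread; ++i) {
            while (!gate.TryBeginWrite()) std::this_thread::yield();
            inWrite.store(true);
            if (inRead.load() > 0) violations.fetch_add(1);
            inWrite.store(false);
            gate.EndWrite();
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < numReaders; ++i) threads.emplace_back(reader);
    threads.emplace_back(writer);
    for (auto& t : threads) t.join();
    return violations.load();
}
```

Calling RunGateDemo(4, 200) should report no overlaps. The handshake relies on each side publishing its own state before reading the other's, which is why the sketch keeps the default sequentially consistent ordering rather than weaker orderings.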
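The baffling behavior is that the program hangs in this loop. When I break in with the debugger and single-step through the loop (details below), it exits immediately. It behaves AS IF the variable cacheLocks were not volatile (but it is, as you can see from the assembly code below).
At the time I looked, only one thread was active (this one). The other 31 were waiting for this one to finish, and a UI thread was also active, which does not access this data structure.
Since this is a release build, I'm debugging at the assembly level and inspecting memory directly. Here is the code again, this time as the assembly shown in the debugger: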
Restart:
// Get read access to the table. If we need to write it, it will be changed to write access later.
InterlockedIncrement16(&tableRIP); // assume we will get read access
00007FF789DF9970 lock inc word ptr [rbx+2A4h] // (1) before 0, after 1
if(cacheLocks & (1<<CacheLock_WriterWaiting)) // non-zero if a writer is waiting or active
00007FF789DF9978 mov eax,dword ptr [rbx+2A0h] // (5) eax -> 0
00007FF789DF997E test al,4
00007FF789DF9980 je $Restart+2Fh (7FF789DF999Fh)
InterlockedDecrement16(&tableRIP); // oops, a writer got in, so we're forbidden
00007FF789DF9982 lock dec word ptr [rbx+2A4h]
InterlockedIncrement64(&fc_Wait[0]);
00007FF789DF998A lock inc qword ptr [rbx+1E0h]
Wait(waitMs);
00007FF789DF9992 mov ecx,dword ptr [rbx+290h]
00007FF789DF9998 call Concurrency::wait (7FF789FB1000h) // (3) debugger breaks here
goto Restart; // (4)
00007FF789DF999D jmp FileCache::CacheInsureLoaded+0A6h (7FF789DF9966h)
// Now we're a valid reader, and writer can't proceed till we've finished
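When I "break" the program with the debugger, the thread is inside the system routine Concurrency::wait. I step out of it until I reach (4) in my code. I then inspect the memory at rbx+2A4h (i.e. tableRIP), and it is 0. After single-stepping the inc, it is 1, as expected. Inspecting the memory at rbx+2A0h (i.e. cacheLocks), it is 0 at position (5) (i.e. no writer is active). One more step and we jump to $Restart+2Fh, exiting the loop.
The program had been spinning in the loop for hours until I single-stepped through the assembly with the debugger. As the code above shows, the C++ compiler correctly treats tableRIP and cacheLocks as volatile: it loads them from memory every time. I notice that the two variables are adjacent in memory. Is there some hardware feature I need to take into account? The processor is an Intel Core i7-4771.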
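EDIT: In response to questions about my post, here is more detailed code showing all operations on cacheLocks. It also shows use of the cachePart[iPart] lock, a fine-grained lock on a buffer; that is unrelated to locking the table, and not all uses of the buffer lock are shown.
Parts of the code unrelated to locking have been replaced with // PROCESS.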
// Data members of class FileCache:
volatile long cacheLocks; // bits below
enum CacheLockBit {
    CacheLock_Table,
    CacheLock_LRUQ,
    CacheLock_WriterWaiting,
    CacheLock_Part
};
volatile short tableRIP; // # of readers now in process
// Code from class FileCache:
Restart:
// Get read access to the table. If we need to write it, it will be changed to write access later.
InterlockedIncrement16(&tableRIP); // assume we will get read access
if(cacheLocks & (1<<CacheLock_WriterWaiting)) { // non-zero if a writer is waiting or active
    InterlockedDecrement16(&tableRIP); // oops, a writer got in, so we're forbidden
    InterlockedIncrement64(&fc_Wait[0]);
    Wait(waitMs);
    goto Restart;
}
// Now we're a valid reader, and writer can't proceed till we've finished
// PROCESS
if(iPart!=bs_NotInCache && iPart!=bs_Writing) { // i.e., it's in cache and not in process of being written
    if(cachePart[iPart].nLocks[lt_FileRead]==rl_FileReadLock) {
        // Another thread is setting a file read lock on this part, for unknown ix. Must wait, in case it's for this ix.
        InterlockedDecrement16(&tableRIP); // reader is no longer in progress
        InterlockedIncrement64(&fc_Wait[1]);
        Wait(waitMs);
        goto Restart;
    }
    // Lock this cache part
    while(InterlockedBitTestAndSet(&cachePart[iPart].partLocks, CacheLock_Part)) { // returns 1 if bit (lock) was already set
        InterlockedIncrement64(&fc_Wait[9]);
        Wait(waitMs);
    }
    while(cachePart[iPart].nLocks[lt_FileRead]!=0) {
        // Another thread is reading the desired block. Must wait till that is complete, then start over.
        InterlockedBitTestAndReset(&cachePart[iPart].partLocks, CacheLock_Part); // release the mutex
        InterlockedDecrement16(&tableRIP); // reader is no longer in progress
        InterlockedIncrement64(&fc_Wait[2]);
        Wait(waitMs);
        goto Restart;
    }
    // PROCESS
    InterlockedDecrement16(&tableRIP); // reader is no longer in progress
    return iPart;
}
else if(partFromBlock[block]==bs_Writing) {
    // Another thread is writing this block--must wait till it's finished, then try again
    if(debugPartFromBlock)
        PartFromBlockCheck(workerThreadNum);
    InterlockedDecrement16(&tableRIP); // reader is no longer in progress
    InterlockedIncrement64(&fc_Wait[6]);
    Wait(waitMs);
    goto Restart;
}
// Desired block isn't in cache; must read it from file.
// Now we need a write lock.
InterlockedDecrement16(&tableRIP); // we're no longer a reader
if(InterlockedBitTestAndSet(&cacheLocks, CacheLock_WriterWaiting)) { // get 'writer active' status
    InterlockedIncrement64(&fc_Wait[7]);
    Wait(waitMs);
    goto Restart;
}
// We have 'writer active' set, but we need to wait for all readers to finish
while(tableRIP > 0) {
    InterlockedIncrement64(&fc_Wait[8]);
    Wait(waitMs);
}
// Now this thread is the only one accessing the table
iPart=CacheFill(workerThreadNum, clt, ix, block, lType);
if(iPart<0) {
    // CacheFill was unable to lock the part
    unsigned char locks=InterlockedBitTestAndReset(&cacheLocks, CacheLock_WriterWaiting); // no longer writer active
    InterlockedIncrement64(&fc_Wait[3]);
    Wait(waitMs);
    goto Restart;
}
// Convert the file read lock to the desired lock type. Current lock should be exactly 1 file read.
long long locks=InterlockedCompareExchange64(&cachePart[iPart].allLocks,1LL<<(lType*16),LOCKPARTS(0,0,1,0));
return iPart;
// Read a block which contains the desired bit into a cache part. Table is 'writer active'.
// If the cache has been modified, write it first.
// Returns the part #, and turns off 'writer active'.
int FileCache::CacheFill(const int workerThreadNum, const CacheLocType clt, const DBIndex ix, const unsigned long long block, const LockType lType) {
    int retries=0;
Restart:
    // PROCESS
    // Found an eligible part--try to lock it. To succeed, there must be no locks of any kind on the cache part
    long long locks=InterlockedCompareExchange64(&cachePart[iPartLRU].allLocks,LOCKPARTS(0,0,rl_FileReadLock,0),0);
    if(locks!=0) {
        if(retries>10)
            return -2; // unable to find an available buffer; wait till one becomes available
        InterlockedIncrement64(&fc_Wait[4]);
        Wait(waitMs);
        retries++;
        goto Restart; // try for another
    }
    // PROCESS
    locks=InterlockedBitTestAndReset(&cacheLocks, CacheLock_WriterWaiting); // no longer writer active
    // PROCESS
    // Now the old block (if any) is gone, so we can remove it from the table
    if(oldBase!=NoIndex) {
        // Lock table again. It was unlocked so other threads could run while we were writing.
        // However, another writer is not allowed to remove the 'oldBase' part.
        while(InterlockedBitTestAndSet(&cacheLocks, CacheLock_WriterWaiting)) { // returns 1 if bit (lock) was already set
            InterlockedIncrement64(&fc_Wait[5]);
            Wait(waitMs);
        }
        // No other threads can access the table, except readers in progress. We have to wait for those to finish.
        while(tableRIP > 0) {
            InterlockedIncrement64(&fc_Wait[10]);
            Wait(waitMs);
        }
        unsigned char locks=InterlockedBitTestAndReset(&cacheLocks, CacheLock_WriterWaiting); // no longer writer active
    }
    // PROCESS
    return iPart;
}
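The listing uses InterlockedBitTestAndSet and InterlockedBitTestAndReset as a test-and-set lock on a single bit: both return the previous value of the bit, so a return of 0 from the "set" call means the caller acquired the lock. The sketch below models that behavior portably with std::atomic; the helper names BitTestAndSet, BitTestAndReset and RunBitDemo are hypothetical stand-ins, not Windows APIs.

```cpp
#include <atomic>

enum CacheLockBit {
    CacheLock_Table,
    CacheLock_LRUQ,
    CacheLock_WriterWaiting,   // enum value 2, so its mask is 1 << 2 == 4
    CacheLock_Part
};

// Hypothetical stand-in for InterlockedBitTestAndSet: atomically sets the bit
// and returns whether it was already set.
bool BitTestAndSet(std::atomic<long>& word, int bit) {
    const long mask = 1L << bit;
    return (word.fetch_or(mask) & mask) != 0;
}

// Hypothetical stand-in for InterlockedBitTestAndReset: atomically clears the
// bit and returns whether it was set before.
bool BitTestAndReset(std::atomic<long>& word, int bit) {
    const long mask = 1L << bit;
    return (word.fetch_and(~mask) & mask) != 0;
}

struct BitDemoResult {
    bool firstAcquire;    // previous bit value seen by the first set attempt
    bool secondAcquire;   // previous bit value seen by the second set attempt
    long valueWhileHeld;  // whole word while the bit is set
    bool release;         // previous bit value seen by the reset
    long finalValue;      // whole word after the reset
};

BitDemoResult RunBitDemo() {
    std::atomic<long> cacheLocks{0};
    BitDemoResult r;
    r.firstAcquire   = BitTestAndSet(cacheLocks, CacheLock_WriterWaiting);
    r.secondAcquire  = BitTestAndSet(cacheLocks, CacheLock_WriterWaiting);
    r.valueWhileHeld = cacheLocks.load();
    r.release        = BitTestAndReset(cacheLocks, CacheLock_WriterWaiting);
    r.finalValue     = cacheLocks.load();
    return r;
}
```

The first attempt sees the bit clear (lock acquired), the second sees it set (lock busy), and while the bit is held the word equals 4, which is the mask the disassembly checks with test al,4.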
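Comments:
So is the problem that it never exits the loop, or that it never loops at all? Either way, the atomic operations seem irrelevant, right?
OK, after re-reading: it seems it never exits when running normally, but does exit when run under the debugger. In any case, I think the problem lies in the writes to cacheLocks. Note that you have to pay close attention to the memory consistency model. x86 is very strict, but not fully sequentially consistent.
Proving the absence of deadlock in a program with 32 threads is very hard. A good starting point is to use a tested reader-writer lock implementation. Your question provides the obvious candidate.
It looks like cacheLocks is not being cleared by the writer thread. Can you log the setting and clearing of the write lock? Could there be a race on the write lock? Is it atomic?
kec: the program sometimes hangs (does not exit the loop). It always runs under the debugger.
Answer 1:
Additional information: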
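Regarding the initial loop, what appears to be happening is that Wait(1 msec) never returns. What should happen is that, after a short wait, the thread checks cacheLocks again.
When the program is "broken" by the debugger and then continued, the hung thread resumes, finds cacheLocks = 0, and exits the loop.
The Wait function is simply:
void Wait(const int msec) {
    Concurrency::wait(msec);
    return;
}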
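Since the suspicion here falls on the wait call itself, one diagnostic experiment would be to swap in a different sleep primitive and see whether the hang persists. The sketch below is only that, an assumption for isolating the problem and not a confirmed fix: WaitStd and MeasureWaitMicros are hypothetical names, and std::this_thread::sleep_for replaces Concurrency::wait.

```cpp
#include <chrono>
#include <thread>

// Hypothetical replacement for Wait(): blocks the calling thread for at
// least msec milliseconds using the standard library.
void WaitStd(const int msec) {
    std::this_thread::sleep_for(std::chrono::milliseconds(msec));
}

// Returns how long WaitStd(msec) actually blocked, in microseconds.
// Logging this value would show whether short waits come back promptly.
long long MeasureWaitMicros(const int msec) {
    const auto start = std::chrono::steady_clock::now();
    WaitStd(msec);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

If the hang disappears with this variant, that would point at the interaction between Concurrency::wait and the worker threads; if it remains, the wait primitive is probably not the cause.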
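All worker threads run at "lowest" priority, except the main UI thread, which runs at "normal". I can't find any details on how Concurrency::wait(msec) interacts with the Windows thread scheduler.
Perhaps someone can explain why this happens?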
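One of the comments above suggests starting from a tested reader-writer lock rather than a hand-rolled one. Which implementation the commenter linked is not recoverable here, so the sketch below uses std::shared_mutex purely as an illustration. Note the assumptions: std::shared_mutex requires C++17 and is not available in Visual Studio 2013 (Windows offers slim reader/writer locks as a native primitive), and LockedTable and RunLockedTableDemo are hypothetical names.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical table guarded by a standard reader-writer lock.
class LockedTable {
public:
    // Readers take a shared lock; many readers may hold it at once.
    int Lookup(unsigned long long block) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = partFromBlock_.find(block);
        return it == partFromBlock_.end() ? -1 : it->second;
    }
    // Writers take an exclusive lock; no reader or other writer is inside.
    void Assign(unsigned long long block, int part) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        partFromBlock_[block] = part;
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<unsigned long long, int> partFromBlock_;
};

// Four threads each write 100 distinct blocks while also reading.
// Returns how many of the 400 blocks are present afterwards.
int RunLockedTableDemo() {
    LockedTable table;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&table, t] {
            for (int i = 0; i < 100; ++i) {
                table.Assign(static_cast<unsigned long long>(t) * 100 + i, t);
                (void)table.Lookup(static_cast<unsigned long long>(i));
            }
        });
    }
    for (auto& th : threads) th.join();

    int found = 0;
    for (unsigned long long b = 0; b < 400; ++b)
        if (table.Lookup(b) >= 0) ++found;
    return found;
}
```

The design trade-off is that the lock blocks instead of polling with a 1 ms wait, so there is no retry loop to hang in, but upgrading from read to write access still has to be done by releasing the shared lock and re-checking under the exclusive one.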