[Repost] Inside the Storage Engine: Ghost cleanup in depth
Posted by 悦光阴
By: Paul Randal
Over the years I was in the Storage Engine team I saw a lot of concern on the various forums about the ghost cleanup task. There have been a few bugs with it in previous versions (see these KB articles – 932115 and 815594) and there's very little info available on it. For some reason I didn't get around to posting about it on my old blog but today I want to go into some depth on it.
So what is ghost cleanup? It's a background process that cleans up ghost records – usually referred to as the ghost cleanup task. What's a ghost record? As I described briefly in the Anatomy of a record post last week, a ghost record is one that's just been deleted in an index on a table (well, actually it gets more complicated if snapshot isolation of some form is enabled but for now, a record in an index is a good start).

Such a delete operation never physically removes records from pages – it only marks them as having been deleted, or ghosted. This is a performance optimization that allows delete operations to complete more quickly. It also allows delete operations to roll back more quickly, because all that needs to happen is to unmark the records as being deleted/ghosted, instead of having to reinsert the deleted records. The deleted record will be physically removed (well, its slot will be removed – the record data isn't actually overwritten) later by the background ghost cleanup task. The ghost cleanup task will leave a single record on the page to avoid having to deallocate empty data or index pages.
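The mark-then-remove behaviour described above can be sketched with a toy model. This is only an illustration, not how the Storage Engine is implemented: `ToyPage`, `Record` and every method name are hypothetical, and locking, transactions and commit are not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    data: bytes
    ghost: bool = False  # the "deleted" marker; the bytes themselves are untouched


@dataclass
class ToyPage:
    """Toy page: record bodies plus a slot array that points at them."""
    heap: list = field(default_factory=list)   # record bodies, never erased here
    slots: list = field(default_factory=list)  # indexes into heap, kept in key order

    def insert(self, data: bytes) -> None:
        self.heap.append(Record(data))
        self.slots.append(len(self.heap) - 1)
        self.slots.sort(key=lambda i: self.heap[i].data)

    def delete(self, data: bytes) -> None:
        # A delete only marks the record as a ghost.
        for i in self.slots:
            if self.heap[i].data == data:
                self.heap[i].ghost = True

    def rollback_delete(self, data: bytes) -> None:
        # Undoing the delete just clears the marker; nothing is reinserted.
        for i in self.slots:
            if self.heap[i].data == data:
                self.heap[i].ghost = False

    def scan(self) -> list:
        # Readers skip ghosted records.
        return [self.heap[i].data for i in self.slots if not self.heap[i].ghost]

    def ghost_cleanup(self) -> None:
        # Remove ghost slots, but always leave one record on the page.
        for i in list(self.slots):
            if self.heap[i].ghost and len(self.slots) > 1:
                self.slots.remove(i)


page = ToyPage()
page.insert(b"PAUL")
page.insert(b"KIMBERLY")
page.delete(b"KIMBERLY")
visible_after_delete = page.scan()   # the ghost is already invisible to readers
page.ghost_cleanup()                 # the slot goes away, the bytes stay in heap
```

In this sketch the cleanup only shrinks the slot array, which is why the deleted record's bytes are still present in `heap` afterwards.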
The ghost cleanup task can't physically delete the ghost records until after the delete transaction commits because the deleted records are locked and the locks aren't released until the transaction commits. As an aside, when ghost records exist on a page, even a NOLOCK or READ UNCOMMITTED scan won't return them because they are marked as ghost records.
When a record is deleted, apart from it being marked as a ghost record, the page that the record is on is also marked as having ghost records in one of the allocation maps – the PFS page (post coming soon!) – and in its page header. Marking a page as having ghost records in a PFS page also changes the database state to indicate that there are some ghost records to clean up – somewhere. Nothing tells the ghost cleanup task to clean the specific page that the delete happened on – yet. That only happens when the next scan operation reads the page and notices that the page has ghost records.
The ghost cleanup task doesn't just start up when it's told to – it starts up in the background every 5 seconds and looks for ghost records to clean up. Remember that it won't be told to go clean up a specific page by a delete operation – it's a subsequent scan that does it, if a scan happens. When the ghost cleanup task starts up it checks to see if it's been told to clean up a page – if so it goes and does it. If not, it picks the next database that is marked as having some ghost records and looks through the PFS allocation map pages to see if there are any ghost records to clean up. It will check through or clean up a limited number of pages each time it wakes up – I remember the limit is 10 pages – to ensure it doesn't swamp the system. So – the ghost records will eventually be removed – either by the ghost cleanup task processing a database for ghost records or by it specifically being told to remove them from a page. If it processes a database and doesn't find any ghost records, it marks the database as not having any ghost records so it will be skipped next time.
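The wake-up logic described above can be sketched as a small simulation. This is a toy model under stated assumptions, not the real task: `ToyGhostCleanup` and its methods are hypothetical names, the 10-page limit is the figure recalled above, the 5-second timer is left out, and a "database" is reduced to the set of page ids whose PFS ghost bit is set.

```python
from collections import deque

PAGES_PER_WAKEUP = 10  # the limit recalled in the text; treat it as an assumption


class ToyGhostCleanup:
    def __init__(self, databases):
        # databases: {name: set of page ids whose PFS entry has the ghost bit set}
        self.databases = databases
        self.flagged = deque(name for name, pages in databases.items() if pages)
        self.queued_pages = deque()  # (db, page) pairs reported by later scans

    def scan_noticed(self, db, page):
        # A scan that reads a page with ghost records queues that specific page.
        self.queued_pages.append((db, page))

    def wake_up(self):
        """One wake-up; returns the (db, page) pairs cleaned this time."""
        cleaned = []
        if self.queued_pages:
            # Told about specific pages: go and clean those first.
            while self.queued_pages and len(cleaned) < PAGES_PER_WAKEUP:
                db, page = self.queued_pages.popleft()
                if page in self.databases[db]:
                    self.databases[db].discard(page)
                    cleaned.append((db, page))
            return cleaned
        if not self.flagged:
            return cleaned
        # Otherwise pick the next database marked as having ghost records.
        db = self.flagged.popleft()
        for page in sorted(self.databases[db])[:PAGES_PER_WAKEUP]:
            self.databases[db].discard(page)
            cleaned.append((db, page))
        if cleaned:
            self.flagged.append(db)  # still marked; revisit on a later wake-up
        # If nothing was found, the database stays unmarked and is skipped.
        return cleaned


task = ToyGhostCleanup({"db1": set(range(100, 125)), "db2": set()})
first = task.wake_up()
```

With 25 ghosted pages in the hypothetical `db1`, it takes three wake-ups to clean them and a fourth, empty pass before the database is unmarked.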
How can you tell it's running? On SQL Server 2005, you can use the following code to see the ghost cleanup task in sys.dm_exec_requests:
SELECT * INTO myexecrequests FROM sys.dm_exec_requests WHERE 1 = 0;
GO
SET NOCOUNT ON;
GO
DECLARE @a INT
SELECT @a = 0;
WHILE (@a < 1)
BEGIN
    INSERT INTO myexecrequests SELECT * FROM sys.dm_exec_requests WHERE command LIKE '%ghost%'
    SELECT @a = COUNT (*) FROM myexecrequests
END;
GO
SELECT * FROM myexecrequests;
GO
And on SQL Server 2000 you need to use sysprocesses (well, on SQL Server 2005 this works as well, but it's a fake view derived from the DMVs):
SELECT * INTO mysysprocesses FROM master.dbo.sysprocesses WHERE 1 = 0;
GO
SET NOCOUNT ON;
GO
DECLARE @a INT
SELECT @a = 0;
WHILE (@a < 1)
BEGIN
    INSERT INTO mysysprocesses SELECT * FROM master.dbo.sysprocesses WHERE cmd LIKE '%ghost%'
    SELECT @a = COUNT (*) FROM mysysprocesses
END;
GO
SELECT * FROM mysysprocesses;
GO
The output from sys.dm_exec_requests is (with most unused and uninteresting columns stripped off):
session_id  request_id  start_time               status      command
----------  ----------  -----------------------  ----------  -------------
15          0           2007-10-05 16:34:49.653  background  GHOST CLEANUP
So how can you tell if a record is ghosted? Let's engineer one and look at it with DBCC PAGE – I've stripped out the uninteresting bits of the output; the interesting ghost parts are m_ghostRecCnt in the page header and the record type of slot 0:
CREATE TABLE t1 (c1 CHAR(10))
CREATE CLUSTERED INDEX t1c1 ON t1 (c1)
GO
BEGIN TRAN PaulsTran -- named so the fn_dblog query further down can find it
INSERT INTO t1 VALUES ('PAUL')
INSERT INTO t1 VALUES ('KIMBERLY')
DELETE FROM t1 WHERE c1 = 'KIMBERLY';
GO
DBCC IND ('ghostrecordtest', 't1', 1);
GO
DBCC TRACEON (3604);
GO
DBCC PAGE ('ghostrecordtest', 1, 143, 3);
GO
<snip>
m_freeData = 130      m_reservedCnt = 0     m_lsn = (20:88:20)
m_xactReserved = 0    m_xdesId = (0:518)    m_ghostRecCnt = 1
m_tornBits = 0
<snip>
Slot 0 Offset 0x71 Length 17
Record Type = GHOST_DATA_RECORD    Record Attributes = NULL_BITMAP
Memory Dump @0x6256C071
00000000: 1c000e00 4b494d42 45524c59 20200200 †....KIMBERLY  ..
00000010: fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]
Slot 0 Column 1 Offset 0x4 Length 10
c1 = KIMBERLY
Slot 1 Offset 0x60 Length 17
Record Type = PRIMARY_RECORD    Record Attributes = NULL_BITMAP
Memory Dump @0x6256C060
00000000: 10000e00 5041554c 20202020 20200200 †....PAUL      ..
00000010: fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]
Slot 1 Column 1 Offset 0x4 Length 10
c1 = PAUL
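The only difference between the two memory dumps above is the first byte (0x1c versus 0x10). As a rough sketch, the Python below decodes the two 17-byte records. The bit layout it assumes (record type in bits 1-3 of the first byte, null-bitmap flag in bit 4, a little-endian offset to the column count in bytes 2-3) is my reading of the record format, not something the dump states, and `decode_record` is a hypothetical helper.

```python
import struct

# Only the two record types that appear in the dump; the values are an assumption.
RECORD_TYPES = {0: "PRIMARY_RECORD", 6: "GHOST_DATA_RECORD"}


def decode_record(hex_dump: str) -> dict:
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    status_a = raw[0]
    # Bytes 2-3: offset of the column count, which follows the fixed-length data.
    (col_count_offset,) = struct.unpack_from("<H", raw, 2)
    (col_count,) = struct.unpack_from("<H", raw, col_count_offset)
    return {
        "type": RECORD_TYPES.get((status_a >> 1) & 0x7, "other"),
        "has_null_bitmap": bool(status_a & 0x10),
        "fixed_data": raw[4:col_count_offset].decode("ascii"),
        "column_count": col_count,
        "length": len(raw),
    }


# Hex bytes copied from the two "Memory Dump" sections of the DBCC PAGE output.
ghost = decode_record("1c000e00 4b494d42 45524c59 20200200 fc")
live = decode_record("10000e00 5041554c 20202020 20200200 fc")
```

Under those assumptions, ghosting the KIMBERLY record changed nothing but the type bits in its first byte, which fits the idea that a rollback only has to unmark it.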
Let's see what goes on the transaction log during this process (remember this is undocumented and unsupported – do it on a test database) – I've stripped off a bunch of the columns in the output:
DECLARE @a CHAR(20)
SELECT @a = [Transaction ID] FROM fn_dblog (null, null) WHERE [Transaction Name] = 'PaulsTran'
SELECT * FROM fn_dblog (null, null) WHERE [Transaction ID] = @a;
GO
Current LSN             Operation        Context            Transaction ID
----------------------  ---------------  -----------------  --------------
00000014:00000054:0011  LOP_BEGIN_XACT   LCX_NULL           0000:00000206
00000014:0000005a:0012  LOP_INSERT_ROWS  LCX_CLUSTERED      0000:00000206
00000014:0000005a:0013  LOP_INSERT_ROWS  LCX_CLUSTERED      0000:00000206
00000014:0000005a:0014  LOP_DELETE_ROWS  LCX_MARK_AS_GHOST  0000:00000206
00000014:0000005a:0016  LOP_DELETE_ROWS  LCX_MARK_AS_GHOST  0000:00000206
So there are the two inserts followed by the two deletes – with the rows being marked as ghost records. But where's the update to the PFS page? Well, changing the ghost bit in a PFS page is not done as part of a transaction. We'll need to look for it another way (apart from just dumping everything in the transaction log and searching manually):
SELECT Description, * FROM fn_dblog (null, null) WHERE Context LIKE '%PFS%' AND AllocUnitName LIKE '%t1%';
GO
Description              Current LSN             Operation       Context  Transaction ID
-----------------------  ----------------------  --------------  -------  --------------
Allocated 0001:0000008f  00000014:00000054:0014  LOP_MODIFY_ROW  LCX_PFS  0000:00000208
                         00000014:0000005a:0015  LOP_SET_BITS    LCX_PFS  0000:00000000
The first one is just allocating a page but the second one is the one we're looking for – it's changed the bit for the page to say it has ghost records on. Let's commit the transaction and see what happens, filtering out all the previous transaction log:
SELECT MAX ([Current LSN]) FROM fn_dblog (null, null);
GO
-- 00000014:0000005e:0001
COMMIT TRAN
GO
SELECT [Page ID], * FROM fn_dblog (null, null) WHERE [Current LSN] > '00000014:0000005e:0001';
GO
Page ID        Current LSN             Operation         Context        Transaction ID
-------------  ----------------------  ----------------  -------------  --------------
NULL           00000014:0000005f:0001  LOP_COMMIT_XACT   LCX_NULL       0000:00000206
0001:0000008f  00000014:00000060:0001  LOP_EXPUNGE_ROWS  LCX_CLUSTERED  0000:00000000
We see that almost as soon as the transaction has committed, the ghost cleanup task goes in and processes the page. Let's check a page dump to make sure the record is gone, and show that the contents of the record are still on the page (again, with non-relevant bits snipped out):
DBCC PAGE ('ghostrecordtest', 1, 143, 3);
GO
<snip>
m_freeData = 130      m_reservedCnt = 0     m_lsn = (20:94:1)
m_xactReserved = 0    m_xdesId = (0:518)    m_ghostRecCnt = 0
m_tornBits = 0
<snip>
Record Type = PRIMARY_RECORD    Record Attributes = NULL_BITMAP
Memory Dump @0x6212C060
00000000: 10000e00 5041554c 20202020 20200200 †....PAUL      ..
00000010: fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]
Slot 0 Column 1 Offset 0x4 Length 10
c1 = PAUL
DBCC PAGE ('ghostrecordtest', 1, 143, 2);
GO
<snip>
6212C040: 01000000 00000000 00000000 00000000 †................
6212C050: 00000000 00000000 00000000 00000000 †................
6212C060: 10000e00 5041554c 20202020 20200200 †....PAUL      ..
6212C070: fc1c000e 004b494d 4245524c 59202002 †.....KIMBERLY  .
6212C080: 00fc0000 00000000 00000000 01000000 †................
6212C090: 00000000 13000000 01000000 00000000 †................
<snip>
So even though the record no longer exists, all that happened was that the slot was removed from the slot array at the end of the page – the record contents will remain on the page until the space is reused.
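As a quick cross-check of the raw dump, the sketch below searches three of its rows for the leftover bytes and converts the hit to a page offset. It assumes the page starts at the dump address of the PAUL record minus its slot offset of 0x60; `find_on_page` is a hypothetical helper, and the hex strings are copied from the dump.

```python
# Three consecutive rows of the DBCC PAGE style-2 dump, keyed by memory address.
DUMP_ROWS = {
    0x6212C060: "10000e00 5041554c 20202020 20200200",
    0x6212C070: "fc1c000e 004b494d 4245524c 59202002",
    0x6212C080: "00fc0000 00000000 00000000 01000000",
}
# Assumption: PAUL's record (dumped at 0x6212C060) sits at page offset 0x60.
PAGE_BASE = 0x6212C060 - 0x60


def find_on_page(needle: bytes):
    """Return the page offset of the first match, or None if absent."""
    first_address = min(DUMP_ROWS)
    raw = b"".join(
        bytes.fromhex(DUMP_ROWS[address].replace(" ", ""))
        for address in sorted(DUMP_ROWS)
    )
    position = raw.find(needle)
    if position < 0:
        return None
    return (first_address - PAGE_BASE) + position


kimberly_offset = find_on_page(b"KIMBERLY")
old_record_start = find_on_page(bytes.fromhex("1c000e00"))
```

Under that assumption the old record header turns up at page offset 0x71, the same offset slot 0 pointed to before the cleanup, with the column data four bytes further in.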
In the next post I'll go into details of the PFS and other allocation maps. Btw – please let me know if this stuff is interesting – I'd like to know where to spend blogging time. Thanks!