Week 13 Translation
Posted holiday-l
CHAPTER 2 ■ TABLES AND INDEXES: INTERNAL STRUCTURE AND ACCESS METHODS
第2章 表格和索引:内部结构和访问方式
Figure 2-4. Forwarding pointers and I/O: Reading data when forwarding pointers exist
图2-4 转发指针和I / O:当转发指针存在时读取数据
As you can see, the large number of forwarding pointers leads to extra I/O operations and significantly reduces the performance of the queries accessing the data. Companion materials for this book include the script that demonstrates this problem in a large scope with a table that includes a large amount of data.
正如你所见,大量的转发指针会导致额外的I/O操作和显著降低访问数据的查询性能。本书配套的资料包括,在一个大的范围与包含大量数据的表格说明此问题的脚本。
When the size of the forwarded row is reduced by another update and the data page with forwarding pointer has enough space to accommodate the updated version of the row, SQL Server may move it back to its original data page and remove the forwarding pointer row. Nevertheless, the only reliable way to get rid of all of the forwarding pointers is by rebuilding the heap table. You can do that by using an ALTER TABLE REBUILD statement.
当被另一个更新减少了转发行的大小和转发指针数据页有足够的空间来容纳该行的更新版本,SQL Server可能将其移回原来的数据页,并删除转发指针行。然而,摆脱所有转发指针的唯一可靠方法就是重建堆表。你可以使用ALTER TABLE REBUILD语句执行此操作。
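The check-and-rebuild cycle described above can be sketched as follows. This is a minimal illustration, not the book's companion script: dbo.HeapTable is a hypothetical table name, and the query assumes you are connected to the database that contains it.

```sql
-- dbo.HeapTable is a hypothetical heap table used only for illustration.
-- index_id = 0 addresses the heap. forwarded_record_count is NULL in
-- LIMITED mode, so DETAILED (or SAMPLED) mode is required.
select index_type_desc, page_count, record_count, forwarded_record_count
from sys.dm_db_index_physical_stats
(
    db_id(), object_id(N'dbo.HeapTable'), 0, null, 'DETAILED'
);

-- Rebuild the heap, which removes the forwarding pointers.
-- Nonclustered indexes on the table are rebuilt as part of this operation.
alter table dbo.HeapTable rebuild;
```

Running the first query again after the rebuild should show forwarded_record_count back at zero.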
Heap tables can be useful in staging environments, where you want to import a large amount of data into the system as fast as possible. Inserting data into heap tables can often be faster than inserting it into tables with clustered indexes. Nevertheless, during a regular workload, tables with clustered indexes usually outperform heap tables, which have suboptimal space control and extra I/O operations introduced by forwarding pointers.
当你希望尽可能快地将大量数据导入系统时,堆栈表在暂存环境中非常有用。将数据插入堆表通常比将其插入具有聚集索引的表更快。然而,常规的工作量中聚集索引通常优于堆表,堆表具有次优的空间控制和转发指针引入的额外I / O操作。
Clustered Indexes
聚集索引
A clustered index dictates the physical order of the data in a table, which is sorted according to the clustered index key. The table can have only one clustered index defined.
聚集索引指示表中数据的物理顺序,该表根据聚集索引键进行排序。 该表只能定义一个聚集索引。
Let’s assume that you want to create a clustered index on a heap table that already contains data. As a first step, which is shown in Figure 2-5, SQL Server creates another copy of the data, which is then sorted based on the value of the clustered key. The data pages are linked in a double-linked list, where every page contains pointers to the next and previous pages in the chain. This list is called the leaf level of the index, and it contains the actual table data.
假设你要在堆表上使用数据创建聚集索引。 第一步,如图2-5所示,SQL Server会创建另一个数据副本,然后根据群集密钥的值对其进行排序。数据页链接在双链表中,其中每个页面都包含指向链中下一页和上一页的指针。此列表称为索引的叶级,它包含实际的表数据。
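As a small sketch of that first step, the statements below create a heap and then a clustered index on it. The column definitions of dbo.Customers are an assumption; the text only mentions the CustomerId and Name columns, and the index name is made up for the example.

```sql
-- Hypothetical definition: only CustomerId and Name appear in the text.
create table dbo.Customers
(
    CustomerId int not null,
    Name nvarchar(64) not null
);

-- Until this point the table is a heap. Creating the clustered index
-- makes SQL Server build a sorted copy of the data, keyed on CustomerId.
create clustered index IDX_Customers_CustomerId
on dbo.Customers(CustomerId);

-- A second clustered index on the same table is rejected, because the
-- data can be physically ordered in only one way:
-- create clustered index IDX_Customers_Name on dbo.Customers(Name);
```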
Figure 2-5. Clustered index structure: Leaf level
图2-5 聚集的索引结构:叶级
■ Note The sort order on the page is controlled by a slot array. Actual data on the page is unsorted.
注意页面上的排序顺序由插槽阵列控制。 页面上的实际数据未排序。
When the leaf level consists of multiple pages, SQL Server starts to build an intermediate level of the index, as shown in Figure 2-6.
当叶级别由多个页面组成时,SQL Server开始构建索引的中间级别,如图2-6所示。
Figure 2-6. Clustered index structure: Intermediate and leaf levels
图2-6 聚集的索引结构:中级和叶级
The intermediate level stores one row per leaf-level page. It stores two pieces of information: the physical address and the minimum value of the index key from the page it references. The only exception is the very first row on the first page, where SQL Server stores NULL rather than the minimum index key value. With such optimization, SQL Server does not need to update non-leaf-level rows when you insert the row with the lowest key value in the table.
中间级别为每个叶级页面存储一行。它存储两条信息:它引用的页面中的索引键的物理地址和最小值。唯一的例外是第一页上的第一行,其中SQL Server存储NULL而不是最小索引键值。通过这种优化,当你在表中插入具有最低键值的行时,SQL Server不需要更新非叶级行。
The pages on the intermediate levels are also linked in a double-linked list. SQL Server adds more and more intermediate levels until there is a level that includes just a single page. This level is called the root level, and it becomes the entry point to the index, as shown in Figure 2-7.
中间级别的页面也链接到双链表。 SQL Server会添加越来越多的中间级别,直到只包含单个页面的级别。此级别称为根级别,它将成为索引的入口点,如图2-7所示。
Figure 2-7. Clustered index structure: Root level
图2-7 聚集索引结构:根级别
As you can see, the index always has one leaf level, one root level, and zero or more intermediate levels. The only exception is when the index data fits into a single page. In that case, SQL Server does not create the separate root-level page, and the index consists of just the single leaf-level page.
如你所见,索引始终具有一个叶级别,一个根级别和零个或多个中间级别。 唯一的例外是索引数据适合单个页面。 在这种情况下,SQL Server不会创建单独的根级页面,索引只包含单个叶级页面。
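One way to see these levels on a real index is to query sys.dm_db_index_physical_stats in DETAILED mode, which returns one row per index level. The sketch below assumes a dbo.Customers table with a clustered index (index_id = 1) in the current database.

```sql
-- index_level = 0 is the leaf level; the highest index_level is the root.
-- index_depth is the total number of levels in the index.
select index_level, index_depth, page_count, record_count
from sys.dm_db_index_physical_stats
(
    db_id(), object_id(N'dbo.Customers'), 1, null, 'DETAILED'
)
where alloc_unit_type_desc = N'IN_ROW_DATA'
order by index_level desc;
```

For an index that fits into a single page, the query returns one row with index_depth equal to 1.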
The number of levels in the index largely depends on the row and index key sizes. For example, the index on the 4-byte integer column will require 13 bytes per row on the intermediate and root levels. Those 13 bytes consist of a 2-byte slot-array entry, a 4-byte index-key value, a 6-byte page pointer, and a 1-byte row overhead, which is adequate because the index key does not contain variable-length and NULL columns.
索引中的级别数很大程度上取决于行和索引键的大小。 例如,4字节整数列上的索引在中间和根级别上每行需要13个字节。 这13个字节由一个2字节的插槽数组条目,一个4字节的索引键值,一个6字节的页面指针和一个1字节的行开销组成,这是足够的,因为索引键不包含变量 - length和NULL列。
As a result, you can accommodate 8,060 bytes / 13 bytes per row = 620 rows per page. This means that, with the one intermediate level, you can store information about up to 620 * 620 = 384,400 leaf-level pages. If your data row size is 200 bytes, you can store 40 rows per leaf-level page and up to 15,376,000 rows in the index with just three levels. Adding another intermediate level to the index would essentially cover all possible integer values.
因此,每行可容纳8,060字节/ 13字节=每页620行。 这意味着,使用一个中间级别,你可以存储最多620 * 620 = 384,400个叶级页面的信息。 如果数据行大小为200字节,则每个叶级页面可存储40行,索引中最多可存储15,376,000行,只有三个级别。 向索引添加另一个中间级别将基本上涵盖所有可能的整数值。
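The arithmetic in this paragraph can be reproduced with a few lines of T-SQL. The variable and column names are illustrative only; the input numbers (8,060 bytes per page, 13 bytes per index row, 200 bytes per data row) come from the text.

```sql
declare @PageBytes int = 8060, @IndexRowBytes int = 13, @DataRowBytes int = 200;

-- Integer division gives whole rows per page.
declare @IndexRowsPerPage int = @PageBytes / @IndexRowBytes;  -- 620
declare @DataRowsPerPage int = @PageBytes / @DataRowBytes;    -- 40

select
    @IndexRowsPerPage as IndexRowsPerPage,                                      -- 620
    @IndexRowsPerPage * @IndexRowsPerPage as LeafPagesWithThreeLevels,          -- 384,400
    @IndexRowsPerPage * @IndexRowsPerPage * @DataRowsPerPage
        as RowsWithThreeLevels,                                                 -- 15,376,000
    -- bigint is needed here because the result exceeds the int range.
    convert(bigint, @IndexRowsPerPage) * @IndexRowsPerPage * @IndexRowsPerPage
        * @DataRowsPerPage as RowsWithFourLevels;                               -- 9,533,120,000
```

The last value is larger than the roughly 4.29 billion distinct values of a 4-byte integer, which is why one more intermediate level covers every possible key.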
■ Note In real life, index fragmentation would reduce those numbers. We will talk about index fragmentation in Chapter 6.
注意在现实生活中,索引碎片会减少这些数字。 我们将在第6章讨论索引碎片。
There are three different ways in which SQL Server can read data from the index. The first one is by an ordered scan. Let’s assume that we want to run the SELECT Name FROM dbo.Customers ORDER BY CustomerId query. The data on the leaf level of the index is already sorted based on the CustomerId column value. As a result, SQL Server can scan the leaf level of the index from the first to the last page and return the rows in the order in which they were stored.
SQL Server可以通过三种不同的方式从索引中读取数据。 第一个是有序扫描。 假设我们想要运行SELECT Name FROM dbo.Customers ORDER BY CustomerId查询。 索引的叶级别上的数据已根据CustomerId列值进行排序。 因此,SQL Server可以从第一页到最后一页扫描索引的叶级,并按存储顺序返回行。
SQL Server starts with the root page of the index and reads the first row from there. That row references the intermediate page with the minimum key value from the table. SQL Server reads that page and repeats the process until it finds the first page on the leaf level. Then, SQL Server starts to read rows one by one, moving through the linked list of the pages until all rows have been read. Figure 2-8 illustrates this process.
SQL Server从索引的根页开始,并从那里读取第一行。 该行引用具有表中最小键值的中间页面.SQL Server读取该页面并重复该过程,直到它找到叶级别的第一页。 然后,SQL Server开始逐个读取行,遍历页面的链接列表,直到读取了所有行。 图2-8说明了这个过程。
Figure 2-8. Ordered index scan
图2-8 有序索引扫描
The execution plan for the preceding query shows the Clustered Index Scan operator with the Ordered property set to true, as shown in Figure 2-9.
上述查询的执行计划显示了“集群索引扫描”操作符,其中Orderedproperty设置为true,如图2-9所示。
Figure 2-9. Ordered index scan execution plan
图2-9 有序索引扫描执行计划
It is worth mentioning that the order by clause is not required for an ordered scan to be triggered. An ordered scan just means that SQL Server reads the data based on the order of the index key.
值得一提的是,触发有序扫描不需要order by子句。 有序扫描只意味着SQL Server根据索引键的顺序读取数据。
SQL Server can navigate through indexes in both directions, forward and backward. However, there is one important aspect that you must keep in mind: SQL Server does not use parallelism during backward index scans.
SQL Server可以向前和向后两个方向导航索引。 但是,你必须记住一个重要方面:SQL Server在向后索引扫描期间不使用并行性。
■ Tip You can check scan direction by examining the INDEX SCAN or INDEX SEEK operator properties in the execution plan. Keep in mind, however, that Management Studio does not display these properties in the graphical representation of the execution plan. You need to open the Properties window to see it by selecting the operator in the execution plan and choosing the View/Properties Window menu item or by pressing the F4 key.
提示您可以通过检查执行计划中的INDEX SCAN或INDEX SEEK运算符属性来检查扫描方向。 但请记住,Management Studio不会在执行计划的图形表示中显示这些属性。 您需要打开“属性”窗口以通过在执行计划中选择运算符并选择“视图/属性窗口”菜单项或按F4键来查看它。
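Besides the Properties window, the scan direction is also visible in the XML form of the plan. The sketch below assumes the dbo.Customers table with a clustered index on CustomerId; a descending sort on the key is a typical way to get a backward scan, although the optimizer makes the final choice.

```sql
-- Return the actual execution plan as XML together with the results.
set statistics xml on;

-- Descending order on the clustered key can be satisfied by reading the
-- leaf level backward instead of sorting.
select Name from dbo.Customers order by CustomerId desc;

set statistics xml off;
```

In the returned plan XML, look for the IndexScan element and its Ordered and ScanDirection attributes; ScanDirection shows FORWARD or BACKWARD.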
The Enterprise Edition of SQL Server has an optimization feature called merry-go-round scan that allows multiple tasks to share the same index scan. Let’s assume that you have session S1, which is scanning the index. At some point in the middle of the scan, another session, S2, runs a query that needs to scan the same index. With a merry-go-round scan, S2 joins S1 at its current scan location. SQL Server reads each page only once, passing rows to both sessions.
SQL服务器的企业版有一个叫做旋转木马轮扫描,允许多个任务共享相同的索引扫描的优化功能。假设你有会话S1,它正在扫描索引。 在扫描中间的某个时刻,另一个会话S2运行需要扫描相同索引的查询。 通过旋转木马扫描,S2将S1连接到当前扫描位置。 SQL Server只读取每个页面一次,将行传递给两个会话。
When the S1 scan reaches the end of the index, S2 starts scanning data from the beginning of the index until the point where the S2 scan started. A merry-go-round scan is another example of why you cannot rely on the order of the index keys and why you should always specify an ORDER BY clause when it matters.
当S1扫描到达索引的末尾时,S2开始从索引的开头扫描数据,直到S2扫描开始的点。 旋转木马扫描是另一个例子,说明为什么不能依赖索引键的顺序以及为什么在重要时应始终指定ORDER BY子句。
The next access method after the ordered scan is called an allocation order scan. SQL Server accesses the table data through the IAM pages, similar to how it does so with heap tables. The SELECT Name FROM dbo.Customers WITH (NOLOCK) query and Figure 2-10 illustrate this method. Figure 2-11 shows the query execution plan.
有序扫描之后的下一个访问方法称为分配顺序扫描。 SQL Server通过IAM页面访问表数据,类似于使用堆表的方式。 SELECT名称FROM dbo.Customers WITH(NOLOCK)查询和图2-10说明了这种方法。 图2-11显示了展示执行计划。
Figure 2-10. Allocation order scan
图2-10 分配顺序扫描
Figure 2-11. Allocation order scan execution plan
图2-11 分配顺序扫描执行计划
Unfortunately, it is not easy to detect when SQL Server uses an allocation order scan. When the Ordered property in the execution plan shows false, it indicates only that SQL Server does not care whether the rows are read in the order of the index key, not that an allocation order scan was used.
不幸的是,当SQL Server使用分配顺序扫描时,检测起来并不容易。即便如此执行计划中的有序属性显示为false,表示SQL Server不关心是否按索引键的顺序读取行,而不是使用分配顺序扫描。
An allocation order scan can be faster for scanning large tables, although it has a higher startup cost. SQL Server does not use this access method when the table is small. Another important consideration is data consistency. SQL Server does not use forwarding pointers in tables that have a clustered index, and an allocation order scan can produce inconsistent results. Rows can be skipped or read multiple times due to the data movement caused by page splits. As a result, SQL Server usually avoids using allocation order scans unless it reads the data in READ UNCOMMITTED or SERIALIZABLE transaction-isolation levels.
■ Note We will talk about page splits and fragmentation in Chapter 6, “Index Fragmentation,” and discuss locking and data consistency in Part III, “Locking, Blocking, and Concurrency.”
尽管扫描大型表的启动成本较高,但分配顺序扫描可以更快地扫描大型表。
当表很小时,SQL Server不使用此访问方法。 另一个重要的考虑是数据一致性。 SQL Server不使用具有聚簇索引的表中的转发指针,以及分配顺序扫描会产生不一致的结果。 由于可以多次跳过或读取行页面拆分导致的数据移动。 因此,SQL Server通常会避免使用分配顺序扫描除非它读取READ UNCOMMITTED或SERIALIZABLE事务隔离级别中的数据。
■注意我们将在第6章“索引碎片”中讨论页面拆分和碎片,并进行讨论
第三部分“锁定,阻塞和并发”中的锁定和数据一致性。
The last index access method is called index seek. The SELECT Name FROM dbo.Customers WHERE CustomerId BETWEEN 4 AND 7 query and Figure 2-12 illustrate the operation.
最后一个索引访问方法称为索引查找。 SELECT名称来自dbo.Customers WHERE CustomerId BETWEEN 4和7查询且图2-12说明了操作。
Figure 2-12. Index seek
图2-12 索引查找
In order to read the range of rows from the table, SQL Server needs to find the row with the minimum value of the key from the range, which is 4. SQL Server starts with the root page, where the second row references the page with the minimum key value of 350. It is greater than the key value that we are looking for (4), and SQL Server reads the intermediate-level data page (1:170) referenced by the first row on the root page.
为了从表中读取行的范围,SQL Server需要找到最小的行从范围中键的值,即4. SQL Server以根页开始,其中第二行引用最小键值为350的页面。它大于我们要查找的键值(4),SQL Server读取根页上第一行引用的中级数据页(1:170)。
The execution plan is shown in Figure 2-13.
执行计划如图2-13所示。
Figure 2-13. Index seek execution plan
图2-13 索引查找执行计划
As you can guess, index seek is more efficient than index scan, because SQL Server processes just the subset of rows and data pages rather than scanning the entire table.
正如您所看到的,索引搜索比索引扫描更有效,因为SQL Server只处理索引行和数据页的子集,而不是扫描整个表。
Technically speaking, there are two kinds of index seek operations. The first is called a singleton lookup, or sometimes a point lookup, where SQL Server seeks and returns a single row. You can think about the WHERE CustomerId = 2 predicate as an example. The other type of index seek operation is called a range scan, and it requires SQL Server to find the lowest or highest value of the key and scan (either forward or backward) the set of rows until it reaches the end of the scan range. The predicate WHERE CustomerId BETWEEN 4 AND 7 leads to a range scan. Both cases are shown as INDEX SEEK operations in the execution plans.
从技术上讲,索引搜索操作有两种。 第一种称为单例查找,或者有时是点查找,SQL Server寻找并返回单行。 你可以考虑一下WHERE CustomerId = 2谓词作为示例。 另一种类型的索引查找操作称为范围扫描,和它要求SQL Server找到密钥的最低或最高值并扫描(向前或向后)行集直到它到达扫描范围的末尾。 客户IDI 4和7之间的谓词WHERE引导到范围扫描。 这两种情况都在执行计划中显示为INDEX SEEK操作。
As you can guess, it is entirely possible for range scans to force SQL Server to process a large number or even all data pages from the index. For example, if you changed the query to use a WHERE CustomerId > 0 predicate, SQL Server would read all rows/pages, even though you would have an Index Seek operator displayed in the execution plan. You must keep this behavior in mind and always analyze the efficiency of range scans during query performance tuning.
您可以看到,范围扫描完全可以强制SQL Server处理大量数据或甚至索引中的所有数据页面。 例如,如果您将查询更改为使用WHERE CustomerId> 0谓词,即使你有一个Index Seek运算符,SQL Server也会读取所有行/页面显示在执行计划中。 你必须记住这种行为,并始终分析效率查询性能调整期间的范围扫描。
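A simple way to compare these cases is to look at the logical reads each query performs. The sketch below uses the predicates from the text against the dbo.Customers table, which is assumed to have a clustered index on CustomerId and only positive key values.

```sql
set statistics io on;

-- Singleton lookup: seeks to a single row.
select Name from dbo.Customers where CustomerId = 2;

-- Range scan: seeks to key 4, then reads forward until key 7.
select Name from dbo.Customers where CustomerId between 4 and 7;

-- Still displayed as an Index Seek, but with only positive keys the
-- range covers the whole table, so every leaf page is read.
select Name from dbo.Customers where CustomerId > 0;

set statistics io off;
```

The logical reads reported on the Messages tab for the last query should be close to those of a full clustered index scan, even though the plan shows a seek.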
There is a concept in relational databases called SARGable predicates, which stands for Search ARGument ABLE. A predicate is SARGable if SQL Server can utilize an index seek operation when an index exists. In a nutshell, predicates are SARGable when SQL Server can isolate the single value or range of index key values to process, thus limiting the search during predicate evaluation. Obviously, it is beneficial to write queries using SARGable predicates and utilize index seek whenever possible.
关系数据库中有一个名为SARGable谓词的概念,它代表S earchArg ement能够。 如果SQL Server可以使用索引查找操作(如果是索引),则谓词是SARGable存在。 简而言之,当SQL Server可以隔离单个值或索引范围时,谓词是SARGable要处理的关键值,从而限制谓词评估期间的搜索。 显然,写作是有益的查询使用SARGable谓词并尽可能利用索引查找。
SARGable predicates include the following operators: =, >, >=, <, <=, IN, BETWEEN, and LIKE (in the case of prefix matching). Non-SARGable operators include NOT, <>, LIKE (in the case of non-prefix matching), and NOT IN.
SARGable谓词包括以下运算符:=,>,> =,<,<=,IN,BETWEEN和LIKE(如果是前缀)匹配)。非SARGable运算符包括NOT,<>,LIKE(在非前缀匹配的情况下)和NOT IN。
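The difference between the two groups of operators can be sketched with a few queries. The nonclustered index on Name and the search strings are assumptions made for this example; the plans you actually get depend on the data and on the optimizer, so treat the comments as expectations to verify rather than guarantees.

```sql
-- Hypothetical index, so that predicates on Name have something to seek on.
create nonclustered index IDX_Customers_Name on dbo.Customers(Name);

-- Prefix matching: the matching names form one contiguous key range.
select CustomerId from dbo.Customers where Name like 'Smi%';

-- Non-prefix matching: any row could match, so the index must be scanned.
select CustomerId from dbo.Customers where Name like '%son';

-- IN isolates a small set of key values.
select Name from dbo.Customers where CustomerId in (2, 4, 7);

-- NOT IN is listed as non-SARGable in the text above.
select Name from dbo.Customers where CustomerId not in (2, 4, 7);
```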
Another circumstance that makes predicates non-SARGable is using functions or mathematical calculations against the table columns. SQL Server has to call the function or perform the calculation for every row it processes. Fortunately, in some cases you can refactor the queries to make such predicates SARGable. Table 2-1 shows a few examples of this.
使谓词非SARGable的另一种情况是使用函数或数学针对表列的计算。 SQL Server必须调用该函数或执行计算它处理的每一行。 幸运的是,在某些情况下,您可以重构查询以生成此类谓词优化搜索。表2-1列出了一些例子。
Table 2-1. Examples of refactoring non-SARGable predicates into SARGable ones
表2-1 将非SARGable谓词重构为SARGable的示例
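The contents of the table are not reproduced in this post. As an illustration of the same idea, the sketch below shows two common refactorings. The dbo.Orders table, its columns, and its indexes are hypothetical, and datefromparts requires SQL Server 2012 or later.

```sql
-- dbo.Orders and its indexes exist only for this sketch.
create table dbo.Orders
(
    OrderId int not null,
    OrderDate datetime not null
);
create unique clustered index IDX_Orders_OrderId on dbo.Orders(OrderId);
create nonclustered index IDX_Orders_OrderDate on dbo.Orders(OrderDate);

declare @Id int = 100, @Year int = 2014;

-- Calculation on the column: it has to be evaluated for every row.
select OrderId from dbo.Orders where OrderId - 1 = @Id;
-- Same logic with the calculation moved to the parameter side.
select OrderId from dbo.Orders where OrderId = @Id + 1;

-- Function on the column: non-SARGable.
select OrderId from dbo.Orders where datepart(year, OrderDate) = @Year;
-- Equivalent half-open date range on the bare column.
select OrderId from dbo.Orders
where OrderDate >= datefromparts(@Year, 1, 1)
    and OrderDate < datefromparts(@Year + 1, 1, 1);
```

In both pairs the second query leaves the indexed column untouched, which lets SQL Server isolate a key value or key range and use an index seek.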
Another important factor that you must keep in mind is type conversion. In some cases, you can make predicates non-SARGable by using incorrect data types. Let’s create a table with a varchar column and populate it with some data, as shown in Listing 2-6.
您必须牢记的另一个重要因素是类型转换。 在某些情况下,您可以使用不正确的数据类型使谓词非SARGable。 让我们创建一个带有varchar列的表,并用一些数据填充它,如清单2-6所示。
Listing 2-6. SARG predicates and data types: Test table creation
清单2-6 SARG谓词和数据类型:测试表创建
create table dbo.Data
(
VarcharKey varchar(10) not null,
Placeholder char(200)
);
create unique clustered index IDX_Data_VarcharKey
on dbo.Data(VarcharKey);
;with N1(C) as (select 0 union all select 0) -- 2 rows
,N2(C) as (select 0 from N1 as T1 cross join N1 as T2) -- 4 rows
,N3(C) as (select 0 from N2 as T1 cross join N2 as T2) -- 16 rows
,N4(C) as (select 0 from N3 as T1 cross join N3 as T2) -- 256 rows
,N5(C) as (select 0 from N4 as T1 cross join N4 as T2) -- 65,536 rows
,IDs(ID) as (select row_number() over (order by (select null)) from N5)
insert into dbo.Data(VarcharKey)
select convert(varchar(10),ID) from IDs;
The clustered index key column is defined as varchar, even though it stores integer values. Now, let’s run two selects, as shown in Listing 2-7, and look at the execution plans.
聚簇索引键列定义为varchar,即使它存储整数值。 现在,我们来运行两个选择,如清单2-7所示,并查看执行计划。
Listing 2-7. SARG predicates and data types: Select with integer parameter
declare
@IntParam int = '200';
select * from dbo.Data where VarcharKey = @IntParam;
select * from dbo.Data where VarcharKey = convert(varchar(10),@IntParam);
As you can see in Figure 2-14 , in the case of the integer parameter, SQL Server scans the clustered index, converting varchar to an integer for every row. In the second case, SQL Server converts the integer parameter to a varchar at the beginning and utilizes a much more efficient clustered index seek operation.
如图2-14所示,对于整数参数,SQL Server扫描聚簇索引,将varchar转换为每行的整数。 在第二种情况下,SQL Server在开始时将整数参数转换为varchar,并使用更高效的聚簇索引查找操作。
Figure 2-14. SARG predicates and data types: Execution plans with integer parameter
图2-14 SARG谓词和数据类型:带整数参数的执行计划
■ Tip Pay attention to the column data types in the join predicates. Implicit or explicit data type conversions can significantly decrease the performance of the queries.
■提示请注意连接谓词中的列数据类型。 隐式或显式数据类型转换
可以显着降低查询的性能。
You will observe very similar behavior in the case of Unicode string parameters. Let’s run the queries shown in Listing 2-8. Figure 2-15 shows the execution plans for the statements.
在unicode字符串参数的情况下,您将观察到非常类似的行为。 我们来运行查询如清单2-8所示。 图2-15显示了语句的执行计划。
Listing 2-8. SARG predicates and data types: Select with string parameter
清单2-8 SARG谓词和数据类型:使用字符串参数选择
select * from dbo.Data where VarcharKey = '200';
select * from dbo.Data where VarcharKey = N'200'; -- unicode parameter
Figure 2-15. SARG predicates and data types: Execution plans with string parameter
图2-15 SARG谓词和数据类型:带字符串参数的执行计划
As you can see, a Unicode string parameter is non-SARGable for varchar columns. This is a much bigger issue than it appears to be. While you rarely write queries in the way shown in Listing 2-8, most application development environments nowadays treat strings as Unicode. As a result, SQL Server client libraries generate Unicode (nvarchar) parameters for string objects unless the parameter data type is explicitly specified as varchar. This makes the predicates non-SARGable, and it can lead to major performance hits due to unnecessary scans, even when varchar columns are indexed.
如您所见,对于varchar列,unicode字符串参数是非SARGable。 这是一个比看起来更大的问题。 虽然您很少以这种方式编写查询,如清单2-8所示,但现在大多数应用程序开发环境都将字符串视为unicode。 因此,除非将参数数据类型显式指定为varchar,否则SQL Server客户端库会为字符串对象生成unicode(nvarchar)参数。 这使得谓词不具有SARG,并且由于不必要的扫描,它可能导致主要的性能命中,即使对varchar列进行索引也是如此。
■ Important Always specify parameter data types in client applications. For example, in ADO.Net, use Parameters.Add("@ParamName", SqlDbType.VarChar, <Size>).Value = stringVariable instead of Parameters.AddWithValue("@ParamName", stringVariable), which infers an nvarchar type from the .NET string. Use mapping in ORM frameworks to explicitly specify non-unicode attributes in the classes.
■重要的始终是在客户端应用程序中指定参数数据类型 例如,在ADO.Net中,使用Parameters.Add(“@ ParamName”,SqlDbType.Varchar,<Size>)。Value = stringVariable而不是Parameters.Add(“@ ParamName”)。Value = stringVariable overload。 在ORM框架中使用映射来显式指定类中的非unicode属性。
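The effect of the parameter type can be reproduced directly in T-SQL with sp_executesql, which is roughly how parameterized statements from client libraries reach the server. This sketch uses the dbo.Data table from Listing 2-6; the parameter name @P and the nvarchar(4000) length are arbitrary choices for the example, and the exact plan shape can vary with the column collation.

```sql
-- nvarchar parameter, similar to what a client library sends when the
-- parameter type is inferred from a string value.
exec sp_executesql
    N'select * from dbo.Data where VarcharKey = @P',
    N'@P nvarchar(4000)',
    @P = N'200';

-- varchar parameter that matches the column data type.
exec sp_executesql
    N'select * from dbo.Data where VarcharKey = @P',
    N'@P varchar(10)',
    @P = '200';
```

Comparing the two execution plans should show the same contrast as in Figure 2-15: an implicit conversion of the column in the first case and a plain clustered index seek in the second.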
It is also worth mentioning that varchar parameters are SARGable for nvarchar unicode data columns.
值得一提的是,对于nvarchar unicode数据列,varchar参数是SARGable。