Fulltext Index Study 4: Management and Performance

Posted 悦光阴


Only one full-text index is allowed per table. To create a full-text index on a table, the table must have a unique, single-column, non-nullable index, which serves as the full-text key. Columns of type char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, and varbinary(max) can be indexed for full-text search. Creating a full-text index on a column whose data type is varbinary, varbinary(max), image, or xml requires that you specify a type column. A type column is a table column in which you store the file extension (.doc, .pdf, .xls, and so forth) of the document in each row.
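The requirements above can be sketched in T-SQL as follows. This is a minimal sketch, not the author's setup: the table dbo.DocumentDemo, its columns, the constraint PK_DocumentDemo, and the catalog ft_catalog_demo are hypothetical names, and LANGUAGE 1033 (English) is an assumption.

```sql
-- Hypothetical demo objects; none of these names come from the original text.
CREATE TABLE dbo.DocumentDemo
(
    DocumentID    int            NOT NULL,
    Title         nvarchar(200)  NOT NULL,
    FileExtension nvarchar(8)    NULL,   -- type column: '.doc', '.pdf', '.xls', ...
    Content       varbinary(max) NULL,   -- binary document, needs the type column
    -- The primary key provides the unique, single-column, non-nullable key index.
    CONSTRAINT PK_DocumentDemo PRIMARY KEY (DocumentID)
);
GO

CREATE FULLTEXT CATALOG ft_catalog_demo;
GO

-- Only one full-text index per table; it can cover several columns.
CREATE FULLTEXT INDEX ON dbo.DocumentDemo
(
    Title   LANGUAGE 1033,
    Content TYPE COLUMN FileExtension LANGUAGE 1033
)
KEY INDEX PK_DocumentDemo
ON ft_catalog_demo
WITH CHANGE_TRACKING AUTO;
GO
```

GO is the batch separator used by SSMS and sqlcmd, not a T-SQL statement. An integer key is used here because, as described later, an integer full-text key can serve directly as the DocId.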

The process of creating and maintaining a full-text index is called a population (also known as a crawl). There are three types of full-text index population: full population, change tracking-based population, and incremental timestamp-based population. For more information, see Populate Full-Text Indexes.
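As an illustration of the three population types, the statements below start each kind of population on a hypothetical table dbo.DocumentDemo that already has a full-text index (the table name is an assumption). Only one is left active; the alternatives are commented out so the script runs as a unit.

```sql
-- Full population: re-crawls every row of the table.
ALTER FULLTEXT INDEX ON dbo.DocumentDemo START FULL POPULATION;

-- Incremental timestamp-based population: needs a timestamp (rowversion)
-- column on the table; without one, a full population is run instead.
-- ALTER FULLTEXT INDEX ON dbo.DocumentDemo START INCREMENTAL POPULATION;

-- Change tracking-based population with manual propagation of tracked changes.
-- ALTER FULLTEXT INDEX ON dbo.DocumentDemo SET CHANGE_TRACKING MANUAL;
-- ALTER FULLTEXT INDEX ON dbo.DocumentDemo START UPDATE POPULATION;

-- Check the population state of the table (0 means idle).
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.DocumentDemo'),
                        'TableFulltextPopulateStatus') AS populate_status;
```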

1,Full-Text Index Structure

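A full-text index splits each character column into substrings, and each substring is a word. If a substring is a stopword, it is not stored in the full-text index, but its position is still counted; a substring's position is its word offset within the column value. Because the same substring can occur many times, the full-text index structure uses the columns substring (keyword), ColumnID, KeyID, and Position, together with a timestamp, to uniquely identify an entry. Internally, a full-text index is made up of multiple fragments. When a column value changes, SQL Server does not update the earlier fragments; instead, the crawl creates a new fragment whose timestamp marks the change, and the full-text index uses the newest fragment to return data.

If the table has many rows and the column values split into many substrings, the full-text index can become very large.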
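One way to see how large the full-text data has grown is to read the catalog properties. The query below is a sketch that lists every full-text catalog in the current database; it introduces no names beyond the column aliases.

```sql
-- Size and item counts per full-text catalog in the current database.
SELECT c.name AS catalog_name,
       FULLTEXTCATALOGPROPERTY(c.name, 'IndexSize')      AS index_size_mb,
       FULLTEXTCATALOGPROPERTY(c.name, 'ItemCount')      AS item_count,
       FULLTEXTCATALOGPROPERTY(c.name, 'UniqueKeyCount') AS unique_key_count
FROM sys.fulltext_catalogs AS c
ORDER BY c.name;
```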

Quoted from the doc: Create and Manage Full-Text Indexes

A good understanding of the structure of a full-text index will help you understand how the Full-Text Engine works. This topic uses the following excerpt of the Document table in Adventure Works as an example table. This excerpt shows only two columns, the DocumentID column and the Title column, and three rows from the table.

For this example, we will assume that a full-text index has been created on the Title column.

DocumentID    Title
1             Crank Arm and Tire Maintenance
2             Front Reflector Bracket and Reflector Assembly 3
3             Front Reflector Bracket Installation

For example, the following table, which shows Fragment 1, depicts the contents of the full-text index created on the Title column of the Document table. Full-text indexes contain more information than is presented in this table. The table is a logical representation of a full-text index and is provided for demonstration purposes only. The rows are stored in a compressed format to optimize disk usage.

Notice that the data has been inverted from the original documents. Inversion occurs because the keywords are mapped to the document IDs. For this reason, a full-text index is often referred to as an inverted index.

Also notice that the keyword "and" has been removed from the full-text index. This is done because "and" is a stopword, and removing stopwords from a full-text index can lead to substantial savings in disk space thereby improving query performance. For more information about stopwords, see Configure and Manage Stopwords and Stoplists for Full-Text Search.

Fragment 1

Keyword         ColId    DocId    Occurrence
Crank           1        1        1
Arm             1        1        2
Tire            1        1        4
Maintenance     1        1        5
Front           1        2        1
Front           1        3        1
Reflector       1        2        2
Reflector       1        2        5
Reflector       1        3        2
Bracket         1        2        3
Bracket         1        3        3
Assembly        1        2        6
3               1        2        7
Installation    1        3        4

The Keyword column contains a representation of a single token extracted at indexing time. Word breakers determine what makes up a token.

The ColId column contains a value that corresponds to a particular column that is full-text indexed.

The DocId column contains values for an eight-byte integer that maps to a particular full-text key value in a full-text indexed table. This mapping is necessary when the full-text key is not an integer data type. In such cases, mappings between full-text key values and DocId values are maintained in a separate table called the DocId Mapping table. To query for these mappings use the sp_fulltext_keymappings system stored procedure. To satisfy a search condition, DocId values from the above table need to be joined with the DocId Mapping table to retrieve rows from the base table being queried. If the full-text key value of the base table is an integer type, the value directly serves as the DocId and no mapping is necessary. Therefore, using integer full-text key values can help optimize full-text queries.
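A minimal sketch of querying the DocId mappings with sp_fulltext_keymappings follows. The table name dbo.DocumentDemo is hypothetical; substitute any table that has a full-text index.

```sql
-- sp_fulltext_keymappings takes the object ID of the full-text indexed table.
-- EXEC arguments cannot be function calls, so resolve the ID into a variable first.
DECLARE @table_id int = OBJECT_ID(N'dbo.DocumentDemo');

-- Returns one row per indexed row: the internal DocId and the full-text key value.
EXEC sp_fulltext_keymappings @table_id;
```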

The Occurrence column contains an integer value. For each DocId value, there is a list of occurrence values that correspond to the relative word offsets of the particular keyword within that DocId. Occurrence values are useful in determining phrase or proximity matches; for example, the words of a phrase have numerically adjacent occurrence values. They are also useful in computing relevance scores; for example, the number of occurrences of a keyword in a DocId may be used in scoring.
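The logical index content and the role of occurrence values can be explored with queries like the ones below. This is a sketch under assumptions: dbo.DocumentDemo with an integer key DocumentID and a full-text indexed Title column is a hypothetical table, and the custom NEAR syntax requires SQL Server 2012 or later.

```sql
-- Keywords per document, as stored in the full-text index.
SELECT display_term, column_id, document_id, occurrence_count
FROM sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID(N'dbo.DocumentDemo'))
ORDER BY document_id, display_term;

-- Phrase match: relies on adjacent occurrence values of the two words.
SELECT DocumentID, Title
FROM dbo.DocumentDemo
WHERE CONTAINS(Title, N'"Reflector Bracket"');

-- Proximity match with a relevance score: at most 2 other terms between the words.
SELECT d.DocumentID, d.Title, k.[RANK]
FROM CONTAINSTABLE(dbo.DocumentDemo, Title, N'NEAR((Front, Bracket), 2)') AS k
JOIN dbo.DocumentDemo AS d
    ON d.DocumentID = k.[KEY]
ORDER BY k.[RANK] DESC;
```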

 

2,Full-Text Index Fragments  

The logical full-text index is usually split across multiple internal tables. Each internal table is called a full-text index fragment. Some of these fragments might contain newer data than others. For example, if a user updates the following row whose DocId is 3 and the table is auto change-tracked, a new fragment is created.

DocumentID    Title
3             Rear Reflector

In the following example, which shows Fragment 2, the fragment contains newer data about DocId 3 compared to Fragment 1. Therefore, when the user queries for "Rear Reflector" the data from Fragment 2 is used for DocId 3. Each fragment is marked with a creation timestamp that can be queried by using the sys.fulltext_index_fragments catalog view.

Fragment 2

Keyword      ColId    DocId    Occ
Rear         1        3        1
Reflector    1        3        2

As can be seen from Fragment 2, full-text queries need to query each fragment internally and discard older entries. Therefore, too many full-text index fragments in the full-text index can lead to substantial degradation in query performance. To reduce the number of fragments, reorganize the fulltext catalog by using the REORGANIZE option of the ALTER FULLTEXT CATALOG Transact-SQL statement. This statement performs a master merge, which merges the fragments into a single larger fragment and removes all obsolete entries from the full-text index.
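A sketch of checking the fragment count and then running a master merge follows. The catalog name ft_catalog_demo is hypothetical, and the status filter reflects the documented meaning of values 4 and 6 (fragments that are available to queries); treat it as an assumption to verify against your SQL Server version.

```sql
-- List all fragments with their creation timestamps.
SELECT OBJECT_NAME(table_id) AS table_name,
       fragment_id,
       [timestamp],
       data_size,
       row_count,
       status
FROM sys.fulltext_index_fragments
ORDER BY table_id, [timestamp];

-- Count the fragments that queries have to scan, per table.
SELECT OBJECT_NAME(table_id) AS table_name,
       COUNT(*)              AS queryable_fragments
FROM sys.fulltext_index_fragments
WHERE status IN (4, 6)
GROUP BY table_id;

-- Master merge: merges the fragments and removes obsolete entries.
ALTER FULLTEXT CATALOG ft_catalog_demo REORGANIZE;
```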

After being reorganized, the example index would contain the following rows:

Keyword        ColId    DocId    Occ
Crank          1        1        1
Arm            1        1        2
Tire           1        1        4
Maintenance    1        1        5
Front          1        2        1
Rear           1        3        1
Reflector      1        2        2
Reflector      1        2        5
Reflector      1        3        2
Bracket        1        2        3
Assembly       1        2        6
3              1        2        7

3,Configure Stopwords

Reference doc: Configure and Manage Stopwords and Stoplists for Full-Text Search

To prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords. During index creation, the Full-Text Engine omits stopwords from the full-text index. This means that full-text queries will not search on stopwords.

Although it ignores the inclusion of stopwords, the full-text index does take into account their position.
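To see how stopwords are dropped while their positions are still counted, the word breaker output can be inspected with sys.dm_fts_parser. This is a sketch assuming English (LCID 1033) and the system stoplist (stoplist_id 0); the function requires sysadmin membership.

```sql
-- Arguments: query string, LCID, stoplist_id (0 = system stoplist), accent sensitivity.
-- Stopwords are flagged in special_term as noise words but keep their own
-- occurrence number, so the words after them are not renumbered.
SELECT occurrence, display_term, special_term
FROM sys.dm_fts_parser(N'"Crank Arm and Tire Maintenance"', 1033, 0, 0)
ORDER BY occurrence;
```

This matches the Fragment 1 example, where Tire has occurrence 4 even though "and" is not stored in the index.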

Use CREATE FULLTEXT STOPLIST (Transact-SQL) and DROP FULLTEXT STOPLIST (Transact-SQL) to create and drop stoplists, and use ALTER FULLTEXT STOPLIST (Transact-SQL) to add stopwords to a stoplist or remove them from it.

ALTER FULLTEXT STOPLIST stoplist_name
{ 
   ADD [N] 'stopword' LANGUAGE language_term  
   | DROP 
    {
        'stopword' LANGUAGE language_term 
      | ALL LANGUAGE language_term 
      | ALL
     }
};
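A concrete use of this syntax might look like the sketch below. The stoplist name demo_stoplist, the table dbo.DocumentDemo, and the stopword 'tire' are hypothetical choices for illustration.

```sql
-- Start from a copy of the system stoplist (hypothetical name).
CREATE FULLTEXT STOPLIST demo_stoplist FROM SYSTEM STOPLIST;
GO

-- Add a custom English stopword, then remove it again.
ALTER FULLTEXT STOPLIST demo_stoplist ADD N'tire' LANGUAGE 'English';
ALTER FULLTEXT STOPLIST demo_stoplist DROP 'tire' LANGUAGE 'English';
GO

-- Attach the stoplist to an existing full-text index.
-- This starts a repopulation unless WITH NO POPULATION is added.
ALTER FULLTEXT INDEX ON dbo.DocumentDemo SET STOPLIST demo_stoplist;
GO
```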

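Query sys.fulltext_stoplists (Transact-SQL) and sys.fulltext_stopwords (Transact-SQL) to list the stoplists and stopwords that already exist in the database.

4,Maintain the Fulltext Catalog

A full-text index can consist of multiple fragments. When data is updated, new fragments are created but the old ones are not deleted, so the number of fragments grows and performance degrades. Every full-text index belongs to a catalog, so rebuilding or reorganizing the catalog re-creates or reorganizes the structure of its full-text indexes and improves query performance.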
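For example, the two catalog views can be joined to list each user-defined stoplist with its stopwords. The second query, an addition for comparison, reads the system stoplist for English (LCID 1033 is an assumption).

```sql
-- User-defined stoplists and their stopwords in the current database.
SELECT sl.name       AS stoplist_name,
       sw.stopword,
       sw.[language],
       sw.language_id
FROM sys.fulltext_stoplists AS sl
JOIN sys.fulltext_stopwords AS sw
    ON sw.stoplist_id = sl.stoplist_id
ORDER BY sl.name, sw.language_id, sw.stopword;

-- Stopwords of the system stoplist for English.
SELECT stopword, language_id
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033
ORDER BY stopword;
```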

ALTER FULLTEXT CATALOG catalog_name 
{ REBUILD [ WITH ACCENT_SENSITIVITY = { ON | OFF } ]
| REORGANIZE
| AS DEFAULT 
}

REBUILD               

Tells SQL Server to rebuild the entire catalog. When a catalog is rebuilt, the existing catalog is deleted and a new catalog is created in its place. All the tables that have full-text indexing references are associated with the new catalog. Rebuilding resets the full-text metadata in the database system tables. 
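A sketch of a rebuild that also changes accent sensitivity, followed by a status check, is shown below. The catalog name ft_catalog_demo is hypothetical.

```sql
-- Rebuild the whole catalog and make it accent-insensitive.
ALTER FULLTEXT CATALOG ft_catalog_demo REBUILD WITH ACCENT_SENSITIVITY = OFF;

-- Verify the setting and watch the repopulation (PopulateStatus 0 means idle).
SELECT FULLTEXTCATALOGPROPERTY(N'ft_catalog_demo', 'AccentSensitivity') AS accent_sensitivity,
       FULLTEXTCATALOGPROPERTY(N'ft_catalog_demo', 'PopulateStatus')    AS populate_status;
```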

REORGANIZE               

Tells SQL Server to perform a master merge, which involves merging the smaller indexes created in the process of indexing into one large index. Merging the full-text index fragments can improve performance and free up disk and memory resources. If there are frequent changes to the full-text catalog, use this command periodically to reorganize the full-text catalog.

REORGANIZE also optimizes internal index and catalog structures.

Keep in mind that, depending on the amount of indexed data, a master merge may take some time to complete. Master merging a large amount of data can create a long running transaction, delaying truncation of the transaction log during checkpoint. In this case, the transaction log might grow significantly under the full recovery model. As a best practice, ensure that your transaction log contains sufficient space for a long-running transaction before reorganizing a large full-text index in a database that uses the full recovery model. For more information, see Manage the Size of the Transaction Log File.
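Before and during a master merge on a large index, checks along these lines may help. This is a sketch, not a complete monitoring solution: the catalog name ft_catalog_demo is hypothetical, and sys.dm_db_log_space_usage requires SQL Server 2012 or later.

```sql
-- Recovery model of the current database (FULL is the case the warning is about).
SELECT name, recovery_model_desc
FROM sys.databases
WHERE database_id = DB_ID();

-- Current transaction log size and usage for this database.
SELECT total_log_size_in_bytes / 1048576.0 AS total_log_mb,
       used_log_space_in_bytes / 1048576.0 AS used_log_mb,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;

-- 1 while a master merge is in progress, 0 otherwise.
SELECT FULLTEXTCATALOGPROPERTY(N'ft_catalog_demo', 'MergeStatus') AS merge_in_progress;
```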

Reference doc: ALTER FULLTEXT CATALOG (Transact-SQL)

 

Reference docs:

Create and Manage Full-Text Indexes

Manage Full-Text Indexes

Improve the Performance of Full-Text Indexes
