Fulltext Index Study4:management and performance
Posted 悦光阴
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Fulltext Index Study4:management and performance相关的知识,希望对你有一定的参考价值。
Only one full-text index is allowed per table. For a full-text index to be created on a table, the table must have a single, unique nonnull column. You can build a full-text index on columns of type char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, and varbinary(max) can be indexed for full-text search. Creating a full-text index on a column whose data type is varbinary, varbinary(max), image, or xml requires that you specify a type column. A type column is a table column in which you store the file extension (.doc, .pdf, .xls, and so forth) of the document in each row.
The process of creating and maintaining a full-text index is called a population (also known as a crawl). There are three types of full-text index population: full population, change tracking-based population, and incremental timestamp-based population. For more information, see Populate Full-Text Indexes.
1,
引用doc:Create and Manage Full-Text Indexes
For this example, we will assume that a full-text index has been created on the Title column.
DocumentID |
Title |
---|---|
1 |
Crank Arm and Tire Maintenance |
2 |
Front Reflector Bracket and Reflector Assembly 3 |
3 |
Front Reflector Bracket Installation |
For example, the following table, which shows Fragment 1, depicts the contents of the full-text index created on the Title column of the Document table. Full-text indexes contain more information than is presented in this table. The table is a logical representation of a full-text index and is provided for demonstration purposes only. The rows are stored in a compressed format to optimize disk usage.
Notice that the data has been inverted from the original documents. Inversion occurs because the keywords are mapped to the document IDs. For this reason, a full-text index is often referred to as an inverted index.
Also notice that the keyword "and" has been removed from the full-text index. This is done because "and" is a stopword, and removing stopwords from a full-text index can lead to substantial savings in disk space thereby improving query performance. For more information about stopwords, see Configure and Manage Stopwords and Stoplists for Full-Text Search.
Fragment 1
Keyword |
ColId |
DocId |
Occurrence |
---|---|---|---|
Crank |
1 |
1 |
1 |
Arm |
1 |
1 |
2 |
Tire |
1 |
1 |
4 |
Maintenance |
1 |
1 |
5 |
Front |
1 |
2 |
1 |
Front |
1 |
3 |
1 |
Reflector |
1 |
2 |
2 |
Reflector |
1 |
2 |
5 |
Reflector |
1 |
3 |
2 |
Bracket |
1 |
2 |
3 |
Bracket |
1 |
3 |
3 |
Assembly |
1 |
2 |
6 |
3 |
1 |
2 |
7 |
Installation |
1 |
3 |
4 |
The Keyword column contains a representation of a single token extracted at indexing time. Word breakers determine what makes up a token.
The ColId column contains a value that corresponds to a particular column that is full-text indexed.
The DocId column contains values for an eight-byte integer that maps to a particular full-text key value in a full-text indexed table. This mapping is necessary when the full-text key is not an integer data type. In such cases, mappings between full-text key values and DocId values are maintained in a separate table called the DocId Mapping table. To query for these mappings use the sp_fulltext_keymappings system stored procedure. To satisfy a search condition, DocId values from the above table need to be joined with the DocId Mapping table to retrieve rows from the base table being queried. If the full-text key value of the base table is an integer type, the value directly serves as the DocId and no mapping is necessary. Therefore, using integer full-text key values can help optimize full-text queries.
The Occurrence column contains an integer value. For each DocId value, there is a list of occurrence values that correspond to the relative word offsets of the particular keyword within that DocId. Occurrence values are useful in determining phrase or proximity matches, for example, phrases have numerically adjacent occurrence values. They are also useful in computing relevance scores; for example, the number of occurrences of a keyword in a DocId may be used in scoring.
The logical full-text index is usually split across multiple internal tables. Each internal table is called a full-text index fragment. Some of these fragments might contain newer data than others. For example, if a user updates the following row whose DocId is 3 and the table is auto change-tracked, a new fragment is created.
DocumentID |
Title |
---|---|
3 |
Rear Reflector |
In the following example, which shows Fragment 2, the fragment contains newer data about DocId 3 compared to Fragment 1. Therefore, when the user queries for "Rear Reflector" the data from Fragment 2 is used for DocId 3. Each fragment is marked with a creation timestamp that can be queried by using the sys.fulltext_index_fragments catalog view.
Fragment 2
Keyword |
ColId |
DocId |
Occ |
---|---|---|---|
Rear |
1 |
3 |
1 |
Reflector |
1 |
3 |
2 |
As can be seen from Fragment 2, full-text queries need to query each fragment internally and discard older entries. Therefore, too many full-text index fragments in the full-text index can lead to substantial degradation in query performance. To reduce the number of fragments, reorganize the fulltext catalog by using the REORGANIZE option of the ALTER FULLTEXT CATALOG Transact-SQL statement. This statement performs a master merge, which merges the fragments into a single larger fragment and removes all obsolete entries from the full-text index.
After being reorganized, the example index would contain the following rows:
Keyword |
ColId |
DocId |
Occ |
---|---|---|---|
Crank |
1 |
1 |
1 |
Arm |
1 |
1 |
2 |
Tire |
1 |
1 |
4 |
Maintenance |
1 |
1 |
5 |
Front |
1 |
2 |
1 |
Rear |
1 |
3 |
1 |
Reflector |
1 |
2 |
2 |
Reflector |
1 |
2 |
5 |
Reflector |
1 |
3 |
2 |
Bracket |
1 |
2 |
3 |
Assembly |
1 |
2 |
6 |
3 |
1 |
2 |
7 |
3,configure stopwords
参考doc:Configure and Manage Stopwords and Stoplists for Full-Text Search
To prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords. During index creation, the Full-Text Engine omits stopwords from the full-text index. This means that full-text queries will not search on stopwords.
Although it ignores the inclusion of stopwords, the full-text index does take into account their position.
通过 CREATE FULLTEXT STOPLIST (Transact-SQL) 和 DROP FULLTEXT STOPLIST (Transact-SQL) 创建和删除 StopLists,通过ALTER FULLTEXT STOPLIST (Transact-SQL) 增加和删除 StopList的Stopwords.
ALTER FULLTEXT STOPLIST stoplist_name { ADD [N] ‘stopword‘ LANGUAGE language_term | DROP { ‘stopword‘ LANGUAGE language_term | ALL LANGUAGE language_term | ALL } };
通过 sys.fulltext_stoplists (Transact-SQL) 和 sys.fulltext_stopwords (Transact-SQL) 查看系统中已经存在的StopLists 和 StopWords。
3,维护Fulltext catalog
由于一个fulltext index 可能存在多个 fragments,当数据更新时,新的fragments 会被创建,但是旧的fragments 不会被删除,这样会导致fragments的增加,性能下降。由于每一个Fulltext index 都属于一个catalog,通过对catalog 进行 rebuild 或reorganize,可以重新创建会组织fulltext index 的结构,提高查询性能。
ALTER FULLTEXT CATALOG catalog_name { REBUILD [ WITH ACCENT_SENSITIVITY = { ON | OFF } ] | REORGANIZE | AS DEFAULT }
REBUILD
Tells SQL Server to rebuild the entire catalog. When a catalog is rebuilt, the existing catalog is deleted and a new catalog is created in its place. All the tables that have full-text indexing references are associated with the new catalog. Rebuilding resets the full-text metadata in the database system tables.
REORGANIZE
Tells SQL Server to perform a master merge, which involves merging the smaller indexes created in the process of indexing into one large index. Merging the full-text index fragments can improve performance and free up disk and memory resources. If there are frequent changes to the full-text catalog, use this command periodically to reorganize the full-text catalog.
REORGANIZE also optimizes internal index and catalog structures.
Keep in mind that, depending on the amount of indexed data, a master merge may take some time to complete. Master merging a large amount of data can create a long running transaction, delaying truncation of the transaction log during checkpoint. In this case, the transaction log might grow significantly under the full recovery model. As a best practice, ensure that your transaction log contains sufficient space for a long-running transaction before reorganizing a large full-text index in a database that uses the full recovery model. For more information, see Manage the Size of the Transaction Log File.
参考doc:ALTER FULLTEXT CATALOG (Transact-SQL)
参考doc:
Create and Manage Full-Text Indexes
Improve the Performance of Full-Text Indexes
以上是关于Fulltext Index Study4:management and performance的主要内容,如果未能解决你的问题,请参考以下文章
Fulltext Index Study8:Resouce Consumption
Fulltext Index Study2:Pupulate