SQL Server searching for text in a column

【Posted】: 2013-08-05 03:24:24 【Question】:

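I'm confused about which approach to use.

Basically, I need a search that can find occurrences of multiple phrases in a single column, where the input phrases are separated by spaces.

So the user's input looks like this:

"Phrase1 Phrase2 ... PhraseX"     (the number of phrases can range from 0 to unknown, but say < 6)

I need to search with this logic: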

Where 'Phrase1%' AND 'Phrase2%' AND ... 'PhraseX%'

... and so on, so all of the phrases must be found.

It is always a logical AND.

With speed and performance in mind, do I use:

lots of

Like 'Phrase1%' and like 'Phrase2%' and like ... 'PhraseX%' ?

or use

patindex('Phrase1%', column) > 0 AND patindex('Phrase2%', column) > 0
AND ... patindex('PhraseX%', column) > 0

or

add a full-text search index,

and use:

Where Contains(Column, 'Phrase1*') AND Contains(Column, 'Phrase2*') AND ... Contains(Column, 'PhraseX*')

or

????

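There are almost too many options, which is why I'm asking: what is the most efficient way to do this?

Thanks for your wisdom...

【Comments】:

Are you sure it isn't OR? Because this will not return anything: Like 'Phrase1%' and like 'Phrase2%'. By the way, this kind of LIKE, without a leading %, will benefit from an index.

Definitely an AND, it must match all the words... I have gone with the FTS solution, using '"Phrase*" AND ... etc'... it works well.. thanks..

【Answer 1】: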

If you are searching with AND, then the correct wildcard search would be:

Like '%Phrase1%' and like '%Phrase2%' and like ... '%PhraseX%' 

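There is no reason to use patindex() here, because like is sufficient and well optimized. Well optimized, that is, but this particular case cannot be made efficient: a pattern with a leading wildcard cannot use an index seek, so it will require a full table scan. And if the text field is very, very large (I mean at least thousands or tens of thousands of characters), performance will not be good.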
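As a minimal sketch of the like approach, assuming a hypothetical table variable @docs with a body column (neither name comes from the question), the AND logic could look like this:

```sql
-- Hypothetical sample data; @docs and its columns are illustrative only.
DECLARE @docs TABLE (id INT PRIMARY KEY, body NVARCHAR(200));

INSERT INTO @docs (id, body)
VALUES (1, N'Phrase1 then Phrase2'),
       (2, N'only Phrase1 here'),
       (3, N'Phrase2 comes before Phrase1');

-- One LIKE predicate per phrase, combined with AND.
-- The leading % rules out an index seek, so every row is examined.
SELECT id, body
FROM @docs
WHERE body LIKE N'%Phrase1%'
  AND body LIKE N'%Phrase2%';
-- Rows 1 and 3 contain both phrases; row 2 does not.
```

The order of the phrases inside the column does not matter here, only that each one occurs somewhere in the value.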

The solution is full-text search. You would phrase it as:

where CONTAINS(column, 'Phrase1 AND phrase2 AND . . . ');

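The only issue here is when the "phrases" you are looking for (which appear to be words) are stop words.

In conclusion, if you have more than a few thousand rows, or the text field you are searching has more than a few thousand characters, use the full-text option. This is only a guideline. If you are searching a reference table with 100 rows and looking in a description field of up to 100 characters, then the like approach should be fine.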
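The full-text route could be sketched as follows. This is not the answerer's code: it assumes the Full-Text Search feature is installed and SQL Server 2017 or later (for STRING_AGG), and the names dbo.Documents, PK_Documents, ftDemoCatalog, @input and @condition are all hypothetical. It builds a prefix-term condition such as "Phrase1*" AND "Phrase2*" from the space-separated input.

```sql
-- Hypothetical table; a full-text index needs a unique, single-column,
-- non-nullable key index, which the primary key provides here.
CREATE TABLE dbo.Documents
(
    DocumentId INT IDENTITY(1, 1) NOT NULL,
    Body       NVARCHAR(MAX) NOT NULL,
    CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (DocumentId)
);
GO

CREATE FULLTEXT CATALOG ftDemoCatalog;
GO

CREATE FULLTEXT INDEX ON dbo.Documents (Body)
    KEY INDEX PK_Documents
    ON ftDemoCatalog;
GO

DECLARE @input NVARCHAR(400) = N'Phrase1 Phrase2 PhraseX';
DECLARE @condition NVARCHAR(4000);

-- Turn each word into a quoted prefix term and join the terms with AND.
-- Double quotes inside a word are removed so they cannot break the condition.
SELECT @condition = STRING_AGG(N'"' + REPLACE(value, N'"', N'') + N'*"', N' AND ')
FROM STRING_SPLIT(@input, N' ')
WHERE value <> N'';

-- CONTAINS rejects a NULL or empty condition, so skip the query when no words were supplied.
IF @condition IS NOT NULL
    SELECT DocumentId, Body
    FROM dbo.Documents
    WHERE CONTAINS(Body, @condition);
```

The full-text index is populated in the background, so rows inserted a moment ago may not be searchable immediately.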

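【Comments】:

Thanks, that seems to make sense; I'm going to implement that solution now. May I ask what happens if one of the words is a stop word? And what is a stop word?

@David . . . The text engine ignores stop words (they are typically words such as "the" and "another"). I believe stop words are ignored both in the query string and in the text, so documents containing all the other words will be returned.

【Answer 2】:

Personally, I like this solution -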

DECLARE @temp TABLE (title NVARCHAR(50))
INSERT INTO @temp (title)
VALUES ('Phrase1 33'), ('test Phrase2'), ('blank')

SELECT t.*
FROM @temp t
WHERE EXISTS(
    SELECT 1
    FROM (
        VALUES ('Phrase1'), ('Phrase2'), ('PhraseX')
    ) c(t)  
    WHERE title LIKE '%' + t + '%'
)
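Note that EXISTS matches a row as soon as any one phrase is found, which is OR logic. If every phrase must be present and the phrases arrive as one space-separated string, a variation along these lines might work. It is a sketch, not part of the original answer; it assumes SQL Server 2016 or later for STRING_SPLIT, and the @input variable is hypothetical.

```sql
DECLARE @temp TABLE (title NVARCHAR(50));
INSERT INTO @temp (title)
VALUES (N'Phrase1 33'), (N'test Phrase2'), (N'Phrase2 and Phrase1'), (N'blank');

-- Hypothetical user input: any number of space-separated phrases.
DECLARE @input NVARCHAR(400) = N'Phrase1 Phrase2';

-- Keep a row only if there is NO phrase that it fails to contain (relational division).
SELECT t.title
FROM @temp AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM STRING_SPLIT(@input, N' ') AS p
    WHERE p.value <> N''                          -- ignore empty tokens from repeated spaces
      AND t.title NOT LIKE N'%' + p.value + N'%'
);
-- Only 'Phrase2 and Phrase1' contains both phrases.
```

With an empty @input every row is returned, because there is no phrase left to fail. Characters such as %, _ and [ in a phrase are treated as LIKE wildcards unless they are escaped.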

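【Comments】:

Clever... but how would it work for an unknown number of phrases, and how would I populate the values list? Maybe a nested select... hmm, something to think about...

【Answer 3】:

Ideally, this should be done with the help of full-text search, as mentioned above. But if you do not have full-text configured for your database, here is a performance-intensive solution that performs a prioritized string search. Note: this returns rows for partial or complete combinations of the input words (rows containing one or more words of the search string, in any order):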

-- table to search in
drop table if exists dbo.myTable;
go
CREATE TABLE dbo.myTable
    (
    myTableId int NOT NULL IDENTITY (1, 1),
    code varchar(200) NOT NULL, 
    description varchar(200) NOT NULL -- this column contains the values we are going to search in 
    )  ON [PRIMARY]
GO

-- function to split space separated search string into individual words
drop function if exists [dbo].[fnSplit];
go
CREATE FUNCTION [dbo].[fnSplit] (@StringInput nvarchar(max),
@Delimiter nvarchar(1))
RETURNS @OutputTable TABLE (
  id nvarchar(1000)
)
AS
BEGIN
  DECLARE @String nvarchar(100);

  WHILE LEN(@StringInput) > 0
  BEGIN
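    -- take everything before the first delimiter (or the whole string if there is none)
    SET @String = LEFT(@StringInput,
                       ISNULL(NULLIF(CHARINDEX(@Delimiter, @StringInput) - 1, -1),
                              LEN(@StringInput)));
    -- drop the extracted word and its delimiter from the front of the input
    SET @StringInput = SUBSTRING(@StringInput,
                                 ISNULL(NULLIF(CHARINDEX(@Delimiter, @StringInput), 0),
                                        LEN(@StringInput)) + 1,
                                 LEN(@StringInput));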

    INSERT INTO @OutputTable (id)
      VALUES (@String);
  END;

  RETURN;
END;
GO

-- this is the search script which can be optionally converted to a stored procedure /function

declare @search varchar(max) = 'infection upper acute genito'; -- enter your search string here
-- the searched string above should give rows containing the following
-- infection in upper side with acute genitointestinal tract
-- acute infection in upper teeth
-- acute genitointestinal pain

if (len(trim(@search)) = 0) -- if search string is empty, just return records ordered alphabetically
begin
 select 1 as Priority ,myTableid, code, Description from myTable order by Description 
 return;
end

declare @splitTable Table(
wordRank int Identity(1,1), -- individual words are assigned a priority order (in order of occurrence/position)
word varchar(200)
)
declare @nonWordTable Table( -- table to trim out auxiliary verbs, prepositions etc. from the search
id varchar(200)
)

insert into @nonWordTable values
('of'),
('with'),
('at'),
('in'),
('for'),
('on'),
('by'),
('like'),
('up'),
('off'),
('near'),
('is'),
('are'),
(','),
(':'),
(';')

insert into @splitTable
select id from dbo.fnSplit(@search,' '); -- this function gives you a table with rows containing all the space separated words of the search like in this e.g., the output will be -
--  id
-------------
-- infection
-- upper
-- acute
-- genito

delete s from @splitTable s join @nonWordTable n  on s.word = n.id; -- trimming out non-words here
declare @countOfSearchStrings int = (select count(word) from @splitTable);  -- count of space separated words for search
declare @highestPriority int = POWER(@countOfSearchStrings,3);

with plainMatches as
(
select myTableid, @highestPriority as Priority from myTable where Description like @search  -- exact matches have highest priority
union                                      
select myTableid, @highestPriority-1 as Priority from myTable where Description like  @search + '%'  -- then with something at the end
union                                      
select myTableid, @highestPriority-2 as Priority from myTable where Description like '%' + @search -- then with something at the beginning
union                                      
select myTableid, @highestPriority-3 as Priority from myTable where Description like '%' + @search + '%' -- then if the word falls somewhere in between
),
splitWordMatches as( -- give each searched word a rank based on its position in the searched string
                     -- and calculate its char index in the field to search
select myTable.myTableid, (@countOfSearchStrings - s.wordRank) as Priority, s.word,
wordIndex = CHARINDEX(s.word, myTable.Description)  from myTable join @splitTable s on myTable.Description like '%'+ s.word + '%'
-- and not exists(select myTableid from plainMatches p where p.myTableId = myTable.myTableId) -- need not look into rows that have already been found in plainmatches as they are highest ranked
                                                                              -- this one takes a long time though, so commenting it, will have no impact on the result
),
wordIndexRatings as( -- reverse the char indexes retrieved above so that words occurring earlier have higher weightage
                     -- and then normalize them to sequential values
select myTableid, Priority, word, ROW_NUMBER() over (partition by myTableid order by wordindex desc) as comparativeWordIndex 
from splitWordMatches 
)
,
wordIndexSequenceRatings as ( -- need to do this to ensure that if the same set of words from search string is found in two rows,
                              -- their sequence in the field value is taken into account for higher priority
    select w.myTableid, w.word, (w.Priority + w.comparativeWordIndex + coalesce(sequncedPriority ,0)) as Priority
    from wordIndexRatings w left join 
    (
     select w1.myTableid, w1.priority, w1.word, w1.comparativeWordIndex, count(w1.myTableid) as sequncedPriority
     from wordIndexRatings w1 join wordIndexRatings w2 on w1.myTableId = w2.myTableId and w1.Priority > w2.Priority and w1.comparativeWordIndex>w2.comparativeWordIndex
     group by w1.myTableid, w1.priority,w1.word, w1.comparativeWordIndex
    ) 
    sequencedPriority on w.myTableId = sequencedPriority.myTableId and w.Priority = sequencedPriority.Priority
),
prioritizedSplitWordMatches as ( -- this calculates the cumulative priority for a field value
select  w1.myTableId, sum(w1.Priority) as OverallPriority from wordIndexSequenceRatings w1 join wordIndexSequenceRatings w2 on w1.myTableId =  w2.myTableId 
where w1.word <> w2.word group by w1.myTableid 
),
completeSet as (
select myTableid, priority from plainMatches -- get plain matches which should be highest ranked
union
select myTableid, OverallPriority as priority from prioritizedSplitWordMatches -- get ranked split word matches (which are ordered based on word rank in search string and sequence)
union
select myTableid, Priority as Priority from splitWordMatches -- get one word matches
),
maximizedCompleteSet as( -- set the priority of a field value = maximum priority for that field value
select myTableid, max(priority) as Priority from completeSet group by myTableId
)
select priority, myTable.myTableid , code, Description from maximizedCompleteSet m join myTable  on m.myTableId = myTable.myTableId 
order by Priority desc, Description -- order by priority desc to get highest rated items on top
--offset 0 rows fetch next 50 rows only -- optional paging
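To try the script, the table needs some rows. The sketch below uses the three descriptions listed in the script's own comments plus one non-matching row; the code values are made up for illustration. It is meant to be run after creating dbo.myTable and dbo.fnSplit and before the search script.

```sql
-- Sample rows; the descriptions come from the comments in the search script,
-- the code values ('D001' ...) and the last row are hypothetical.
INSERT INTO dbo.myTable (code, description)
VALUES ('D001', 'infection in upper side with acute genitointestinal tract'),
       ('D002', 'acute infection in upper teeth'),
       ('D003', 'acute genitointestinal pain'),
       ('D004', 'chronic back pain');

-- Quick check of the splitter on its own: one row per space-separated word.
SELECT id
FROM dbo.fnSplit('infection upper acute genito', ' ');
```

On SQL Server 2016 and later, the built-in STRING_SPLIT function could stand in for dbo.fnSplit, with the caveat that it does not guarantee the order of the returned words unless its ordinal option is available and used.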

【Comments】:
