TSQL: Extracting numbers between backslashes

Posted: 2017-09-07 08:15:53

Question:

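I am trying to extract the first set of numbers, which always appears in the same position but can vary in length. I have tried different approaches with SUBSTRING, CHARINDEX, PATINDEX and REVERSE, but I still can't crack it.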

Here is the format of the strings:

\zfilemgr3-00\Corporate\On the Market Information\13030\12743\Contract\12743.pdf

\zfilemgr3-00\Corporate\On the Market Information\141590\Contract\141590.pdf

So the results for these two would be 13030 and 141590.

Comments:

I think a better approach would be a CLR function that uses regular expressions.

Answer 1:

Taking into account that the numbers "appear in the same place":

declare @temp table (val varchar(250));

insert @temp values
('\zfilemgr3-00\Corporate\On the Market Information\13030\12743\Contract\12743.pdf'),
('\zfilemgr3-00\Corporate\On the Market Information\141590\Contract\141590.pdf');

declare @start int = 51;

select substring(val, @start, charindex('\', val, @start) - @start) num
    from @temp;

Output:

num
------
13030
141590

Does this code work for you?
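If the text before the number is not guaranteed to have a fixed length, the start position could be derived instead of hard-coded. The following is only a sketch, assuming the number always sits right after the fourth backslash (as it does in both sample strings); the alias startPos is a name made up for this illustration.

```sql
declare @temp table (val varchar(250));

insert @temp values
('\zfilemgr3-00\Corporate\On the Market Information\13030\12743\Contract\12743.pdf'),
('\zfilemgr3-00\Corporate\On the Market Information\141590\Contract\141590.pdf');

-- Each nested CHARINDEX starts searching one character past the previous
-- backslash, so the outermost call finds the 4th backslash; +1 moves to the
-- first character after it.
select substring(t.val, p.startPos,
                 charindex('\', t.val + '\', p.startPos) - p.startPos) as num
from @temp t
cross apply (
    select charindex('\', t.val,
             charindex('\', t.val,
               charindex('\', t.val,
                 charindex('\', t.val) + 1) + 1) + 1) + 1 as startPos
) p;
```

For the two sample rows startPos evaluates to 51, the same value that is hard-coded above. A string with fewer than four backslashes is not handled by this sketch.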

Comments:

Why declare @start as 51 when the question says the length of the string can vary?

Answer 2:

Have you tried the approach of casting to XML?

SELECT CAST('<x>' + REPLACE('\zfilemgr3-00\Corporate\On the Market Information\13030\12743\Contract\12743.pdf','\','</x><x>') + '</x>' AS XML).value('/x[5]','int');
SELECT CAST('<x>' + REPLACE('\zfilemgr3-00\Corporate\On the Market Information\141590\Contract\141590.pdf','\','</x><x>') + '</x>' AS XML).value('/x[5]','int');

See it running live in the snippet.
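To spell out how the cast works: replacing every backslash with '</x><x>' turns the path into a sequence of <x> elements. Because the string starts with a backslash, x[1] is empty, so the number is the fifth element. Here is a rough sketch of the same idea applied to a table column; the table variable is only for illustration, and the extra REPLACE for '&' is my own addition, since an unescaped ampersand in a path would make the XML cast fail.

```sql
declare @temp table (val varchar(250));

insert @temp values
('\zfilemgr3-00\Corporate\On the Market Information\13030\12743\Contract\12743.pdf'),
('\zfilemgr3-00\Corporate\On the Market Information\141590\Contract\141590.pdf');

-- Escape '&' first, then turn each backslash into an element boundary.
-- x[5] is the 5th path segment; the 'int' target type assumes that this
-- segment is purely numeric.
select cast('<x>'
            + replace(replace(t.val, '&', '&amp;'), '\', '</x><x>')
            + '</x>' as xml).value('/x[5]', 'int') as num
from @temp t;
```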

Answer 3:

You can try this:

DECLARE @value NVARCHAR(4000) = N'\zfilemgr3-00\Corporate\On the Market Information\13030\12743\Contract\12743.pdf'


-- 12743.pdf
SELECT REVERSE(SUBSTRING(REVERSE(@value), 0, CHARINDEX('\', REVERSE(@value))))      

-- 12743
SELECT REVERSE(SUBSTRING(REVERSE(@value), PATINDEX('%[0-9]%', REVERSE(@value)), CHARINDEX('\', REVERSE(@value)) - PATINDEX('%[0-9]%', REVERSE(@value))))

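If you always have .pdf or some other fixed extension, you can use the first variant and replace the extension. If you need to do this on the fly, we need to locate the first digit so that it can be used as the start index in the SUBSTRING function.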
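A minimal sketch of the "first variant plus replace" idea, assuming the extension is always .pdf. Note that, like the two queries above, this returns the number from the file name (12743), not the folder number after the fourth backslash.

```sql
DECLARE @value NVARCHAR(4000) = N'\zfilemgr3-00\Corporate\On the Market Information\13030\12743\Contract\12743.pdf';

-- Take everything after the last backslash (the file name),
-- then strip the assumed fixed extension.
SELECT REPLACE(
         REVERSE(SUBSTRING(REVERSE(@value), 0, CHARINDEX('\', REVERSE(@value)))),
         N'.pdf', N'') AS fileNumber;
```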

Answer 4:

I developed a very fast T-SQL function for this kind of thing using NGrams8K.

The function:

CREATE FUNCTION dbo.SubstringBetweenChar8K
(
  @string    varchar(8000),
  @start     tinyint,
  @stop      tinyint,
  @delimiter char(1)
)
/*****************************************************************************************
Purpose:
 Takes in input string (@string) and returns the text between two instances of a delimiter
 (@delimiter); the location of the delimiters is defined by @start and @stop.
 For example: if @string = 'xx.yy.zz.abc', @start=1, @stop=3, and @delimiter = '.' the
 function will return the text: yy.zz; this is the text between the first and third
 instance of "." in the string "xx.yy.zz.abc".

Compatibility:
 SQL Server 2008+

Syntax:
--===== Autonomous use
 SELECT sb.token, sb.position
 FROM dbo.SubstringBetweenChar8K(@string, @start, @stop, @delimiter) sb;

--===== Use against a table
 SELECT sb.token, sb.position
 FROM SomeTable st
 CROSS APPLY dbo.SubstringBetweenChar8K(st.SomeColumn1, 1, 2, '.') sb;

Parameters:
 @string    = varchar(8000); Input string to parse
 @start     = tinyint; the instance of @delimiter to search for; this is where the output 
              should start. When @start is 0 then the function will return everything from
              the beginning of @string until @stop.
 @stop      = tinyint; the last instance of @delimiter to search for; this is where the
              output should end. When @stop is 0 then the function will return everything
              from @start until the end of the string.
 @delimiter = char(1); this is the delimiter use to determine where the output starts/ends

Return Types:
 Inline Table Valued Function returns:
   token     = varchar(8000); the substring between the two instances of @delimiter 
               defined by @start and @stop
 position    = smallint; the location of where the substring begins
------------------------------------------------------------------------------------------
Developer Notes:
 1. Requires NGrams8K. The code for NGrams8K can be found here:
    http://www.sqlservercentral.com/articles/Tally+Table/142316/

 2. This function is what is referred to as an "inline" scalar UDF. Technically it's an
    inline table valued function (iTVF) but performs the same task as a scalar valued user
    defined function (UDF); the difference is that it requires the APPLY table operator
    to accept column values as a parameter. For more about "inline" scalar UDFs see this
    article by SQL MVP Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/91724/
    and for more about how to use APPLY see this article by SQL MVP Paul White:
    http://www.sqlservercentral.com/articles/APPLY/69953/.

    Note the above syntax example and usage examples below to better understand how to
    use the function. Although the function is slightly more complicated to use than a
    scalar UDF it will yield notably better performance for many reasons. For example,
    unlike scalar UDFs or multi-line table valued functions, the inline scalar UDF does
    not restrict the query optimizer's ability to generate a parallel query execution plan.

 3. dbo.SubstringBetweenChar8K is deterministic; for more about deterministic and
    nondeterministic functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx

Examples:
-- beginning of string to 2nd delimiter, 2nd delimiter to end of the string
DECLARE @string varchar(100) = 'abc.defg.hi.jk.lmnop.qrs.tuv';
SELECT string=@string, token, position FROM dbo.SubstringBetweenChar8K(@string,0,2, '.');
SELECT string=@string, token, position FROM dbo.SubstringBetweenChar8K(@string,2,0, '.');

-- Between the 1st & 2nd, then 2nd & 5th delimiters
SELECT string=@string, token, position FROM dbo.SubstringBetweenChar8K(@string,1,2, '.');
SELECT string=@string, token, position FROM dbo.SubstringBetweenChar8K(@string,2,5, '.');

-- dealing with NULLS, delimiters that don't exist and when @start = @stop
SELECT string=@string, token, position FROM dbo.SubstringBetweenChar8K(@string,2,10,'.');
SELECT string=@string, token, position FROM dbo.SubstringBetweenChar8K(@string,1,NULL,'.');
SELECT string=@string, token, position FROM dbo.SubstringBetweenChar8K(@string,NULL,1,'.');
---------------------------------------------------------------------------------------
Revision History:
 Rev 00 - 20160720 - Initial Creation - Alan Burstein
 Rev 01 - 20160821 - Re-wrote a single-char version (this); removed tokenLen
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH
chars AS
(
 SELECT instance = 0, position = 0 WHERE @start = 0
 UNION ALL
 SELECT ROW_NUMBER() OVER (ORDER BY position), position
 FROM dbo.NGrams8k(@string,1)
 WHERE token = @delimiter
 UNION ALL
 SELECT -1, DATALENGTH(@string)+1 WHERE @stop = 0
)
SELECT 
  token = SUBSTRING
          (
            @string,
            MIN(position)+1,
            NULLIF(MAX(position),MIN(position)) - MIN(position)-1
          ),
  position = CAST
            (
            CASE WHEN NULLIF(MAX(position),MIN(position)) - MIN(position)-1 > 0
            THEN MIN(position)+1 END AS smallint
          )
FROM chars
WHERE instance IN (@start, NULLIF(@stop,0), -1);
GO

Example using your data:

declare @sometable table (someid int identity, someString varchar(1000));
insert @sometable(someString) values
('\zfilemgr3-00\Corporate\On the Market Information\13030\12743\Contract\12743.pdf'),
('\zfilemgr3-00\Corporate\On the Market Information\141590\Contract\141590.pdf');

select *
from @sometable s
cross apply dbo.SubstringBetweenChar8K(s.someString, 4, 5, '\');

Results:

someid      someString                                                                          token   position
----------- ----------------------------------------------------------------------------------- ------- --------
1           \zfilemgr3-00\Corporate\On the Market Information\13030\12743\Contract\12743.pdf    13030   51
2           \zfilemgr3-00\Corporate\On the Market Information\141590\Contract\141590.pdf        141590  51
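For readers who do not have NGrams8K installed, the core idea (number every occurrence of the delimiter, then take the text between two chosen occurrences) can be sketched without it. This is only an illustration and not the function above: the inline tally built from sys.all_columns is a stand-in I am assuming in place of NGrams8K, and the delimiter instances 4 and 5 are hard-coded.

```sql
declare @string varchar(8000) =
  '\zfilemgr3-00\Corporate\On the Market Information\13030\12743\Contract\12743.pdf';
declare @delimiter char(1) = '\';

with tally as
(
  -- one row per character position in @string
  select top (datalength(@string))
         row_number() over (order by (select null)) as n
  from sys.all_columns a
  cross join sys.all_columns b
),
delims as
(
  -- number each occurrence of the delimiter from left to right
  select row_number() over (order by n) as instance, n as position
  from tally
  where substring(@string, n, 1) = @delimiter
)
select token    = substring(@string, min(position) + 1,
                            max(position) - min(position) - 1),
       position = min(position) + 1
from delims
where instance in (4, 5);
```

With the sample string, the 4th and 5th backslashes are at positions 50 and 56, so this returns the token 13030 starting at position 51, in line with the results shown above.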
