T-SQL function for generating slugs?

Posted: 2011-03-06 04:26:47

【Question】:

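Quick check to see whether anyone has, or knows of, a T-SQL function that can generate a slug from a given nvarchar input. For example:

"Hello World" > "hello-world"
"This is a test" > "this-is-a-test"

I have a C# function that I normally use for this, but in this case I have a large amount of data to parse and turn into slugs, so it makes more sense to do it on the SQL Server rather than transfer the data over the wire.

As an aside, I don't have Remote Desktop access to the box, so I can't run code (.NET, PowerShell, etc.) against it.

Thanks in advance.

EDIT: As requested, here is the function I normally use to generate slugs: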

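// requires: using System.Text.RegularExpressions;
public static string GenerateSlug(string n, int maxLength)
{
    string s = n.ToLower();
    // remove everything except a-z, 0-9, whitespace and hyphens
    s = Regex.Replace(s, @"[^a-z0-9\s-]", "");
    // collapse runs of whitespace and hyphens into a single space
    s = Regex.Replace(s, @"[\s-]+", " ").Trim();
    // cut to the maximum length and trim again
    s = s.Substring(0, s.Length <= maxLength ? s.Length : maxLength).Trim();
    // turn the remaining whitespace into hyphens
    s = Regex.Replace(s, @"\s", "-");
    return s;
}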
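When porting this to T-SQL it helps to have a reference to compare outputs against. The following is a minimal Python sketch of the same steps (lowercase, strip invalid characters, collapse separators, truncate, hyphenate). The name `generate_slug` and the sample inputs are illustrative assumptions, not part of the original post.

```python
import re


def generate_slug(text: str, max_length: int) -> str:
    """Reference sketch mirroring the steps of the C# GenerateSlug function."""
    s = text.lower()
    # remove everything except a-z, 0-9, whitespace and hyphens
    s = re.sub(r"[^a-z0-9\s-]", "", s)
    # collapse runs of whitespace and hyphens into a single space
    s = re.sub(r"[\s-]+", " ", s).strip()
    # cut to the maximum length and trim again
    s = s[:max_length].strip()
    # turn the remaining whitespace into hyphens
    return re.sub(r"\s", "-", s)


if __name__ == "__main__":
    for sample in ["Hello World", "  This is a test  "]:
        print(repr(sample), "->", generate_slug(sample, 45))
```

Running the same sample strings through a candidate T-SQL function and comparing the results is a quick way to spot differences in edge cases such as leading spaces or truncation.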

【Comments】:

Can you post your C# slug-generator function?

【Answer 1】:

A slug function that handles Vietnamese Unicode characters:

CREATE function [dbo].[toslug](@string nvarchar(4000)) 
    RETURNS varchar(4000) AS BEGIN 
    declare @out nvarchar(4000)
    declare @from nvarchar(255)
    declare @to varchar(255)
    --convert to ASCII dbo.slugify
    set @string = lower(@string)
    set @out = @string
    set @from = N'ýỳỷỹỵáàảãạâấầẩẫậăắằẳẵặéèẻẽẹêếềểễệúùủũụưứừửữựíìỉĩịóòỏõọơớờởỡợôốồổỗộđ·/_,:;'
    set @to = 'yyyyyaaaaaaaaaaaaaaaaaeeeeeeeeeeeuuuuuuuuuuuiiiiioooooooooooooooood------'
    declare @pi int 
    set @pi = 1
    --I'm sorry T-SQL have no regex. Thanks for patindex, MS .. :-)
    while @pi<=len(@from) begin
        set @out = replace(@out, substring(@from,@pi,1), substring(@to,@pi,1))
        set @pi = @pi + 1
    end
    set @out = ltrim(rtrim(@out))

   -- replace space to hyphen   
   set @out = replace(@out, ' ', '-')

   -- remove double hyphen
   while CHARINDEX('--', @out) > 0 set @out = replace(@out, '--', '-')

   return (@out)
END
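The core idea of this answer is a positional lookup table: each character in `@from` is replaced by the character at the same position in `@to`. A rough Python sketch of that technique is shown below. The mapping here is a deliberately small, hypothetical subset (written with escape sequences), not the full table from the answer, and `to_slug` is an illustrative name.

```python
import unicodedata

# Hypothetical partial mapping: replacement -> characters it replaces.
# The T-SQL answer covers far more Vietnamese letters than this subset.
MAPPING = {
    "a": "\u00e1\u00e0\u1ea3\u00e3\u1ea1",  # a with acute, grave, hook, tilde, dot below
    "e": "\u00e9\u00e8\u1ebb\u1ebd\u1eb9",  # e with the same five marks
    "d": "\u0111",                          # d with stroke
    "-": "\u00b7/_,:;",                     # punctuation treated as separators
}
TABLE = {ord(ch): repl for repl, chars in MAPPING.items() for ch in chars}


def to_slug(text: str) -> str:
    # NFC makes sure accented letters are single code points before lookup
    out = unicodedata.normalize("NFC", text).lower().translate(TABLE)
    out = out.strip().replace(" ", "-")
    # collapse repeated hyphens, as the T-SQL loop does
    while "--" in out:
        out = out.replace("--", "-")
    return out


if __name__ == "__main__":
    print(to_slug("\u0110\u00e0 L\u1ea1t"))
    print(to_slug("a/b, c"))
```

An explicit table like this is verbose, but it covers letters such as the d with stroke that have no accent-free decomposition.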

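【Comments】:

Please look at how your post is rendered and edit it to fix the formatting. The post editor shows a handy preview so you can see how it will look before you submit. This link may be useful to you: How to Answer

【Answer 2】:

I know this is an old thread, but for future generations: I found a function here that even handles accented characters: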

CREATE function [dbo].[slugify](@string varchar(4000)) 
    RETURNS varchar(4000) AS BEGIN 
    declare @out varchar(4000)

    --convert to ASCII
    set @out = lower(@string COLLATE SQL_Latin1_General_CP1251_CS_AS)

    declare @pi int 
    --I'm sorry T-SQL have no regex. Thanks for patindex, MS .. :-)
    set @pi = patindex('%[^a-z0-9 -]%',@out)
    while @pi>0 begin
        set @out = replace(@out, substring(@out,@pi,1), '')
        --set @out = left(@out,@pi-1) + substring(@out,@pi+1,8000)
        set @pi = patindex('%[^a-z0-9 -]%',@out)
    end

    set @out = ltrim(rtrim(@out))

   -- replace space to hyphen   
   set @out = replace(@out, ' ', '-')

   -- remove double hyphen
   while CHARINDEX('--', @out) > 0 set @out = replace(@out, '--', '-')

   return (@out)
END
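The function above leans on a collation conversion to turn accented letters into plain ASCII before the PATINDEX loop removes whatever is left. A comparable effect can be sketched in Python with Unicode decomposition. This is an assumption-laden illustration of accent stripping in general, not a description of what the COLLATE clause does internally; `strip_accents` and `slugify` are hypothetical names.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    # Decompose each character into base letter + combining marks,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def slugify(text: str) -> str:
    out = strip_accents(text).lower()
    out = re.sub(r"[^a-z0-9 -]", "", out)   # same character class as the PATINDEX pattern
    out = out.strip().replace(" ", "-")
    return re.sub(r"-{2,}", "-", out)        # collapse repeated hyphens


if __name__ == "__main__":
    print(slugify("Cr\u00e8me Br\u00fbl\u00e9e"))
    print(slugify("\u0111\u00e0"))  # the d with stroke has no decomposition and is dropped
```

Letters without a decomposition are removed rather than transliterated, which is one reason results can differ from a hand-written mapping table.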

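【Comments】:

Stripping accents with COLLATE SQL_Latin1_General_CP1251_CS_AS only works on varchar variables. If you try the same thing on an nvarchar variable, nothing happens. If the input is nvarchar, you have to convert it explicitly to varchar at some point with cast(@string as varchar); otherwise the accents stay where they are.

You may also need to cast with a specific length, because SQL Server seems to shorten your string if you cast it to varchar in a SELECT statement (a CAST to varchar without a length defaults to 30 characters). For example: cast(@string as varchar(500))

【Answer 3】:

Here is the solution I came up with. Feel free to fix or modify it where needed.

I should mention that the database I'm currently working on is case-insensitive, hence the LOWER(@str).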

CREATE FUNCTION [dbo].[UDF_GenerateSlug]
(   
    @str VARCHAR(100)
)
RETURNS VARCHAR(100)
AS
BEGIN
    DECLARE @IncorrectCharLoc SMALLINT
    SET @str = LOWER(@str)
    -- find the first character that is not a digit, lowercase letter, or space
    SET @IncorrectCharLoc = PATINDEX('%[^0-9a-z ]%', @str)
    WHILE @IncorrectCharLoc > 0
    BEGIN
        -- delete it and look for the next one
        SET @str = STUFF(@str, @IncorrectCharLoc, 1, '')
        SET @IncorrectCharLoc = PATINDEX('%[^0-9a-z ]%', @str)
    END
    SET @str = REPLACE(@str, ' ', '-')
    RETURN @str
END

Credit to http://blog.sqlauthority.com/2007/05/13/sql-server-udf-function-to-parse-alphanumeric-characters-from-string/ for the original code.

【Comments】:

Shouldn't '%[^0-9a-z ]%' be '%[^0-9a-z- ]%' instead?

【Answer 4】:
-- Converts a title such as "This is a Test" to an all lower case string such
-- as "this-is-a-test" for use as the slug in a URL.  All runs of separators
-- (whitespace, underscore, or hyphen) are converted to a single hyphen.
-- This is implemented as a state machine having the following four states:
--
--     0 - initial state
--     1 - in a sequence consisting of valid characters (a-z, A-Z, or 0-9)
--     2 - in a sequence of separators (whitespace, underscore, or hyphen)
--     3 - encountered a character that is neither valid nor a separator
--
-- Once the next state has been determined, the return value string is
-- built based on the transitions from the current state to the next state.
--
-- State 0 skips any initial whitespace.  State 1 includes all valid slug
-- characters.  State 2 converts multiple separators into a single hyphen
-- and skips trailing whitespace.  State 3 skips any punctuation between
-- characters and, if no additional whitespace is encountered, then the
-- punctuation is not treated as a word separator.
--
CREATE FUNCTION ToSlug(@title AS NVARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @retval AS VARCHAR(MAX) = ''; -- return value
    DECLARE @i AS INT = 1;                -- title index
    DECLARE @c AS CHAR(1);                -- current character
    DECLARE @state AS INT = 0;            -- current state
    DECLARE @nextState AS INT;            -- next state
    DECLARE @tab AS CHAR(1) = CHAR(9);    -- tab
    DECLARE @lf AS CHAR(1) = CHAR(10);    -- line feed
    DECLARE @cr AS CHAR(1) = CHAR(13);    -- carriage return
    DECLARE @separators AS CHAR(8) = '[' + @tab + @lf + @cr + ' _-]';
    DECLARE @validchars AS CHAR(11) = '[a-zA-Z0-9]';

    WHILE (@i <= LEN(@title))
    BEGIN
        SELECT @c = SUBSTRING(@title, @i, 1),

        @nextState = CASE
            WHEN @c LIKE @validchars THEN 1
            WHEN @state = 0 THEN 0
            WHEN @state = 1 THEN CASE
                WHEN @c LIKE @separators THEN 2
                ELSE 3 -- unknown character
                END
            WHEN @state = 2 THEN 2
            WHEN @state = 3 THEN CASE
                WHEN @c LIKE @separators THEN 2
                ELSE 3 -- stay in state 3
                END
            END,

        @retval = @retval + CASE
            WHEN @nextState != 1 THEN ''
            WHEN @state = 0 THEN LOWER(@c)
            WHEN @state = 1 THEN LOWER(@c)
            WHEN @state = 2 THEN '-' + LOWER(@c)
            WHEN @state = 3 THEN LOWER(@c)
            END,

        @state = @nextState,

        @i = @i + 1
    END
    RETURN @retval;
END
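The state machine described in the header comment can be easier to follow, and to test, outside of T-SQL. Below is a small Python model of the same four states and transitions. It is a sketch for experimenting with inputs, and the names `VALID`, `SEPARATORS` and `to_slug` are illustrative choices rather than anything from the answer.

```python
import string

VALID = set(string.ascii_letters + string.digits)   # a-z, A-Z, 0-9
SEPARATORS = set("\t\n\r _-")                        # whitespace, underscore, hyphen


def to_slug(title: str) -> str:
    out = []
    state = 0  # 0 = initial, 1 = valid chars, 2 = separators, 3 = other punctuation
    for c in title:
        # decide the next state from the current state and character
        if c in VALID:
            next_state = 1
        elif state == 0:
            next_state = 0          # still skipping leading junk
        elif state == 2:
            next_state = 2          # stay in the separator run
        elif c in SEPARATORS:
            next_state = 2          # from state 1 or 3 into a separator run
        else:
            next_state = 3          # punctuation that is not a separator
        # emit output only when entering or staying in state 1
        if next_state == 1:
            if state == 2:
                out.append("-")     # one hyphen per separator run
            out.append(c.lower())
        state = next_state
    return "".join(out)


if __name__ == "__main__":
    for sample in ["This is a Test", "  Hello,   World!  ", "don't stop"]:
        print(repr(sample), "->", to_slug(sample))
```

Because a hyphen is written only when a valid character follows a separator run, leading and trailing separators never produce stray hyphens.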

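【Comments】:

Could you add more comments?

【Answer 5】:

I took Jeremy's answer a couple of steps further: this version removes all consecutive dashes, including those created when spaces are replaced, and also removes leading and trailing dashes.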

create function dbo.Slugify(@str nvarchar(max)) returns nvarchar(max) as
begin
    declare @IncorrectCharLoc int
    set @str = replace(replace(lower(@str),'.','-'),'''','')

    -- remove non alphanumerics:
    set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
    while @IncorrectCharLoc > 0
    begin
        set @str = stuff(@str,@IncorrectCharLoc,1,' ')
        set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
    end

    -- replace all spaces with dashes
    set @str = replace(@str,' ','-')

    -- remove consecutive dashes:
    while charindex('--',@str) > 0
    begin
        set @str = replace(@str, '--', '-')
    end

    -- remove leading dashes
    while charindex('-', @str) = 1
    begin
        set @str = RIGHT(@str, len(@str) - 1)
    end

    -- remove trailing dashes
    while len(@str) > 0 AND substring(@str, len(@str), 1) = '-'
    begin
        set @str = LEFT(@str, len(@str) - 1)
    end
    return @str
end
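The order of the cleanup steps matters here: invalid characters become spaces, spaces become dashes, and only then are dash runs collapsed and the ends trimmed. A compact Python sketch of that pipeline, useful for predicting what the T-SQL should return, might look like this. The function name and sample inputs are assumptions for illustration.

```python
import re


def slugify(text: str) -> str:
    # dots become dashes, apostrophes are dropped
    s = text.lower().replace(".", "-").replace("'", "")
    # every other invalid character becomes a space
    s = re.sub(r"[^0-9a-z -]", " ", s)
    # spaces become dashes
    s = s.replace(" ", "-")
    # collapse runs of dashes, then trim leading and trailing dashes
    s = re.sub(r"-{2,}", "-", s)
    return s.strip("-")


if __name__ == "__main__":
    print(slugify("  Hello... World!  "))
    print(slugify("Jeremy's Answer (v2.0)"))
```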

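【Answer 6】:

Here is a variation of Jeremy's response. Technically this might not be pure slugifying, since I'm doing a couple of custom things such as replacing "." with "-dot-" and stripping out apostrophes. The main improvement is that this version also collapses consecutive spaces and does not strip out pre-existing dashes.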

create function dbo.Slugify(@str nvarchar(max)) returns nvarchar(max)
as
begin
    declare @IncorrectCharLoc int
    set @str = replace(replace(lower(@str),'.',' dot '),'''','')

    -- remove non alphanumerics:
    set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
    while @IncorrectCharLoc > 0
    begin
        set @str = stuff(@str,@IncorrectCharLoc,1,' ')
        set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
    end

    -- remove consecutive spaces:
    while charindex('  ',@str) > 0
    begin
        set @str = replace(@str, '  ', ' ')
    end

    -- replace all spaces with dashes
    set @str = replace(@str,' ','-')
    return @str
end

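【Comments】:

Very helpful. Two things I noticed that this script doesn't handle: trailing (and probably leading?) spaces, and input that ends with a closing parenthesis, which leaves a trailing hyphen. For the first, trim the input. For the second I'm not sure, since a trailing hyphen might be intentional...

【Answer 7】:

You can use LOWER and REPLACE to do this: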

SELECT REPLACE(LOWER(origString), ' ', '-')
FROM myTable

For a bulk update of a column (this sets the slug column from the value of the origString column):

UPDATE myTable
SET slug = REPLACE(LOWER(origString), ' ', '-')

【Comments】:

You need a lot more than this to handle Unicode strings properly. At a minimum, all non-ASCII characters should be dealt with.
