T-SQL function for generating slugs?
【Posted】2011-03-06 04:26:47
【Question】Quick check to see if anyone has or knows of a T-SQL function that can generate slugs from a given nvarchar input, for example:
"Hello World" > "hello-world"
"This is a test" > "this-is-a-test"
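I have a C# function that I normally use for this, but in this case I have a large amount of data to parse and turn into slugs, so it makes more sense to do it on the SQL Server rather than transfer the data over the wire.
As an aside, I don't have Remote Desktop access to the box, so I can't run code (.NET, PowerShell, etc.) against it.
Thanks in advance.
Edit: As requested, here is the function I normally use to generate slugs: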
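public static string GenerateSlug(string n, int maxLength)
{
    string s = n.ToLower();
    // drop everything except letters, digits, whitespace and hyphens
    s = Regex.Replace(s, @"[^a-z0-9\s-]", "");
    // collapse runs of whitespace and hyphens into a single space
    s = Regex.Replace(s, @"[\s-]+", " ").Trim();
    s = s.Substring(0, s.Length <= maxLength ? s.Length : maxLength).Trim();
    // turn the remaining spaces into hyphens
    s = Regex.Replace(s, @"\s", "-");
    return s;
}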
【Comments】
Could you post your C# slug-generator function?
【Answer 1】To slug with Vietnamese Unicode:
CREATE function [dbo].[toslug](@string nvarchar(4000))
RETURNS varchar(4000) AS BEGIN
declare @out nvarchar(4000)
declare @from nvarchar(255)
declare @to varchar(255)
-- lowercase the input, then map accented characters to their ASCII counterparts
set @string = lower(@string)
set @out = @string
set @from = N'ýỳỷỹỵáàảãạâấầẩẫậăắằẳẵặéèẻẽẹêếềểễệúùủũụưứừửữựíìỉĩịóòỏõọơớờởỡợôốồổỗộđ·/_,:;'
set @to = 'yyyyyaaaaaaaaaaaaaaaaaeeeeeeeeeeeuuuuuuuuuuuiiiiioooooooooooooooood------'
declare @pi int
set @pi = 1
-- T-SQL has no regex support, so replace each character in @from with the character at the same position in @to
while @pi<=len(@from) begin
    set @out = replace(@out, substring(@from,@pi,1), substring(@to,@pi,1))
    set @pi = @pi + 1
end
set @out = ltrim(rtrim(@out))
-- replace spaces with hyphens
set @out = replace(@out, ' ', '-')
-- collapse repeated hyphens
while CHARINDEX('--', @out) > 0 set @out = replace(@out, '--', '-')
return (@out)
END
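【Discussion】
Please look at how your post is rendered and edit it to fix the formatting. The post editing area gives you a handy little preview so you can see how it looks before you submit. This link may be useful to you: How to Answer
【Answer 2】I know this is an old thread, but for future generations: I found a function elsewhere that even handles accents: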
CREATE function [dbo].[slugify](@string varchar(4000))
RETURNS varchar(4000) AS BEGIN
declare @out varchar(4000)
--convert to ASCII
set @out = lower(@string COLLATE SQL_Latin1_General_CP1251_CS_AS)
declare @pi int
-- T-SQL has no regex support, so use PATINDEX to find each disallowed character and remove it
set @pi = patindex('%[^a-z0-9 -]%',@out)
while @pi>0 begin
    set @out = replace(@out, substring(@out,@pi,1), '')
    --set @out = left(@out,@pi-1) + substring(@out,@pi+1,8000)
    set @pi = patindex('%[^a-z0-9 -]%',@out)
end
set @out = ltrim(rtrim(@out))
-- replace spaces with hyphens
set @out = replace(@out, ' ', '-')
-- collapse repeated hyphens
while CHARINDEX('--', @out) > 0 set @out = replace(@out, '--', '-')
return (@out)
END
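【Discussion】
Removing accents with COLLATE SQL_Latin1_General_CP1251_CS_AS only works on varchar variables. If you try the same thing on an nvarchar variable, nothing happens. If the input is nvarchar, you must explicitly convert it to varchar at some point with cast(@string as varchar). If you don't, the accents stay where they are.
You may also need a cast with an explicit length, because SQL Server shortens the string when it is cast to varchar without one in a SELECT statement (a length-less CAST to varchar defaults to 30 characters). For example: cast(@string as varchar(500))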
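The point about casting can be illustrated with a small sketch. The sample string and column aliases are made up for illustration, and the accent-stripping result depends on the server's collations, so treat this as something to try rather than a guaranteed outcome:

```sql
-- Hypothetical sample value; the N'...' prefix makes it nvarchar.
DECLARE @title nvarchar(200) = N'Crème brûlée à la carte, s''il vous plaît';

SELECT
    -- CAST without a length defaults to varchar(30), so longer input is cut off silently.
    CAST(@title AS varchar) AS default_length,
    -- An explicit length keeps the whole string.
    CAST(@title AS varchar(200)) AS explicit_length,
    -- Convert to varchar first, then apply the collation used in the answer above.
    LOWER(CAST(@title AS varchar(200)) COLLATE SQL_Latin1_General_CP1251_CS_AS) AS collated_lower;
```

Comparing the first two columns shows the truncation; the third column shows whether the collation trick removes accents on a given server.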
【Answer 3】
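Here is the solution I came up with. Feel free to fix or modify it where needed.
I should mention that the database I'm currently working on is case insensitive, hence the LOWER(@str).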
CREATE FUNCTION [dbo].[UDF_GenerateSlug]
(
@str VARCHAR(100)
)
RETURNS VARCHAR(100)
AS
BEGIN
DECLARE @IncorrectCharLoc SMALLINT
SET @str = LOWER(@str)
-- strip every character that is not a digit, a letter, or a space
SET @IncorrectCharLoc = PATINDEX('%[^0-9a-z ]%',@str)
WHILE @IncorrectCharLoc > 0
BEGIN
    SET @str = STUFF(@str,@IncorrectCharLoc,1,'')
    SET @IncorrectCharLoc = PATINDEX('%[^0-9a-z ]%',@str)
END
-- replace spaces with hyphens
SET @str = REPLACE(@str,' ','-')
RETURN @str
END
Credit to http://blog.sqlauthority.com/2007/05/13/sql-server-udf-function-to-parse-alphanumeric-characters-from-string/ for the original code.
【Discussion】
Shouldn't '%[^0-9a-z] %' be '%[^0-9a-z- ]%' instead, so that existing hyphens are kept?
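A quick way to see what adding the hyphen to the pattern changes is to run PATINDEX on its own. This is only a sketch: the sample string is made up, and it uses the form with the hyphen listed last inside the brackets ('%[^0-9a-z -]%', as in later answers) rather than the exact pattern from the comment.

```sql
-- Hypothetical sample input that already contains a hyphen.
DECLARE @s varchar(100) = 'well-known slug';

SELECT
    -- Hyphen is not in the allowed set: PATINDEX reports the position of the '-',
    -- so a strip loop built on this pattern would delete it.
    PATINDEX('%[^0-9a-z ]%', @s)  AS without_hyphen,
    -- Hyphen placed last inside the brackets, where it is read as a literal character
    -- rather than a range operator: the '-' is no longer reported as disallowed.
    PATINDEX('%[^0-9a-z -]%', @s) AS with_hyphen;
```

A non-zero result means the loop would remove a character at that position; zero means the string is already clean for that pattern.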
【Answer 4】
-- Converts a title such as "This is a Test" to an all lower case string such
-- as "this-is-a-test" for use as the slug in a URL. All runs of separators
-- (whitespace, underscore, or hyphen) are converted to a single hyphen.
-- This is implemented as a state machine having the following four states:
--
-- 0 - initial state
-- 1 - in a sequence consisting of valid characters (a-z, A-Z, or 0-9)
-- 2 - in a sequence of separators (whitespace, underscore, or hyphen)
-- 3 - encountered a character that is neither valid nor a separator
--
-- Once the next state has been determined, the return value string is
-- built based on the transitions from the current state to the next state.
--
-- State 0 skips any initial whitespace. State 1 includes all valid slug
-- characters. State 2 converts multiple separators into a single hyphen
-- and skips trailing whitespace. State 3 skips any punctuation between
-- characters and, if no additional whitespace is encountered,
-- then the punctuation is not treated as a word separator.
--
CREATE FUNCTION ToSlug(@title AS NVARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @retval AS VARCHAR(MAX) = ''; -- return value
DECLARE @i AS INT = 1; -- title index
DECLARE @c AS CHAR(1); -- current character
DECLARE @state AS INT = 0; -- current state
DECLARE @nextState AS INT; -- next state
DECLARE @tab AS CHAR(1) = CHAR(9); -- tab
DECLARE @lf AS CHAR(1) = CHAR(10); -- line feed
DECLARE @cr AS CHAR(1) = CHAR(13); -- carriage return
DECLARE @separators AS CHAR(8) = '[' + @tab + @lf + @cr + ' _-]';
DECLARE @validchars AS CHAR(11) = '[a-zA-Z0-9]';
WHILE (@i <= LEN(@title))
BEGIN
    SELECT @c = SUBSTRING(@title, @i, 1),
        @nextState = CASE
            WHEN @c LIKE @validchars THEN 1
            WHEN @state = 0 THEN 0
            WHEN @state = 1 THEN CASE
                WHEN @c LIKE @separators THEN 2
                ELSE 3 -- unknown character
            END
            WHEN @state = 2 THEN 2
            WHEN @state = 3 THEN CASE
                WHEN @c LIKE @separators THEN 2
                ELSE 3 -- stay in state 3
            END
        END,
        @retval = @retval + CASE
            WHEN @nextState != 1 THEN ''
            WHEN @state = 0 THEN LOWER(@c)
            WHEN @state = 1 THEN LOWER(@c)
            WHEN @state = 2 THEN '-' + LOWER(@c)
            WHEN @state = 3 THEN LOWER(@c)
        END,
        @state = @nextState,
        @i = @i + 1
END
RETURN @retval;
END
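As a usage sketch for the state machine above, assuming the function was created in the dbo schema (the CREATE statement does not name one) and using made-up sample titles:

```sql
-- Assumes ToSlug from the answer above exists as dbo.ToSlug.
-- The function's header comment gives "this-is-a-test" as the result for this input.
SELECT dbo.ToSlug(N'This is a Test') AS slug;

-- Leading whitespace, an apostrophe inside a word, and mixed separators:
-- tracing the states by hand suggests this should come out as dont-panic-ever.
SELECT dbo.ToSlug(N'  Don''t_panic -- ever!  ') AS slug;
```

The second call exercises all four states: the leading spaces stay in state 0, the apostrophe moves to state 3 without producing a hyphen, and the run of separators collapses into a single hyphen in state 2.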
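【Discussion】
Could you add more comments?
【Answer 5】I took Jeremy's answer a couple of steps further: this version collapses all consecutive dashes, including those created by replacing spaces, and removes leading and trailing dashes.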
create function dbo.Slugify(@str nvarchar(max)) returns nvarchar(max) as
begin
declare @IncorrectCharLoc int
set @str = replace(replace(lower(@str),'.','-'),'''','')
-- replace non-alphanumerics with spaces:
set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
while @IncorrectCharLoc > 0
begin
    set @str = stuff(@str,@IncorrectCharLoc,1,' ')
    set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
end
-- replace all spaces with dashes
set @str = replace(@str,' ','-')
-- remove consecutive dashes:
while charindex('--',@str) > 0
begin
    set @str = replace(@str, '--', '-')
end
-- remove leading dashes
while charindex('-', @str) = 1
begin
    set @str = RIGHT(@str, len(@str) - 1)
end
-- remove trailing dashes
while len(@str) > 0 AND substring(@str, len(@str), 1) = '-'
begin
    set @str = LEFT(@str, len(@str) - 1)
end
return @str
end
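A few sample calls can help check the behaviour described for this version. This is a sketch that assumes the dbo.Slugify function above has been created as written; the inputs are made up, and the expected values in the comments come from tracing the function by hand rather than from a test run.

```sql
-- Assumes dbo.Slugify from the answer above exists in the current database.

-- Punctuation and repeated spaces collapse; expected: hello-world
SELECT dbo.Slugify(N'  Hello,   World!  ') AS slug;

-- Leading and trailing dashes are trimmed; expected: already-slugged
SELECT dbo.Slugify(N'--Already-slugged--') AS slug;

-- '.' becomes a dash and the apostrophe is dropped; expected: whats-new-in-v2-0
SELECT dbo.Slugify(N'What''s new in v2.0?') AS slug;
```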
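【Answer 6】Here is a variation of Jeremy's response. It might not technically be slugifying, since I'm doing a couple of custom things such as replacing "." with "-dot-" and stripping out apostrophes. The main improvement is that this one also collapses all consecutive spaces and does not strip out pre-existing dashes.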
create function dbo.Slugify(@str nvarchar(max)) returns nvarchar(max)
as
begin
declare @IncorrectCharLoc int
set @str = replace(replace(lower(@str),'.',' dot '),'''','')
-- replace non-alphanumerics with spaces:
set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
while @IncorrectCharLoc > 0
begin
    set @str = stuff(@str,@IncorrectCharLoc,1,' ')
    set @IncorrectCharLoc = patindex('%[^0-9a-z -]%',@str)
end
-- collapse consecutive spaces (search for two spaces, replace with one):
while charindex('  ',@str) > 0
begin
    set @str = replace(@str, '  ', ' ')
end
-- replace the remaining single spaces with dashes
set @str = replace(@str,' ','-')
return @str
end
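【Discussion】
Very helpful. Two things I noticed this script doesn't handle: trailing (and possibly leading?) spaces, and input that ends with a closing parenthesis, which leaves a trailing hyphen. For the first, trim the input. For the second I'm not sure, since a trailing hyphen might be intentional...
【Answer 7】You can do this with LOWER and REPLACE: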
SELECT REPLACE(LOWER(origString), ' ', '-')
FROM myTable
For a bulk update of a column (this sets the slug column from the value of the origString column):
UPDATE myTable
SET slug = REPLACE(LOWER(origString), ' ', '-')
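The LOWER/REPLACE version leaves punctuation in place. As a rough sketch, assuming one of the dbo.Slugify functions from the earlier answers has been created and reusing the myTable, origString and slug names from this answer, the same server-side update could call the function instead:

```sql
-- Assumes dbo.Slugify (from an earlier answer) exists in the current database,
-- and that myTable has the origString and slug columns used in this answer.

-- Preview a handful of rows before changing anything.
SELECT TOP (20) origString, dbo.Slugify(origString) AS slug_preview
FROM myTable;

-- Then fill the slug column without moving any data over the wire.
UPDATE myTable
SET slug = dbo.Slugify(origString)
WHERE origString IS NOT NULL;
```

Scalar functions that loop like this are evaluated row by row, so on a very large table it may be worth running the update in batches.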
【Discussion】
To handle Unicode strings properly you need far more than this. At a minimum, all non-ASCII characters should be handled.