规范化 SQL Server 中的 unicode 字符串?

Posted

技术标签:

【中文标题】规范化 SQL Server 中的 unicode 字符串?【英文标题】:Normalize unicode string in SQL Server? 【发布时间】:2011-12-10 08:24:45 【问题描述】:

SQL Server 中是否有规范化 unicode 字符串的函数?例如

UPDATE Orders SET Notes = NormalizeString(Notes, 'FormC')

Unicode 规范化形式:

C​作曲(C):A + ¨ 变为 Ä D​ecomposition(D):Ä 变为 A + ¨ 兼容组合(KC):A + ¨ + + n 变为 Ä + f + i + n 兼容分解(KD):Ä + + n 变为A + ¨ + f + i + n

我找不到任何内置函数,所以我假设没有。


理想情况下,如果只能有一个,那么我今天恰好需要Form C:

Unicode 规范化形式 C,规范组合。将每个分解的分组(由一个基本字符加上组合字符组成)转换为规范的预组合等价物。例如,A + ¨ 变成 Ä。

另见

Unicode Normalization in Windows How do I remove diacritics (accents) from a string in .NET? NormalizeString function Sorting it all out: What normalization form does SQL Server use

【问题讨论】:

【参考方案1】:

抱歉,没有,迄今为止的任何 SQL Server 版本(2012 测试版本)中都没有这样的功能。比较可以正确地对组合不敏感,但没有将字符组合用法转换为一种正常形式的功能。

已建议在语法NORMALIZE(string, NFC) 下将其用于ANSI 标准的未来版本,但要使其成为现实世界还需要很长时间。现在,如果您想进行规范化,您必须使用具有更好字符串处理能力的适当编程语言来完成,方法是从数据库中提取字符串或编写 CLR 存储过程来完成。

【讨论】:

【参考方案2】:

试试这个 CLR 函数

using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions

    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString fnRemoveDiacritics(string text)
    
        string stringFormD = text.Normalize(System.Text.NormalizationForm.FormD);
        System.Text.StringBuilder retVal = new System.Text.StringBuilder();
        for (int index = 0; index < stringFormD.Length; index++)
        
            if (System.Globalization.CharUnicodeInfo.GetUnicodeCategory(stringFormD[index]) != System.Globalization.UnicodeCategory.NonSpacingMark)
                retVal.Append(stringFormD[index]);
        
        return retVal.ToString().Normalize(System.Text.NormalizationForm.FormC);
    

在 SQL 中

SELECT dbo.fnRemoveDiacritics('Äěščřžýáíé')
-- Returns: Aescrzyaie

感谢 http://www.dotnetportal.cz/blogy/4/Tomas-Jecha/663/NET-Tip-6-Ciste-odstraneni-diakritiky

【讨论】:

如果你从sql中硬编码字符串,你应该指定N'Äěščřžýáíé'(对于nvarchar @IanBoyd 我不明白你说的那个图片是什么意思。我不是说出于性能原因,我的意思是字符串会丢失信息:https://i.imgur.com/6QbzEgV.png @DavidS。啊,i see what you're driving at【参考方案3】:

我有点问题,

我用 C# 编写了一个新的 CRL 函数,并像 SQL 函数一样使用。

我的 C# 代码(规范化为 NFC 或 NFD)。另外,正确处理NULL字符串。:

using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Text;

    public static class CLRUnicodeNormalize
    
        [SqlFunction]
        public static SqlString UnicodeNormalizeC(SqlString inStr)
        
            if (inStr.IsNull) return SqlString.Null;
            return inStr.ToString().Normalize(NormalizationForm.FormC);
        
        public static SqlString UnicodeNormalizeD(SqlString inStr)
        
            if (inStr.IsNull) return SqlString.Null;
            return inStr.ToString().Normalize(NormalizationForm.FormD);
        
    

在 SQL 查询中使用示例(形成任何对 C 的规范化):

UPDATE o SET o.ObjectName=dbo.UnicodeNormalizeC(o.ObjectName)

使用前安装规范化功能(在这种情况下您不需要 C#...二进制文件包含可使用的 .Net 代码,请参见上面的源代码):

/* 
GO
DROP FUNCTION [dbo].UnicodeNormalize
GO
DROP ASSEMBLY [CLRUnicodeNormalize]
GO
*/


CREATE ASSEMBLY [CLRUnicodeNormalize]
FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C010300747F4BC90000000000000000E00022200B013000000A0000000600000000000076280000002000000040000000000010002000000002000004000000000000000600000000000000008000000002000000000000030060850000100000100000000010000010000000000000100000000000000000000000242800004F00000000400000C803000000000000000000000000000000000000006000000C00000070270000380000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E746578740000007C08000000200000000A000000020000000000000000000000000000200000602E72737263000000C80300000040000000040000000C0000000000000000000000000000400000402E72656C6F6300000C000000006000000002000000100000000000000000000000000000400000420000000000000000000000000000000058280000000000004800000002000500A4200000CC0600000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000A20F00281000000A2C067E1100000A2A0F00FE16120000016F1200000A176F1300000A281400000A2AA20F00281000000A2C067E1100000A2A0F00FE16120000016F1200000A186F1300000A281400000A2A000042534A4201000100000000000C00000076342E302E33303331390000000005006C00000014020000237E0000800200000803000023537472696E6773000000008805000004000000235553008C0500001000000023475549440000009C0500003001000023426C6F620000000000000002000001471500000900000000FA0133001600000100000014000000020000000200000002000000140000000F00000001000000020000000000FE0101000000000006003201A50206009F01A5020600510073020F00C5020000060079003A02060015013A020600E1003A02060086013A02060052013A0206006B013A02060090003A020600650086020600430086020600C4003A020600AB00D1010600E90221020A0000014C020A00EB01D4020600F701210206002802FC02000000000100000000000100010081011000BD01000041000100010050200000000096000A003B00010079200000000096001C003B000200000001006D02000001006D02090067020100110067020600190067020A00290067021000310067021000390067021000410067021000490067021000510067021000590067021000610067021500690067021000710067021000790067021000890067020600910016021A0091001C021E008100F50122009900C70126009100F0022C0020007B002A012E000B0042002E0013004B002E001B006A002E00230073002E002B008B002E0033008B002E003B008B002E00430073002E004B0091002E0053008B002E005B008B002E006300A9002E006B00D3002E007300E000048000000100000000000000000000000000BD01000004000000000000000000000032003A000000000004000000000000000000000032002E00000000000000003C4D6F64756C653E00556E69636F64654E6F726D616C697A654300556E69636F64654E6F726D616C697A65440053797374656D2E44617461006D73636F726C696200477569644174747269627574650044656275676761626C6541747472696275746500436F6D56697369626C6541747472696275746500417373656D626C795469746C6541747472696275746500417373656D626C7954726164656D61726B417474726962757465005461726765744672616D65776F726B41747472696275746500417373656D626C7946696C6556657273696F6E41747472696275746500417373656D626C79436F6E66696775726174696F6E4174747269627574650053716C46756E6374696F6E41747472696275746500417373656D626C794465736372697074696F6E41747472696275746500436F6D70696C6174696F6E52656C61786174696F6E7341747472696275746500417373656D626C7950726F6475637441747472696275746500417373656D626C79436F7079726967687441747472696275746500417373656D626C79436F6D70616E794174747269627574650052756E74696D65436F6D7061746962696C69747941747472696275746500434C52556E69636F64654E6F726D616C697A650053797374656D2E52756E74696D652E56657273696F6E696E670053716C537472696E6700546F537472696E6700434C52556E69636F64654E6F726D616C697A652E646C6C006765745F49734E756C6C0053797374656D004E6F726D616C697A6174696F6E466F726D0053797374656D2E5265666C656374696F6E004D6963726F736F66742E53716C5365727665722E536572766572002E63746F7200696E5374720053797374656D2E446961676E6F73746963730053797374656D2E52756E74696D652E496E7465726F7053657276696365730053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300446562756767696E674D6F6465730053797374656D2E446174612E53716C5479706573004F626A656374006F705F496D706C696369740053797374656D2E54657874000000000042B7983A5EC3D848A69F95A94C809E3900042001010803200001052001011111042001010E042001010203200002030611490320000E0520010E115105000111490E08B77A5C561934E089060001114911490801000800000000001E01000100540216577261704E6F6E457863657074696F6E5468726F7773010801000200000000001701001253514C4E6F726D616C697A65537472696E67000005010000000017010012436F7079726967687420C2A920203230313900002901002434303635643437642D343839642D343638332D626233652D32393963316632623536626400000C010007312E302E302E3000004901001A2E4E45544672616D65776F726B2C56657273696F6E3D76342E360100540E144672616D65776F726B446973706C61794E616D65122E4E4554204672616D65776F726B20342E3604010000000000000000A704E49600000000020000007C000000A8270000A809000000000000000000000000000010000000000000000000000000000000525344534526D8C335B5074C819DB6DF9CC26A3701000000433A5C55736572735C537A395C736F757263655C7265706F735C53514C4E6F726D616C697A65537472696E675C53514C4E6F726D616C697A65537472696E675C6F626A5C52656C656173655C434C52556E69636F64654E6F726D616C697A652E706462004C28000000000000000000006628000000200000000000000000000000000000000000000000000058280000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF250020001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001001000000018000080000000000000000000000000000001000100000030000080000000000000000000000000000001000000000048000000584000006C03000000000000000000006C0334000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE00000100000001000000000000000100000000003F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B004CC020000010053007400720069006E006700460069006C00650049006E0066006F000000A802000001003000300030003000300034006200300000001A000100010043006F006D006D0065006E007400730000000000000022000100010043006F006D00700061006E0079004E0061006D00650000000000000000004E0013000100460069006C0065004400650073006300720069007000740069006F006E0000000000530051004C004E006F0072006D0061006C0069007A00650053007400720069006E00670000000000300008000100460069006C006500560065007200730069006F006E000000000031002E0030002E0030002E003000000050001800010049006E007400650072006E0061006C004E0061006D006500000043004C00520055006E00690063006F00640065004E006F0072006D0061006C0069007A0065002E0064006C006C0000004800120001004C006500670061006C0043006F007000790072006900670068007400000043006F0070007900720069006700680074002000A90020002000320030003100390000002A00010001004C006500670061006C00540072006100640065006D00610072006B00730000000000000000005800180001004F0072006900670069006E0061006C00460069006C0065006E0061006D006500000043004C00520055006E00690063006F00640065004E006F0072006D0061006C0069007A0065002E0064006C006C000000460013000100500072006F0064007500630074004E0061006D00650000000000530051004C004E006F0072006D0061006C0069007A00650053007400720069006E00670000000000340008000100500072006F006400750063007400560065007200730069006F006E00000031002E0030002E0030002E003000000038000800010041007300730065006D0062006C0079002000560065007200730069006F006E00000031002E0030002E0030002E0030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000783800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
WITH PERMISSION_SET = UNSAFE
GO

CREATE FUNCTION [dbo].UnicodeNormalizeC(@inStr nvarchar(max)) RETURNS nvarchar(max) 
AS EXTERNAL NAME [CLRUnicodeNormalize].[CLRUnicodeNormalize].[UnicodeNormalizeC]
GO
CREATE FUNCTION [dbo].UnicodeNormalizeD(@inStr nvarchar(max)) RETURNS nvarchar(max) 
AS EXTERNAL NAME [CLRUnicodeNormalize].[CLRUnicodeNormalize].[UnicodeNormalizeD]
GO

使用 .Net 版本编译:4.6

我对上面的二进制代码的使用不承担任何责任!

测试示例:

DECLARE @str_nfd nvarchar(10) = dbo.UnicodeNormalizeD('á');
SELECT LEN(@str_nfd);   -- NFD normalization, return: 2
SELECT LEN(dbo.UnicodeNormalizeC(@str_nfd)); -- back to NFC, return: 1

【讨论】:

【参考方案4】:

在 SQL Server 中没有内置解决方案的情况下,如果您不想编写 C# 函数,您可以使用 translate 函数手动执行此操作:

select translate(last_name, 'éêëèàäçïîìôöòûù', 'eeeeaaciiiooouu') from employees

您也可以使用以下解决方案:https://coderwall.com/p/a6koxq/how-to-remove-diacritics-in-sql-server

在 SQL Server 中从字符串中删除变音符号的最简单方法是 使用不包含的字符集整理字符串 变音符号,例如:

SELECT 'àéêöhello!' Collate SQL_Latin1_General_CP1253_CI_AI

这将输出:

aeeohello!

仅当您不使用 unicode 字符串时才有效,因此将其转换为 如果你有一个 unicode 字符串,首先是 varchar。

【讨论】:

以上是关于规范化 SQL Server 中的 unicode 字符串?的主要内容,如果未能解决你的问题,请参考以下文章

表迁移到 SQL Server 后,Access 无法筛选 Unicode 字符

sql server 存入中文前加N

在 sql server 2008 中使用 unicode 文本

Windows 中的 Unicode 规范化

在Oracle 中有类似 sqlserver 中的unicode 的函数吗

将 Unicode 从 R 写入 SQL Server