How to Base64 encode extended character sets properly using SQL/T-SQL
Posted: 2019-12-07 06:38:36
Question:
I can't get the following values to Base64 encode in SQL/T-SQL in a way that decodes correctly. I have systems (.NET and C++/Objective-C) that Base64 encode them to the values shown below.
For example:
سلام جیران
--> 2LPZhNin2YUg2KzbjNix2KfZhg==
В России Base64 кодирует вас
--> 0JIg0KDQvtGB0YHQuNC4IEJhc2U2NCDQutC+0LTQuNGA0YPQtdGCINCy0LDRgQ==
❤️💥🤪🦌🎅⛄🎄🤐🙈🙉🙊💩
--> 4p2k77iP8J+SpfCfpKrwn6aM8J+OheKbhPCfjoTwn6SQ8J+ZiPCfmYnwn5mK8J+SqQ==
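I found a function posted on Stack Overflow that decodes these values correctly (it was written to fix a field whose decoding was broken, or something similar): Convert text value in SQL Server from UTF8 to ISO 8859-1. However, I have found nothing that encodes these extended-character (Unicode?) values using SQL/T-SQL. I can do it in .NET and in a C++ application running on iOS devices, but many of the posted answers produce encodings that turn into garbage when decoded.
Am I asking SQL to solve something it cannot solve? Do my collation settings need adjusting?
Here is a list of other posts I tried to glean a solution from and adapt, but sadly none of them got me to an answer: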
https://www.sqlservercentral.com/scripts/base64-encode-and-decode-in-t-sql-optimized
Convert text value in SQL Server from UTF8 to ISO 8859-1
https://www.jrevell.com/base64-encoding-and-decoding-with-sql-server-unicode-characters/
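Here is some sample SQL I am using to test encoding and decoding (horrible column names and all).
Note: you will need the functions listed on the sites above for this to run as is.
Note 2: wherever those functions used varchar for a non-encoded value, I changed it to nvarchar.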
DECLARE @hwDecoded AS NVARCHAR(100) = N'Hello Base64'
DECLARE @hwEncoded AS NVARCHAR(100) = 'SGVsbG8gQmFzZTY0'
DECLARE @persianDecoded AS NVARCHAR(100) = N'سلام جیران'
DECLARE @persianEncoded AS NVARCHAR(100) = '2LPZhNin2YUg2KzbjNix2KfZhg=='
DECLARE @russianDecoded AS NVARCHAR(100) = N'В России Base64 кодирует вас'
DECLARE @russianEncoded AS NVARCHAR(100) = '0JIg0KDQvtGB0YHQuNC4IEJhc2U2NCDQutC+0LTQuNGA0YPQtdGCINCy0LDRgQ=='
DECLARE @emojiDecoded AS NVARCHAR(200) = N'❤️💥🤪🦌🎅⛄🎄🤐🙈🙉🙊💩'
DECLARE @emojiEncoded AS NVARCHAR(200) = '4p2k77iP8J+SpfCfpKrwn6aM8J+OheKbhPCfjoTwn6SQ8J+ZiPCfmYnwn5mK8J+SqQ=='
DECLARE @Decoded AS NVARCHAR(100)
DECLARE @Encoded AS NVARCHAR(100)
SET @Decoded = @hwDecoded
SET @Encoded = @hwEncoded
SELECT @Decoded AS Decoded_ExpectedOutput
,@Encoded AS Encoded_ExpectedOutput
,dbo.DecodeUTF8String(CAST(N'' AS XML).value('xs:base64Binary(sql:variable("@Encoded"))','VARBINARY(MAX)')) AS Function_DecodeUTF8String_Output
,CAST( CAST((SELECT @Encoded FOR XML PATH('')) AS XML).value('.','varbinary(256)') AS varchar(256)) AS XML_PATH_VARCHAR_Decode_Output
,CAST( CAST((SELECT @Encoded FOR XML PATH('')) AS XML).value('.','varbinary(256)') AS nvarchar(256)) AS XML_PATH_NVARCHAR_Decode_Output
,dbo.fn_str_TO_BASE64(@Decoded) as fn_str_TO_BASE64_Output
,dbo.base64_encode(@Decoded) as base64_encode_Output
,(SELECT CAST(N'' AS XML).value('xs:base64Binary(xs:hexBinary(sql:column("bin")))', 'VARCHAR(MAX)') Base64Encoding FROM (SELECT CAST(@Decoded AS VARBINARY(MAX)) AS bin) AS bin_sql_server_temp) AS SingleLineVarChar
,(SELECT CAST(N'' AS XML).value('xs:base64Binary(xs:hexBinary(sql:column("bin")))', 'NVARCHAR(MAX)') Base64Encoding FROM (SELECT CAST(@Decoded AS VARBINARY(MAX)) AS bin) AS bin_sql_server_temp) AS SingleLineNVarChar
WHERE @Decoded = dbo.DecodeUTF8String(CAST(N'' AS XML).value('xs:base64Binary(sql:variable("@Encoded"))','VARBINARY(MAX)'))
SET @Decoded = @persianDecoded
SET @Encoded = @persianEncoded
SELECT @Decoded AS Decoded_ExpectedOutput
,@Encoded AS Encoded_ExpectedOutput
,dbo.DecodeUTF8String(CAST(N'' AS XML).value('xs:base64Binary(sql:variable("@Encoded"))','VARBINARY(MAX)')) AS Function_DecodeUTF8String_Output
,CAST( CAST((SELECT @Encoded FOR XML PATH('')) AS XML).value('.','varbinary(256)') AS varchar(256)) AS XML_PATH_VARCHAR_Decode_Output
,CAST( CAST((SELECT @Encoded FOR XML PATH('')) AS XML).value('.','varbinary(256)') AS nvarchar(256)) AS XML_PATH_NVARCHAR_Decode_Output
,dbo.fn_str_TO_BASE64(@Decoded) as fn_str_TO_BASE64_Output
,dbo.base64_encode(@Decoded) as base64_encode_Output
,(SELECT CAST(N'' AS XML).value('xs:base64Binary(xs:hexBinary(sql:column("bin")))', 'VARCHAR(MAX)') Base64Encoding FROM (SELECT CAST(@Decoded AS VARBINARY(MAX)) AS bin) AS bin_sql_server_temp) AS SingleLineVarChar
,(SELECT CAST(N'' AS XML).value('xs:base64Binary(xs:hexBinary(sql:column("bin")))', 'NVARCHAR(MAX)') Base64Encoding FROM (SELECT CAST(@Decoded AS VARBINARY(MAX)) AS bin) AS bin_sql_server_temp) AS SingleLineNVarChar
WHERE @Decoded = dbo.DecodeUTF8String(CAST(N'' AS XML).value('xs:base64Binary(sql:variable("@Encoded"))','VARBINARY(MAX)'))
SET @Decoded = @russianDecoded
SET @Encoded = @russianEncoded
SELECT @Decoded AS Decoded_ExpectedOutput
,@Encoded AS Encoded_ExpectedOutput
,dbo.DecodeUTF8String(CAST(N'' AS XML).value('xs:base64Binary(sql:variable("@Encoded"))','VARBINARY(MAX)')) AS Function_DecodeUTF8String_Output
,CAST( CAST((SELECT @Encoded FOR XML PATH('')) AS XML).value('.','varbinary(256)') AS varchar(256)) AS XML_PATH_VARCHAR_Decode_Output
,CAST( CAST((SELECT @Encoded FOR XML PATH('')) AS XML).value('.','varbinary(256)') AS nvarchar(256)) AS XML_PATH_NVARCHAR_Decode_Output
,dbo.fn_str_TO_BASE64(@Decoded) as fn_str_TO_BASE64_Output
,dbo.base64_encode(@Decoded) as base64_encode_Output
WHERE @Decoded = dbo.DecodeUTF8String(CAST(N'' AS XML).value('xs:base64Binary(sql:variable("@Encoded"))','VARBINARY(MAX)'))
SET @Decoded = @emojiDecoded
SET @Encoded = @emojiEncoded
SELECT @Decoded AS Decoded_ExpectedOutput
,@Encoded AS Encoded_ExpectedOutput
,dbo.DecodeUTF8String(CAST(N'' AS XML).value('xs:base64Binary(sql:variable("@Encoded"))','VARBINARY(MAX)')) AS Function_DecodeUTF8String_Output
,CAST( CAST((SELECT @Encoded FOR XML PATH('')) AS XML).value('.','varbinary(256)') AS varchar(256)) AS XML_PATH_VARCHAR_Decode_Output
,CAST( CAST((SELECT @Encoded FOR XML PATH('')) AS XML).value('.','varbinary(256)') AS nvarchar(256)) AS XML_PATH_NVARCHAR_Decode_Output
,dbo.fn_str_TO_BASE64(@Decoded) as fn_str_TO_BASE64_Output
,dbo.base64_encode(@Decoded) as base64_encode_Output
WHERE @Decoded = dbo.DecodeUTF8String(CAST(N'' AS XML).value('xs:base64Binary(sql:variable("@Encoded"))','VARBINARY(MAX)'))
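I figure I could create a function that tests each of the test values. Dealing with this extended character set is really choking me. Any nudge in the right direction is very welcome.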
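Comments:
So what is the goal? Your application sends the raw values and you want to apply Base64 encoding before inserting them?
This may depend on the collation in use: docs.microsoft.com/en-us/sql/relational-databases/collations/… Also, starting with SQL Server 2019 you can use a UTF-8 collation, which might eliminate the problem.
@gotqn I have several places where this encoded/decoded data is needed. The first is a column whose data is stored unencoded; to make it work properly on another platform we need (and want) to encode the data and then decode it in the application. I wrote a janky console app to do the work outside the server, but it bugs me that I can't figure it out in SQL (hence my question). When we update and release both applications, I would like to convert the data in production and then manage it with SQL. Several needs feed into this; user-created content in a mobile knowledge base is what is driving it.
Answer 1:
Your application encodes Unicode characters as UTF-8, whereas SQL Server stores Unicode characters (NVARCHAR) as UCS-2/UTF-16.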
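Base64 takes binary input and encodes it as an ASCII string. UTF-8 (one to four bytes per character) and UTF-16 (two bytes per code unit, four bytes for characters that need a surrogate pair) produce different bytes for the same text, so the Base64 of the UTF-8 bytes and the Base64 of the UTF-16 bytes differ as well. If you want SQL Server to produce the same Base64 output as your application, you have to find a way for SQL Server to convert the Unicode characters from UCS-2/UTF-16 to UTF-8. In short, you need the "UTF-8 bytes".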
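To see the difference outside SQL Server, here is a minimal Python sketch (Python is used only as a neutral reference; the helper name b64_of is made up for this illustration). It encodes the Persian sample from the question once as UTF-8 and once as UTF-16LE, which is the byte layout of an NVARCHAR value, and Base64 encodes each result.

```python
import base64


def b64_of(text: str, encoding: str) -> str:
    """Encode text to bytes with the given encoding, then Base64 those bytes."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")


# Persian sample from the question, written with escapes to avoid ambiguity.
sample = "\u0633\u0644\u0627\u0645 \u062c\u06cc\u0631\u0627\u0646"

utf8_b64 = b64_of(sample, "utf-8")       # what the .NET / C++ clients produce
utf16_b64 = b64_of(sample, "utf-16-le")  # Base64 of the bytes held in an NVARCHAR

print(utf8_b64)               # 2LPZhNin2YUg2KzbjNix2KfZhg==
print(utf16_b64)              # a different string: the underlying bytes differ
print(utf8_b64 == utf16_b64)  # False
```

Both strings are valid Base64. They only decode back to the same text if the reader assumes the same byte encoding the writer used, which is why the encodings produced on the SQL side look like garbage to the applications.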
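There are two options:
You could use a CLR function that encodes to UTF-8.
You could use a T-SQL function (reinventing the wheel?).
An example of such a T-SQL function can be found at: https://gist.github.com/sevaa/f084a0a5a994c3bc28e518d5c708d5f6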
create function [dbo].[ToUTF8](@s nvarchar(max))
returns varbinary(max)
as
begin
    -- @n counts UTF-16 code units (2 bytes each), not characters
    declare @i int = 1, @n int = datalength(@s)/2, @r varbinary(max) = 0x, @c int, @d varbinary(4)
    while @i <= @n
    begin
        set @c = unicode(substring(@s, @i, 1))
        -- high surrogate: combine it with the following low surrogate into one code point
        if (@c & 0xFC00) = 0xD800
        begin
            set @i += 1
            -- add 65536 (0x10000) rather than OR it in, so code points from U+20000 upward are also correct
            set @c = (((@c & 0x3FF) * 0x400) | (unicode(substring(@s, @i, 1)) & 0x3FF)) + 65536
        end
        -- 1 byte: U+0000..U+007F
        if @c < 0x80
            set @d = cast(@c as binary(1))
        -- 2 bytes: U+0080..U+07FF
        if @c >= 0x80 and @c < 0x800
            set @d = cast(((@c * 4) & 0xFF00) | (@c & 0x3F) | 0xC080 as binary(2))
        -- 3 bytes: U+0800..U+FFFF
        if @c >= 0x800 and @c < 0x10000
            set @d = cast(((@c * 0x10) & 0xFF0000) | ((@c * 4) & 0x3F00) | (@c & 0x3F) | 0xe08080 as binary(3))
        -- 4 bytes: U+10000 and above
        if @c >= 0x10000
            set @d = cast(((@c * 0x40) & 0xFF000000) | ((@c * 0x10) & 0x3F0000) | ((@c * 4) & 0x3F00) | (@c & 0x3F) | 0xf0808080 as binary(4))
        set @r += @d
        set @i += 1
    end
    return @r
end
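Disclaimer: I have no relation to or affiliation with the developer who wrote the T-SQL function, and I have not used it before. If the function works for you, please give him the credit.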
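For readers who want to follow the bit arithmetic without a SQL Server instance, the sketch below ports the same idea to Python: walk the UTF-16 code units, merge surrogate pairs into one code point, then emit one to four UTF-8 bytes using shifts and masks. It is an illustrative port under my own assumptions, not the gist author's code. The name to_utf8 is hypothetical, and the surrogate pair is combined by adding 0x10000 so that code points above U+1FFFF also come out right.

```python
import base64


def to_utf8(s: str) -> bytes:
    """Hand-rolled UTF-8 encoder working from UTF-16 code units (illustrative port)."""
    raw = s.encode("utf-16-le")
    # Split into 16-bit code units, the way an NVARCHAR value is laid out.
    units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    out = bytearray()
    i = 0
    while i < len(units):
        c = units[i]
        if (c & 0xFC00) == 0xD800 and i + 1 < len(units):
            # High surrogate: merge with the following low surrogate.
            i += 1
            c = (((c & 0x3FF) << 10) | (units[i] & 0x3FF)) + 0x10000
        if c < 0x80:            # 1 byte
            out.append(c)
        elif c < 0x800:         # 2 bytes
            out += bytes([0xC0 | (c >> 6), 0x80 | (c & 0x3F)])
        elif c < 0x10000:       # 3 bytes
            out += bytes([0xE0 | (c >> 12), 0x80 | ((c >> 6) & 0x3F), 0x80 | (c & 0x3F)])
        else:                   # 4 bytes
            out += bytes([0xF0 | (c >> 18), 0x80 | ((c >> 12) & 0x3F),
                          0x80 | ((c >> 6) & 0x3F), 0x80 | (c & 0x3F)])
        i += 1
    return bytes(out)


# Persian sample from the question; the result should match the client-side value.
persian = "\u0633\u0644\u0627\u0645 \u062c\u06cc\u0631\u0627\u0646"
print(base64.b64encode(to_utf8(persian)).decode("ascii"))
```

Comparing to_utf8(text) with Python's built-in text.encode("utf-8") on a few samples, including a character beyond U+1FFFF, is a quick way to confirm the masks and shifts before trusting the equivalent T-SQL.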
DECLARE @emojiDecoded AS NVARCHAR(200) = N'❤️💥🤪🦌🎅⛄🎄🤐🙈🙉🙊💩';
SELECT CAST('' AS XML).value('xs:base64Binary(xs:hexBinary(sql:column("src.utf8bytes")))', 'VARCHAR(MAX)') emojibase64EncodedFromUtf8
FROM
(
SELECT dbo.ToUTF8(@emojiDecoded) AS utf8bytes
) AS src;
--4p2k77iP8J+SpfCfpKrwn6aM8J+OheKbhPCfjoTwn6SQ8J+ZiPCfmYnwn5mK8J+SqQ==
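To check a result like this independently of SQL Server, the expected Base64 can be decoded as UTF-8 and its code points listed. The following is a small Python sketch for that purpose only; the variable names are mine, and the character names printed depend on the Unicode tables shipped with the Python version in use.

```python
import base64
import unicodedata

# Expected value from the question (Base64 of the UTF-8 bytes of the emoji sample).
expected = "4p2k77iP8J+SpfCfpKrwn6aM8J+OheKbhPCfjoTwn6SQ8J+ZiPCfmYnwn5mK8J+SqQ=="

text = base64.b64decode(expected).decode("utf-8")

for ch in text:
    # Code points above U+FFFF occupy two UTF-16 code units (a surrogate pair).
    width = 2 if ord(ch) > 0xFFFF else 1
    print(f"U+{ord(ch):04X}", width, unicodedata.name(ch, "<unnamed>"))

utf16_units = len(text.encode("utf-16-le")) // 2
print(len(text), "code points,", utf16_units, "UTF-16 code units")
```

Most of the characters in this sample lie outside the Basic Multilingual Plane, so it exercises exactly the surrogate-pair path that a plain cast of NVARCHAR to VARBINARY does not convert.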