How to return the difference between 2 strings using Oracle SQL only

Posted 2023-03-28

[Asked]: 2020-04-10 12:34:04

[Question]:

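For example, I have 2 strings:

'Source:Siebel; Name:Mary Jane; Gender:F; Age:24; N;'
'Source:Siebel; Name:Marie; Gender:F; Age:24; N;'

The result I need is:

Name:Mary Jane; Name:Marie;

Most likely I need to reverse the logic of the code below: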

with cte1 as  (
    select 1 id, 'Source:Siebel; Name:Mary Jane; Gender:F; Age:24; N;' str from dual
    union all
    select 2 id, 'Source:Siebel; Name:Marie; Gender:F; Age:24; N;' str from dual
), cte2 as (
    SELECT distinct id, trim(regexp_substr(str, '[^ ]+', 1, level)) str
    FROM cte1 t
    CONNECT BY instr(str, ' ', 1, level - 1) > 0
)
select distinct t1.str
from cte2 t1
join cte2 t2 on (t1.str = t2.str and t1.id != t2.id)

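The code comes from the question "Oracle Function to return similarity between strings".

As it stands, it returns the parts that the 2 strings have in common [QueryResult], which is the opposite of what I need.

I can't use a procedure, because I need to run this SQL script in Oracle Fusion.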

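[Comments]:

Why start with data in this format? Even if it comes from a source that only outputs complex strings, you should normalize it first when you import it into the database.

Hi, I used that sample data because the actual data I'm working with consists of concatenated values from an Oracle interface table and a base table. I compare the results of the two to make sure the data in the Interface table was passed to the Base table.

I'm not sure I understand. Data is either passed from the "interface" table to the "base" table, or it isn't; I can't see how "Mary Jane" would become "Marie" in the process. Copying or transferring data from one place to another can fail in many ways, but changing the content of the data is very rare (I think).

[Answer 1]:

Does this help?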

with cte1 as (
  select 1 id, 'Source:Siebel; Name:Mary Jane; Gender:F; Age:24; N;' str from dual
  union all
  select 2 id, 'Source:Siebel; Name:Marie; Gender:F; Age:24; N;' str from dual
),
cte2 as (
  -- split each string on ';' and keep the position (lvl) of every piece
  select id,
         column_value lvl,
         trim(regexp_substr(str, '[^;]+', 1, column_value)) str
  from cte1 cross join
       table(cast(multiset(select level from dual
                           connect by level <= regexp_count(str, ';') + 1
                          ) as sys.odcinumberlist))
)
-- pair the pieces found at the same position and keep those that differ
select a.str, b.str
from cte2 a join cte2 b on a.id < b.id and a.lvl = b.lvl and a.str <> b.str;

STR             STR
--------------- ---------------
Name:Mary Jane  Name:Marie
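The query above returns the two differing pieces side by side, while the question asks for them in a single column with a trailing semicolon. A minimal sketch of one way to get that layout, assuming exactly two input strings and the same cte1/cte2 as in the answer; the symmetric join condition a.id <> b.id and the column alias diff are illustrative choices of mine, not part of the original answer:

```sql
with cte1 as (
  select 1 id, 'Source:Siebel; Name:Mary Jane; Gender:F; Age:24; N;' str from dual
  union all
  select 2 id, 'Source:Siebel; Name:Marie; Gender:F; Age:24; N;' str from dual
),
cte2 as (
  -- same tokenizer as in the answer: one row per ';'-separated piece
  select id,
         column_value lvl,
         trim(regexp_substr(str, '[^;]+', 1, column_value)) str
  from cte1 cross join
       table(cast(multiset(select level from dual
                           connect by level <= regexp_count(str, ';') + 1
                          ) as sys.odcinumberlist))
)
-- a.id <> b.id (instead of a.id < b.id) emits one row per side of each mismatch;
-- the ';' stripped by the split is appended again (diff is an assumed alias)
select a.id, a.str || ';' as diff
from cte2 a
join cte2 b on a.id <> b.id and a.lvl = b.lvl and a.str <> b.str
order by a.id, a.lvl;
```

On the sample data this should give one row per string: Name:Mary Jane; for id 1 and Name:Marie; for id 2. With more than two input strings the symmetric join would repeat a piece once for every other string it differs from, so this sketch only fits the two-string case.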

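[Comments]:

Hey! I think this will work. I just need to merge it into my whole script.

This won't work if there are more than the two rows or strings in the OP's example.

The key word is "IF". As far as we know, there aren't.

[Answer 2]:

The result I need is: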

Name:Mary Jane; 
Name:Marie;

You can use the LAG/LEAD analytic functions to get the output you want.

Demo with multiple input values, e.g. 'Mary Jane', 'Marie', 'Jane', 'Jones':

with t1 as  (
    select 1 id, 'Source:Siebel; Name:Mary Jane; Gender:F; Age:24; N;' str from dual
    union all
    select 2 id, 'Source:Siebel; Name:Marie; Gender:F; Age:24; N;' str from dual
    union all
    select 3 id, 'Source:Siebel; Name:Jane; Gender:F; Age:24; N;' str from dual
    union all
    select 4 id, 'Source:Siebel; Name:Jones; Gender:F; Age:24; N;' str from dual
), t2 as (
SELECT t1.id,
        trim(regexp_substr(t1.str, '[^;]+', 1, lines.column_value)) str
    FROM t1,
      TABLE (CAST (MULTISET
      (SELECT LEVEL FROM dual
              CONNECT BY instr(t1.str, ';', 1, LEVEL) > 0
      ) AS sys.odciNumberList ) ) lines
    ORDER BY id, lines.column_value)
select id, str from(
  select id, 
         str, 
        lag(str) over(partition by str order by str) lag, 
        lead(str) over(partition by str order by str) lead from t2
) where lag is null
  and   lead is null
order by id;

        ID STR
---------- -----------------------
         1 Name:Mary Jane
         2 Name:Marie    
         3 Name:Jane     
         4 Name:Jones

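This gives you the differences for any attribute in the strings (name, age, gender, and so on): a piece is returned only when its value occurs in exactly one of the input strings, so anything shared with another string is filtered out.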
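The LAG/LEAD filter keeps a piece only when its partition holds a single row. As a sketch of the same idea, and not the answer's own code, an analytic COUNT states that condition directly; it assumes the same t1/t2 definitions as above, and the alias cnt is illustrative:

```sql
with t1 as (
    select 1 id, 'Source:Siebel; Name:Mary Jane; Gender:F; Age:24; N;' str from dual
    union all
    select 2 id, 'Source:Siebel; Name:Marie; Gender:F; Age:24; N;' str from dual
    union all
    select 3 id, 'Source:Siebel; Name:Jane; Gender:F; Age:24; N;' str from dual
    union all
    select 4 id, 'Source:Siebel; Name:Jones; Gender:F; Age:24; N;' str from dual
), t2 as (
    -- same tokenizer as in the answer: one row per ';'-terminated piece
    select t1.id,
           trim(regexp_substr(t1.str, '[^;]+', 1, lines.column_value)) str
    from t1,
         table(cast(multiset(
             select level from dual
             connect by instr(t1.str, ';', 1, level) > 0
         ) as sys.odciNumberList)) lines
)
select id, str
from (
    -- cnt (assumed alias) = number of input rows containing this exact piece
    select id, str, count(*) over (partition by str) cnt
    from t2
)
where cnt = 1
order by id;
```

On the four sample rows this should return the same four Name pieces as the LAG/LEAD version. Like that version, it reports a piece only if it is unique across all inputs, so a name shared by two of the four strings would not show up as a difference.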
