REGEXP Oracle several not statements

Posted 2023-05-09

【Published】: 2019-03-10 18:42:23

【Question】:

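I have a string of 4 names separated by commas. Three of the 4 names carry a strict identifier as a prefix; the remaining one has no identifier. The order of the names in the string is random. How can I get the name without an identifier using Oracle REGEXP?

Example string: 'a. Name1, b. Name2, Name3, c-f Name4'

The strict identifiers are 'a.', 'b.' and 'c-f'.

I can get name1, name2 and name4 like this: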

select 
regexp_substr('a. Name1, b. Name2, Name3, c-f Name4','(^|, )a.[^,]+($|,)') as name1,
regexp_substr('a. Name1, b. Name2, Name3, c-f Name4','(^|, )b.[^,]+($|,)') as name2,
regexp_substr('a. Name1, b. Name2, Name3, c-f Name4','(^|, )c\-f[^,]+($|,)') as name4
from dual

I would like to get name3 with something like this:

'(^|, )((NOT("a."))and(NOT("b."))and(NOT("c-f")))([^,]+($|,)'

But I don't know how to express that with REGEXP. Is this possible in Oracle?

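【Question comments】:

Without lookahead (which I believe is not directly supported in Oracle), I don't think a not/and pattern is possible. You'll need to find a more creative approach.

Right, without lookahead you cannot negate a sequence of characters. Some exotic regex flavors have other constructs for this; Oracle's regex is not one of them.

How do you define a "strict identifier"? True or false: you can recognize the name without a strict identifier by looking for the pattern comma-space-(non-space, non-comma characters)-comma, where the leading "comma-space" may instead be the start of the string and/or the trailing comma may instead be the end of the string? It looks that way. If that is correct, it also tells you directly how to write the regular expression.

@mathguy I need something like this: '(^|, )((NOT("a."))and(NOT("b."))and(NOT("c-f")))([^,]+($|,)'

@Room'on - You didn't answer my question (true or false, about how you identify the name without a strict identifier). It makes no sense to write a solution that hard-codes only a few specific identifiers; if you hard-code a. and b., the query won't work for an input such as t. name, right?

【Answer 1】: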

This returns the third back-reference (a back-reference is any part of the pattern enclosed in parentheses).

REGEXP_REPLACE(
   yourStringColumn,
   'a\. (.*), b\. (.*), (.*), c-f (.*)',
   '\3'
)

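The pattern I used has 4 back-references, each one being a name you are looking for. The rest of the pattern (outside the back-references) is the fixed part of the format you described. Just remember to escape the period so that it isn't treated as a wildcard ('\.').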
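To illustrate why the escape matters, here is a minimal sketch that runs the same pattern with and without the backslash. The input string 'ab Name0, a. Name1' is a made-up example (not from the question), chosen so that one item merely starts with the letter 'a'.

```sql
-- Hypothetical input: 'ab Name0' carries no identifier, it just starts with 'a'.
select
  -- Unescaped: '.' matches any character, so the 'ab Name0' item qualifies.
  regexp_substr('ab Name0, a. Name1', '(^|, )a.[^,]+')  as unescaped,
  -- Escaped: '\.' matches only a literal period, so only the 'a.' item qualifies.
  regexp_substr('ab Name0, a. Name1', '(^|, )a\.[^,]+') as escaped
from dual;
```

The unescaped pattern should return 'ab Name0', while the escaped one should return ', a. Name1' (the leading separator is part of the match, so it would still need trimming).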

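Edit:

If they can appear in any order, my best attempt is to find an item between commas (or the start/end of the string) that itself contains no comma or space (a space would mean that it has a prefix).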

SELECT
  regexp_replace(
    'c-f Name1, Name2, b. Name3, a. Name4', 
    '(^|.+, )([^, ]+)($|, .+)',
    '\2'
  )
FROM
  dual
;

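【Comments】:

The order of the names in the string is random, e.g. 'b. Name2, Name3, c-f Name4, a. Name1'. So it is not always in third position in the string.

@Room'on Hopefully the simplified suggestion helps?

Thanks for the suggestion, but in my case Name3 can also contain spaces...

【Answer 2】:

Does it have to be a regular expression? If not, a SUBSTR + INSTR combination can do the job as well.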

with test (col) as
  (select 'a. Name1, b. Name2, Name3, c-f Name4' from dual)
select
  trim(substr(col, instr(col, '.', 1, 1) + 1,
                   instr(col, ',', 1, 1) - instr(col, '.', 1, 1) - 1)) str1,
  trim(substr(col, instr(col, '.', 1, 2) + 1,
                   instr(col, ',', 1, 2) - instr(col, '.', 1, 2) - 1)) str2,
  trim(substr(col, instr(col, ',', 1, 2) + 1,
                   instr(col, ',', 1, 3) - instr(col, ',', 1, 2) - 1)) str3,
  trim(substr(col, instr(col, 'c-f', 1, 1) + 4)) str4
from test;

STR1  STR2  STR3  STR4
----- ----- ----- -----
Name1 Name2 Name3 Name4

[Edit, following MatBailie's comment]

with test (col) as
  (select 'a. Name1, b. Name2, Name3, c-f Name4' from dual)
select
  trim(substr(col, instr(col, 'a.', 1, 1) + 2,
                   instr(col, ', b.', 1, 1) - instr(col, 'a.', 1, 1) - 2)) str1,
  trim(substr(col, instr(col, 'b.', 1, 1) + 2,
                   instr(col, ',', 1, 2) - instr(col, 'b.', 1, 1) - 2)) str2,
  trim(substr(col, instr(col, ',', 1, 2) + 1,
                   instr(col, ',', 1, 3) - instr(col, ',', 1, 2) - 1)) str3,
  trim(substr(col, instr(col, 'c-f', 1, 1) + 4)) str4
from test;

STR1  STR2  STR3  STR4
----- ----- ----- -----
Name1 Name2 Name3 Name4

[Edit #2]

Since the identifiers can appear anywhere, how about code like this?

with test (col) as
  (select 'a. Little foot, c-f Bruce Wayne, Catherine Zeta-Jones, b. Bill White Jr.' from dual),
inter as
  (select trim(regexp_substr(col, '[^,]+', 1, level)) str
   from test
   connect by level <= regexp_count(col, ',') + 1
  ),
inter2 as
  (select trim(replace(replace(replace(str, 'a.', ''),
                                            'b.', ''),
                                            'c-f', '')) result,
     rownum rn
   from inter
  )
select max(decode(rn, 1, result)) n1,
       max(decode(rn, 2, result)) n2,
       max(decode(rn, 3, result)) n3,
       max(decode(rn, 4, result)) n4
from inter2;

N1                   N2                   N3                   N4
-------------------- -------------------- -------------------- --------------------
Little foot          Bruce Wayne          Catherine Zeta-Jones Bill White Jr.
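If only the name without an identifier is wanted, the same split-by-comma step can be combined with a negated REGEXP_LIKE, which comes close to the "NOT a. AND NOT b. AND NOT c-f" idea from the question. This is only a sketch under a few assumptions: the input is a single row, the three identifiers are hard-coded, each identifier is followed by a space, and the names `items` and `name_without_identifier` are made up for illustration.

```sql
with test (col) as
  (select 'b. Name2, Name3, c-f Name4, a. Name1' from dual),
items as
  -- One row per comma-separated item, with surrounding spaces trimmed.
  (select trim(regexp_substr(col, '[^,]+', 1, level)) str
   from test
   connect by level <= regexp_count(col, ',') + 1
  )
select str as name_without_identifier
from items
-- Keep only the item that does NOT start with one of the known identifiers.
where not regexp_like(str, '^(a\.|b\.|c-f) ');
```

Because the filter works on each item separately, the position of the name in the list does not matter, and the name itself may contain spaces. An item that has no identifier but happens to begin with the text 'a. ', 'b. ' or 'c-f ' would be filtered out as well.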

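【Comments】:

Be careful that the names don't contain periods and the like. In my experience it is better to search for the full ', b. ' rather than just '.' or ','.

The order of the names in the string is random, e.g. it can be 'b. Name2, Name3, c-f Name4, a. Name1'.

What do the names actually contain? Are there dots, commas, dashes and so on, or only alphanumerics?

I have posted another code sample (Edit #2); please have a look.

From the OP's comments, this is not a problem of splitting into a list or of finding the third item. It is about finding the item that does not start with 'a.', 'b.' or 'c-f ', without knowing where in the list that item will be.

My understanding is that they want to get all of the values; they have no problem with a, b and c-f, but they do with the fourth value...

【Answer 3】:

I found that I can apply the REGEXP_REPLACE function several times: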

select 
regexp_replace(
regexp_replace(
regexp_replace(
'a. Name1, b. Name2, Name3, c-f Name4',
'(^|, | )a\.[^,]+($|,)'),
'(^|, | )b\.[^,]+($|,)'),
'(^|, | )c\-f[^,]+($|,)') as name3
from dual

Thanks to MatBailie for the idea of using REGEXP_REPLACE!
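As a possible variation on the nested calls, the three identifiers can be folded into one alternation so that a single REGEXP_REPLACE removes every prefixed item, and the leftover separators are trimmed afterwards. This is a sketch, not the poster's code; it assumes the identifiers are exactly 'a.', 'b.' and 'c-f', that each one is followed by a space, and that names never contain commas.

```sql
select
  rtrim(ltrim(
    -- Delete every item that starts with a known identifier, together with
    -- the separator in front of it (or the start of the string).
    regexp_replace(
      'b. Name2, Name3, c-f Name4, a. Name1',
      '(^|, *)(a\.|b\.|c-f) [^,]*'),
    -- Strip the commas and spaces left around the surviving name.
    ', '), ', ') as name3
from dual;
```

For the sample string this should leave 'Name3' regardless of where it sits in the list. The identifiers are still hard-coded, so the concern raised in the question comments about inputs such as 't. name' applies here too.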
