Extract character between the first two characters

Posted 2023-03-25

【Title】: Extract character between the first two characters
【Posted】: 2020-12-27 12:21:19
【Question】:

I have a table in BigQuery with values like these:

ab_col_jfsfhfd_ggg_sdf
arfd_am_fdsf_fddg_fg
d_fdf_fdddg_ffddd_f

I want to extract the characters that come right after the first _ character and before the second _ character. I want to get the following:

col
am
fdf

I used the following regular expression to extract the characters, but it does not work as expected:

^.*\_(\D+)\_.*$
regexp_replace(id,'^.*\\_(\\D+)\\_.*$' , '\\1')

Please help!

【Answer 1】:

If I understand you correctly, you can use split():

(split(col, '_'))[safe_ordinal(2)]

split() converts the string column into an array of values, given a delimiter (here, _). We can then grab the second array element.
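As a minimal sketch of the split() approach on the sample values from the question: the column name `id` is taken from the question's own regexp_replace call, and the fourth row is a hypothetical value added only to show what happens when a string contains no `_` at all.

```sql
with t as (
  select 'ab_col_jfsfhfd_ggg_sdf' as id union all
  select 'arfd_am_fdsf_fddg_fg' union all
  select 'd_fdf_fdddg_ffddd_f' union all
  select 'nounderscore'  -- hypothetical row: no '_' in the string
)
select
  id,
  -- split() yields an array; safe_ordinal(2) is the 1-based second element,
  -- or null when the array has fewer than two elements
  split(id, '_')[safe_ordinal(2)] as second_part
from t;
```

For the three sample rows this returns col, am, and fdf; the hypothetical fourth row returns null. With ordinal(2) in place of safe_ordinal(2), that fourth row would raise an out-of-bounds error instead.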

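【Comments】:

I get this error: Array index 2 is out of bounds (overflow)

@Alex: Then it looks like some of your strings do not contain the _ character. You can use safe_ordinal() to avoid that error (and return null in that case).

【Answer 2】: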

split() is a very simple way to solve this problem. But a regular expression is simple too:

with t as (
      select 'ab_col_jfsfhfd_ggg_sdf' as id union all
      select 'arfd_am_fdsf_fddg_fg' union all
      select 'd_fdf_fdddg_ffddd_f'
     )
select id, regexp_extract(id, '[^_]+', 1, 2)
from t;

The logic of the pattern is: "Find any run of characters that are not underscores. Then take the second such run in the string."
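The two approaches agree on the sample data but can differ on edge cases. The sketch below compares them side by side; the second and third rows are hypothetical values chosen to show an empty segment and a string with no delimiter, and the column aliases are illustrative names only.

```sql
with t as (
  select 'ab_col_jfsfhfd' as id union all  -- ordinary value
  select 'a__b' union all                  -- hypothetical: empty second segment
  select 'nounderscore'                    -- hypothetical: no '_' at all
)
select
  id,
  -- positional: the second element of the split array, possibly empty
  split(id, '_')[safe_ordinal(2)]   as via_split,
  -- occurrence-based: the second non-empty run of non-underscore characters
  regexp_extract(id, '[^_]+', 1, 2) as via_occurrence
from t;
```

For 'a__b', split() returns an empty string while the regular expression skips the empty segment and returns b. Both return null for the row without an underscore. Which behavior is wanted depends on whether empty segments should count as a position.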

【Answer 3】:

Use regexp_extract:

regexp_extract(id,'^[^_]+_([^_]+)')

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  [^_]+                    any character except: '_' (1 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  _                        '_'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^_]+                    any character except: '_' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
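To see why anchoring on `[^_]+` matters, the following sketch runs the pattern from the question next to the anchored one on the question's sample rows. The question's pattern is reproduced without the backslashes before `_` (an assumption that the escape is not needed for a literal underscore), and the column aliases are illustrative names only.

```sql
with t as (
  select 'ab_col_jfsfhfd_ggg_sdf' as id union all
  select 'arfd_am_fdsf_fddg_fg' union all
  select 'd_fdf_fdddg_ffddd_f'
)
select
  id,
  -- greedy ^.* swallows as much as it can, so the capture group ends up
  -- on the segment before the last underscore, not after the first one
  regexp_replace(id, r'^.*_(\D+)_.*$', r'\1') as original_attempt,
  -- [^_]+ cannot cross an underscore, so the group is pinned to the
  -- segment between the first and second underscores
  regexp_extract(id, r'^[^_]+_([^_]+)') as anchored
from t;
```

On these rows the original attempt yields ggg, fddg, and ffddd, while the anchored pattern yields col, am, and fdf. Note that the anchored pattern returns null when the string starts with an underscore or the second segment is empty, since each `[^_]+` needs at least one character.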
