Conditional rules for results of a regexp_replace in BigQuery

Posted 2023-03-25


[Title]: Conditional rules for results of a regexp_replace in bigquery [Published]: 2020-11-20 16:47:46 [Question]:

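I'm writing a BigQuery Standard SQL statement in which I want to keep alphanumeric characters and hyphens while removing spaces and all other characters. That part is easy: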

REGEXP_REPLACE(field_1, '[^0-9a-zA-Z-]','')

However, if this regex causes two digits that were not previously adjacent to become adjacent, I want them separated by "@".

Examples:
"his weight is 2...... 1" --> hisweightis2@1
"his weight is 2 1 now" --> hisweightis2@1now
"his weight is 2-1 more or less" --> hisweightis2-1moreorless
"his weight is 21 and 3 now" --> hisweightis21and3now
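To make the problem concrete, here is a minimal sketch that runs the plain replacement from above on two of the sample strings. The inline `UNNEST` array is an assumption that stands in for a real table column named `field_1`.

```sql
#standardSQL
-- Sketch only: the inline strings stand in for field_1 in a real table.
SELECT
  field_1,
  REGEXP_REPLACE(field_1, r'[^0-9a-zA-Z-]', '') AS cleaned
FROM UNNEST([
  'his weight is 2...... 1',
  'his weight is 21 and 3 now'
]) AS field_1
```

The first row comes out as `hisweightis21`, so the separate digits 2 and 1 can no longer be told apart from the number 21 in the second row. That is the case the "@" separator is meant to handle.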


[Answer 1]:

Try the following (BigQuery Standard SQL):

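#standardSQL
select field_1,
  regexp_replace(regexp_replace(regexp_replace(
    field_1,
    -- 1. insert '@' where only removable characters separate two digits
    r'(\d)[^0-9a-zA-Z-]+(\d)', r'\1@\2'),
      -- 2. drop everything except alphanumerics, '-' and '@'
      r'[^0-9a-zA-Z@-]', ''),
        -- 3. drop any '@' that sits between two non-digits
        r'(\D)@(\D)', r'\1\2'
    ) result
from `project.dataset.table`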

When applied to the sample data from your question, the output matches the expected results listed there.
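To try this without a table, the sketch below inlines the four sample strings from the question and runs the same three nested replacements over them. The `samples` CTE, the `expected` column, the extra `'1 2 3'` row, and the `variant_result` column are assumptions added for illustration. The variant in particular is an alternative formulation, not the answer's method.

```sql
#standardSQL
-- Sketch only: `samples`, `expected` and `variant_result` are illustrative additions.
WITH samples AS (
  SELECT 'his weight is 2...... 1' AS field_1, 'hisweightis2@1' AS expected UNION ALL
  SELECT 'his weight is 2 1 now', 'hisweightis2@1now' UNION ALL
  SELECT 'his weight is 2-1 more or less', 'hisweightis2-1moreorless' UNION ALL
  SELECT 'his weight is 21 and 3 now', 'hisweightis21and3now' UNION ALL
  -- Extra row (assumption): three single digits in a row.
  SELECT '1 2 3', '1@2@3'
)
SELECT
  field_1,
  expected,
  -- The answer's three nested replacements.
  REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(
    field_1,
    r'(\d)[^0-9a-zA-Z-]+(\d)', r'\1@\2'),
    r'[^0-9a-zA-Z@-]', ''),
    r'(\D)@(\D)', r'\1\2') AS answer_result,
  -- Alternative variant (not from the answer):
  --   1. collapse every run of unwanted characters into a single '@'
  --   2. drop an '@' that is not preceded by a digit
  --   3. drop an '@' that is not followed by a digit
  REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(
    field_1,
    r'[^0-9a-zA-Z-]+', '@'),
    r'(^|\D)@', r'\1'),
    r'@(\D|$)', r'\1') AS variant_result
FROM samples
```

For the four strings from the question, both result columns should equal `expected`. The added `'1 2 3'` row shows a limit of the nested version: regex matches cannot overlap, so the middle digit is consumed by the first match and the result should be `1@23`, while the variant should give `1@2@3`. Whether that case matters depends on the data.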

