regex comma separated delimiter

Posted: 2017-12-05 03:43:22

Question:

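I'm trying to split my column on a comma delimiter. The column holds multiple values, e.g. 139,239,338,323. For some reason, the code below works for the first column, but the remaining columns come back empty.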

SELECT  
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){0}([^,\/]*),\/?') as Word0,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){1}([^,\/]*),\/?') as Word1,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){2}([^,\/]*),\/?') as Word2,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){3}([^,\/]*),\/?') as Word3,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){4}([^,\/]*),\/?') as Word4,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){5}([^,\/]*),\/?') as Word5,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){6}([^,\/]*),\/?') as Word6,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){7}([^,\/]*),\/?') as Word7,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){8}([^,\/]*),\/?') as Word8,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){9}([^,\/]*),\/?') as Word9
FROM
(SELECT event_list AS StringToParse FROM `mytable.2017`)
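A likely explanation, offered as a hedged sketch: inside the pattern, `,\/` is a literal comma immediately followed by a literal slash, not "a comma or a slash". No such pair occurs in `139,239,338,323`, so every pattern for Word1 and beyond, which has to consume at least one delimiter before the capture group, fails to match, and REGEXP_EXTRACT returns NULL; only the Word0 pattern, which repeats the skipping group zero times, succeeds. Assuming commas are the only delimiter, the same positional approach works with a plain `,`. The one-row CTE below is hypothetical sample data taken from the question.

```sql
#standardSQL
-- Hypothetical sample row; drop the CTE to run against the real table.
WITH `mytable.2017` AS (
  SELECT '139,239,338,323' AS event_list
)
SELECT
  -- (?:[^,]*,){n} skips the first n comma-terminated values;
  -- ([^,]*) then captures the next value, which needs no trailing comma.
  REGEXP_EXTRACT(StringToParse, r'^(?:[^,]*,){0}([^,]*)') AS Word0,  -- '139'
  REGEXP_EXTRACT(StringToParse, r'^(?:[^,]*,){1}([^,]*)') AS Word1,  -- '239'
  REGEXP_EXTRACT(StringToParse, r'^(?:[^,]*,){2}([^,]*)') AS Word2,  -- '338'
  REGEXP_EXTRACT(StringToParse, r'^(?:[^,]*,){3}([^,]*)') AS Word3,  -- '323'
  REGEXP_EXTRACT(StringToParse, r'^(?:[^,]*,){4}([^,]*)') AS Word4   -- NULL: only 3 commas
FROM (SELECT event_list AS StringToParse FROM `mytable.2017`)
```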

Comments:

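Well, what's with all the oddly escaped forward slashes? They make the pattern unreadable. How about matching the commas instead of the non-commas? (.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?)

Answer 1: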

Try the following BigQuery Standard SQL:

#standardSQL
SELECT
  SPLIT(StringToParse)[SAFE_OFFSET(0)] AS Word0,
  SPLIT(StringToParse)[SAFE_OFFSET(1)] AS Word1,
  SPLIT(StringToParse)[SAFE_OFFSET(2)] AS Word2,
  SPLIT(StringToParse)[SAFE_OFFSET(3)] AS Word3,
  SPLIT(StringToParse)[SAFE_OFFSET(4)] AS Word4,
  SPLIT(StringToParse)[SAFE_OFFSET(5)] AS Word5,
  SPLIT(StringToParse)[SAFE_OFFSET(6)] AS Word6,
  SPLIT(StringToParse)[SAFE_OFFSET(7)] AS Word7,
  SPLIT(StringToParse)[SAFE_OFFSET(8)] AS Word8,
  SPLIT(StringToParse)[SAFE_OFFSET(9)] AS Word9
FROM
  (SELECT event_list AS StringToParse FROM `mytable.2017`)

You can test and play with the query above using the dummy data below:

#standardSQL
WITH `mytable.2017` AS (
  SELECT '139,239,338,323' AS event_list UNION ALL
  SELECT '123,456,789,135'
)
SELECT
  SPLIT(StringToParse)[SAFE_OFFSET(0)] AS Word0,
  SPLIT(StringToParse)[SAFE_OFFSET(1)] AS Word1,
  SPLIT(StringToParse)[SAFE_OFFSET(2)] AS Word2,
  SPLIT(StringToParse)[SAFE_OFFSET(3)] AS Word3,
  SPLIT(StringToParse)[SAFE_OFFSET(4)] AS Word4,
  SPLIT(StringToParse)[SAFE_OFFSET(5)] AS Word5,
  SPLIT(StringToParse)[SAFE_OFFSET(6)] AS Word6,
  SPLIT(StringToParse)[SAFE_OFFSET(7)] AS Word7,
  SPLIT(StringToParse)[SAFE_OFFSET(8)] AS Word8,
  SPLIT(StringToParse)[SAFE_OFFSET(9)] AS Word9
FROM
  (SELECT event_list AS StringToParse FROM `mytable.2017`)

Meanwhile, if for some reason you have to use a regular expression in this query, try the following:

#standardSQL
SELECT
  -- [^,\/]+ matches each run of characters that is neither a comma nor a slash,
  -- so the last value (which has no trailing comma) is extracted as well
  REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(0)] AS Word0,
  REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(1)] AS Word1,
  REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(2)] AS Word2,
  REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(3)] AS Word3,
  REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(4)] AS Word4,
  REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(5)] AS Word5,
  REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(6)] AS Word6,
  REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(7)] AS Word7,
  REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(8)] AS Word8,
  REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(9)] AS Word9
FROM
  (SELECT event_list AS StringToParse FROM `mytable.2017`)  

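Of course, in all of the examples above you can simplify the code by introducing a subquery that computes the SPLIT or REGEXP_EXTRACT_ALL array once per row, and then simply select each array element in the outer SELECT.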
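A minimal sketch of that simplification for the regex variant, reusing the dummy data from this answer. The pattern `[^,\/]+` is an assumption of this sketch rather than part of the answer: with no capturing group, REGEXP_EXTRACT_ALL returns every run of characters that is neither a comma nor a slash, including the final value, which has no trailing comma. Only five columns are shown; Word5 to Word9 follow the same pattern.

```sql
#standardSQL
WITH `mytable.2017` AS (
  SELECT '139,239,338,323' AS event_list UNION ALL
  SELECT '123,456,789,135'
)
SELECT
  parts[SAFE_OFFSET(0)] AS Word0,
  parts[SAFE_OFFSET(1)] AS Word1,
  parts[SAFE_OFFSET(2)] AS Word2,
  parts[SAFE_OFFSET(3)] AS Word3,
  parts[SAFE_OFFSET(4)] AS Word4  -- NULL: each sample row has only 4 values
FROM (
  -- The regex runs once per row; the outer SELECT only indexes the array.
  SELECT REGEXP_EXTRACT_ALL(event_list, r'[^,\/]+') AS parts
  FROM `mytable.2017`
)
```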

Comments:

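Dangit, you beat me to it, haha. Typing on my phone takes a long time :) I'll leave my answer up, since depending on the size of the strings and the number of rows, computing the array only once may be faster.

Sure. Anyway, you're the best, and I'm your biggest fan! :o) And typing on a laptop is definitely faster than typing on a phone :o)

@ElliottBrossard and Mikhail, it's hard to get an answer in ahead of you! He usually beats everyone. I tend to just pick up the leftover scraps :) Mikhail, do you have some kind of notification set up for new BigQuery questions?!

Answer 2: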

You can just use the SPLIT function. For example:

SELECT
  parts[SAFE_OFFSET(0)] AS Word0,
  parts[SAFE_OFFSET(1)] AS Word1,
  parts[SAFE_OFFSET(2)] AS Word2,
  parts[SAFE_OFFSET(3)] AS Word3,
  parts[SAFE_OFFSET(4)] AS Word4,
  parts[SAFE_OFFSET(5)] AS Word5,
  parts[SAFE_OFFSET(6)] AS Word6,
  parts[SAFE_OFFSET(7)] AS Word7,
  parts[SAFE_OFFSET(8)] AS Word8,
  parts[SAFE_OFFSET(9)] AS Word9
FROM (
  SELECT SPLIT(event_list, ',') AS parts
  FROM `mytable.2017`
);
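Both answers index the array with SAFE_OFFSET rather than OFFSET. A short sketch of why, using literal strings instead of the table: SAFE_OFFSET returns NULL for a position past the end of the array, so rows with fewer than ten values still produce a result, whereas OFFSET makes the query fail with an out-of-bounds error. It also shows that SPLIT keeps empty fields, so a doubled comma yields an empty string instead of shifting the later values left.

```sql
#standardSQL
SELECT
  SPLIT('139,239,338,323', ',')[SAFE_OFFSET(3)] AS last_value,     -- '323'
  SPLIT('139,239,338,323', ',')[SAFE_OFFSET(9)] AS missing_value,  -- NULL, no error
  SPLIT('139,,338', ',')[SAFE_OFFSET(1)] AS empty_field,           -- '' (empty field kept)
  SPLIT('139,,338', ',')[SAFE_OFFSET(2)] AS after_empty            -- '338'
-- By contrast, SPLIT('139,239,338,323', ',')[OFFSET(9)] would make the
-- whole query fail with an array-index-out-of-bounds error.
```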
