Regex comma separated delimiter
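【Title】: regex comma separated delimiter 【Posted】: 2017-12-05 03:43:22 【Question】: I am trying to split my column on a comma delimiter. The column holds multiple values, for example 139,239,338,323. For some reason the following code works for the first column, but the remaining columns come back empty.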
SELECT
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){0}([^,\/]*),\/?') as Word0,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){1}([^,\/]*),\/?') as Word1,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){2}([^,\/]*),\/?') as Word2,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){3}([^,\/]*),\/?') as Word3,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){4}([^,\/]*),\/?') as Word4,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){5}([^,\/]*),\/?') as Word5,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){6}([^,\/]*),\/?') as Word6,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){7}([^,\/]*),\/?') as Word7,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){8}([^,\/]*),\/?') as Word8,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){9}([^,\/]*),\/?') as Word9
FROM
(SELECT event_list AS StringToParse FROM `mytable.2017`)
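【Question comments】:
Well, what is going on with all the weird escaped forward slashes? They make the pattern unreadable. How about matching on the commas rather than on the non-commas? (.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?)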
【Answer 1】:
Try the following for BigQuery Standard SQL:
#standardSQL
SELECT
SPLIT(StringToParse)[SAFE_OFFSET (0)] AS Word0,
SPLIT(StringToParse)[SAFE_OFFSET (1)] AS Word1,
SPLIT(StringToParse)[SAFE_OFFSET (2)] AS Word2,
SPLIT(StringToParse)[SAFE_OFFSET (3)] AS Word3,
SPLIT(StringToParse)[SAFE_OFFSET (4)] AS Word4,
SPLIT(StringToParse)[SAFE_OFFSET (5)] AS Word5,
SPLIT(StringToParse)[SAFE_OFFSET (6)] AS Word6,
SPLIT(StringToParse)[SAFE_OFFSET (7)] AS Word7,
SPLIT(StringToParse)[SAFE_OFFSET (8)] AS Word8,
SPLIT(StringToParse)[SAFE_OFFSET (9)] AS Word9
FROM
(SELECT event_list AS StringToParse FROM `mytable.2017`)
You can test and play with the above using the dummy data below:
#standardSQL
WITH `mytable.2017` AS (
SELECT '139,239,338,323' AS event_list UNION ALL
SELECT '123,456,789,135'
)
SELECT
SPLIT(StringToParse)[SAFE_OFFSET (0)] AS Word0,
SPLIT(StringToParse)[SAFE_OFFSET (1)] AS Word1,
SPLIT(StringToParse)[SAFE_OFFSET (2)] AS Word2,
SPLIT(StringToParse)[SAFE_OFFSET (3)] AS Word3,
SPLIT(StringToParse)[SAFE_OFFSET (4)] AS Word4,
SPLIT(StringToParse)[SAFE_OFFSET (5)] AS Word5,
SPLIT(StringToParse)[SAFE_OFFSET (6)] AS Word6,
SPLIT(StringToParse)[SAFE_OFFSET (7)] AS Word7,
SPLIT(StringToParse)[SAFE_OFFSET (8)] AS Word8,
SPLIT(StringToParse)[SAFE_OFFSET (9)] AS Word9
FROM
(SELECT event_list AS StringToParse FROM `mytable.2017`)
Meanwhile, if for some reason you must use a regular expression in this query, try the following:
#standardSQL
-- [^,\/]+ matches each run of characters between delimiters, so the last value,
-- which has no trailing comma, is captured as well.
SELECT
REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(0)] AS Word0,
REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(1)] AS Word1,
REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(2)] AS Word2,
REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(3)] AS Word3,
REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(4)] AS Word4,
REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(5)] AS Word5,
REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(6)] AS Word6,
REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(7)] AS Word7,
REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(8)] AS Word8,
REGEXP_EXTRACT_ALL(StringToParse, r'[^,\/]+')[SAFE_OFFSET(9)] AS Word9
FROM
(SELECT event_list AS StringToParse FROM `mytable.2017`)
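Of course, in all of the examples above you can simplify the code by introducing a subquery that computes SPLIT or REGEXP_EXTRACT_ALL once, and then simply selecting each element of the resulting array in the outer SELECT.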
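As a minimal sketch of that simplification for the regex variant, the query below computes the array once in a subquery and indexes it in the outer SELECT. The CTE name `sample`, its two rows, and the column alias `parts` are made up for illustration, and the pattern `[^,]+` is an assumption chosen for this sketch: it matches every run of non-comma characters, so the last value, which has no trailing comma, is included.

```sql
#standardSQL
-- Hypothetical sample data standing in for the real table.
WITH sample AS (
  SELECT '139,239,338,323' AS event_list UNION ALL
  SELECT '123,456,789,135'
)
SELECT
  parts[SAFE_OFFSET(0)] AS Word0,
  parts[SAFE_OFFSET(1)] AS Word1,
  parts[SAFE_OFFSET(2)] AS Word2,
  parts[SAFE_OFFSET(3)] AS Word3,
  -- SAFE_OFFSET returns NULL when the index is past the end of the array,
  -- so Word4 is NULL for these four-value rows.
  parts[SAFE_OFFSET(4)] AS Word4
FROM (
  -- The array is built once per row instead of once per output column.
  SELECT REGEXP_EXTRACT_ALL(event_list, r'[^,]+') AS parts
  FROM sample
)
```

One difference from SPLIT worth keeping in mind: `[^,]+` skips empty fields, so a value such as `1,,3` yields two elements here, whereas SPLIT would yield three with an empty string in the middle.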
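【Comments】:
Dangit, you beat me, haha. Typing from my phone takes a long time :) I'll leave my answer up, since depending on the size of the strings and the number of rows it may be faster to compute the array only once.
Sure. You are the best anyway, and I am your biggest fan! :o) And typing on a laptop is definitely faster than typing on a phone :o)
@ElliottBrossard and Mikhail, it's hard to get an answer in on time! He usually beats everyone. I tend to just pick up the pieces that are left :) Mikhail - do you have some kind of notification set up for new BigQuery questions?!
【Answer 2】:
You can just use the SPLIT function. For example,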
SELECT
parts[SAFE_OFFSET(0)] AS Word0,
parts[SAFE_OFFSET(1)] AS Word1,
parts[SAFE_OFFSET(2)] AS Word2,
parts[SAFE_OFFSET(3)] AS Word3,
parts[SAFE_OFFSET(4)] AS Word4,
parts[SAFE_OFFSET(5)] AS Word5,
parts[SAFE_OFFSET(6)] AS Word6,
parts[SAFE_OFFSET(7)] AS Word7,
parts[SAFE_OFFSET(8)] AS Word8,
parts[SAFE_OFFSET(9)] AS Word9
FROM (
SELECT SPLIT(event_list, ',') AS parts
FROM `mytable.2017`
);