Write a SQL query to add a column to a dataframe using a substring from another column
Posted: 2020-11-20 16:01:07
Question:
I wrote a SQL query that returns a dataframe like the following:
ID  event       date        message
1   connection  2020-11-19  connection was garbage collected
2   connection  2020-11-19  In addition: Warnings
3   myid        2020-11-19  Value passed to replic (MYID= 806320110310:00, OLDID= 4289)
4   myid        2020-11-19  Value passed to replic (MYID= 349812948:00, OLDID= 1969)
5   warning     2020-11-19  Warning message
This is the SQL query that produces it:
WITH
    value LIKE '%connection was garbage%' AS connection,
    value LIKE '%Value passed to replic (MYID%' AS myid,
    value LIKE '%Warning message%' AS warning
SELECT DISTINCT
    ID,
    -- multiIf takes condition/value pairs followed by a default
    multiIf(
        connection, 'connection',
        myid, 'myid',
        warning, 'warning',
        NULL) AS event,
    date,
    value AS message
FROM my.data.frame
WHERE
    connection
    OR myid
    OR warning
Now I want to add another column, "MYID", holding the value that appears in the message of each myid event. The desired result is:
ID  event       date        message                                                      myid
1   connection  2020-11-19  connection was garbage collected                             NA
2   connection  2020-11-19  In addition: Warnings                                        NA
3   myid        2020-11-19  Value passed to replic (MYID= 806320110310:00, OLDID= 4289)  806320110310:00
4   myid        2020-11-19  Value passed to replic (MYID= 349812948:00, OLDID= 1969)     349812948:00
5   warning     2020-11-19  Warning message                                              NA
How can I do this in my SQL query? How do I extract that numeric value and put it into a new column?
Comments:
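Note: running string processing on every SELECT is very bad practice. Consider inserting parsed data instead of raw data (with the event type, ID, and so on already defined), or doing the parsing on the ClickHouse side with a Materialized View.
Answer 1:
SELECT
    'Value passed to replic (MYID= 806320110310:00, OLDID= 4289)' AS s,
    extract(s, '.*MYID= (.*),.*') AS myid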
┌─s───────────────────────────────────────────────────────────┬─myid────────────┐
│ Value passed to replic (MYID= 806320110310:00, OLDID= 4289) │ 806320110310:00 │
└─────────────────────────────────────────────────────────────┴─────────────────┘
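The same extraction can be folded into the query from the question. The following is only a sketch under a few assumptions: the output column name `myid_value` is hypothetical (chosen so it does not clash with the boolean `myid` alias in the WITH clause), the pattern is a narrower variant of the one in the answer (`[^,]+` stops at the first comma instead of relying on greedy `.*`), and `multiIf` is written with condition/value pairs. ClickHouse's `extract` returns an empty string when the pattern does not match, so `nullIf` is used here to turn that into NULL for the non-myid rows.

```sql
WITH
    value LIKE '%connection was garbage%' AS connection,
    value LIKE '%Value passed to replic (MYID%' AS myid,
    value LIKE '%Warning message%' AS warning
SELECT DISTINCT
    ID,
    multiIf(
        connection, 'connection',
        myid, 'myid',
        warning, 'warning',
        NULL) AS event,
    date,
    value AS message,
    -- extract() returns the first capture group, or '' when there is no match;
    -- nullIf() converts that '' into NULL. myid_value is a hypothetical name.
    nullIf(extract(value, 'MYID= ([^,]+),'), '') AS myid_value
FROM my.data.frame
WHERE
    connection
    OR myid
    OR warning
```

With the sample rows from the question, the myid events would carry 806320110310:00 and 349812948:00 in the new column, and the other rows would be NULL (shown as NA once loaded into a dataframe).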