Convert column value with HTML tags into an SQL view with rows and columns
Posted: 2020-07-17 06:57:05

Question:

I have a table named data that contains a column desc_data. Values in this column look like this:
<span class ="label">A</span><br> <span class ="value">A-Class</span> <span class ="label">B</span><br> <span class ="value">B-Class</span>.
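I want to parse this column value, strip the HTML tags, and split it into a new view with an SQL query (possibly using REGEXP_REPLACE), so that the text of every label span (<span class ="label">A and <span class ="label">B) becomes a column, and the text of every value span (<span class ="value">A-Class and <span class ="value">B-Class) becomes the corresponding column value.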
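The real data is larger and contains many labels and values; this is just an example to get help. The expected result is:

View data_view:
A        B
-------  -------
A-Class  B-Class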
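To make the target shape concrete, here is a minimal Python sketch (run outside the database, not SQL) that strips the tags and pairs each label with the value span that follows it, giving one label-to-value mapping per desc_data value, i.e. one row of data_view. It assumes the spans are not nested and that the class attribute is exactly "label" or "value"; the names SPAN_RE and to_row are hypothetical.

```python
import re

# Sample value of data.desc_data, copied from the question.
DESC_DATA = ('<span class ="label">A</span><br> <span class ="value">A-Class</span> '
             '<span class ="label">B</span><br> <span class ="value">B-Class</span>.')

# Matches <span class ="label">...</span> or <span class ="value">...</span>.
# Assumption: spans are not nested and the class is exactly "label" or "value".
SPAN_RE = re.compile(r'<span class\s*=\s*"(label|value)">(.*?)</span>')


def to_row(desc_data: str) -> dict:
    """Strip the tags and pair each label with the value span that follows it."""
    row = {}
    pending_label = None
    for kind, text in SPAN_RE.findall(desc_data):
        if kind == "label":
            pending_label = text
        elif pending_label is not None:
            row[pending_label] = text
            pending_label = None
    return row


print(to_row(DESC_DATA))  # {'A': 'A-Class', 'B': 'B-Class'}
```

Note that the dictionary keys come from the data itself, whereas a non-XML Oracle PIVOT needs the label list spelled out in its IN clause, so a fixed view only works if the set of labels is known in advance.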
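Answer 1:

I think it is more convenient to get the required data as rows instead of columns.
You can parse it with XMLTABLE after a small modification of the original HTML: remove unclosed tags such as <br> (this is why <br/> is better):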
with t as (
  -- your sample data:
  select
    q'[<span class ="label">A</span><br> <span class ="value">A-Class</span> <span class ="label">B</span><br> <span class ="value">B-Class</span>.
]' html_data
  from dual
)
-- main query:
select xt.*
from t
    ,xmltable(
       'let $labels := /root/span[@class eq "label"]
        let $values := /root/span[@class eq "value"]
        for $label at $i in $labels
        return element label {
                 attribute name  { $label/text() },
                 attribute value { $values[$i]/text() }
               }
       '
       passing
         xmltype(
           -- modify your html to make it compatible with xml:
           -- drop the unclosed <br> tags and wrap everything in a single root element
           '<root>'
           || replace(t.html_data, '<br>')
           || '</root>'
         )
       columns
         n           for ordinality,
         label_name  path '@name',
         label_value path '@value'
     ) xt;
Result:
N LABEL_NAME LABEL_VALUE
---------- ------------------------------ ------------------------------
1 A A-Class
2 B B-Class
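The XQuery above pairs the i-th label span with the i-th value span. A rough Python equivalent using the standard library's xml.etree.ElementTree (with the same <br> removal and <root> wrapper) can help check that pairing before running it in Oracle. The second function, pair_by_next_sibling, is a hypothetical alternative that pairs each label with the span right after it when that span's class contains "value", which is one way to tolerate classes such as "select desc_value" raised in the comments below. In XQuery the analogous idea would be something like $label/following-sibling::span[1][contains(@class, "value")], but that is an untested assumption, not part of the answer.

```python
import xml.etree.ElementTree as ET

# Sample value from the question.
DESC_DATA = ('<span class ="label">A</span><br> <span class ="value">A-Class</span> '
             '<span class ="label">B</span><br> <span class ="value">B-Class</span>.')


def parse_spans(html: str) -> list:
    """Drop the unclosed <br> tags and wrap the fragment in <root>, as the
    answer does, so it becomes well-formed XML; return the top-level spans."""
    root = ET.fromstring("<root>" + html.replace("<br>", "") + "</root>")
    return root.findall("span")


def pair_by_position(html: str) -> list:
    """Like the XQuery: the i-th "label" span goes with the i-th "value" span.
    A value span with a different class shifts every later pair, and zip()
    silently drops the leftover label."""
    spans = parse_spans(html)
    labels = [s.text for s in spans if s.get("class") == "label"]
    values = [s.text for s in spans if s.get("class") == "value"]
    return [(i, lab, val) for i, (lab, val) in enumerate(zip(labels, values), start=1)]


def pair_by_next_sibling(html: str) -> list:
    """Hypothetical alternative: pair each label span with the span right after
    it, provided that span's class contains "value" (like XPath contains())."""
    spans = parse_spans(html)
    pairs = []
    for span, nxt in zip(spans, spans[1:]):
        if "label" in span.get("class", "") and "value" in nxt.get("class", ""):
            pairs.append((span.text, nxt.text))
    return pairs


print(pair_by_position(DESC_DATA))      # [(1, 'A', 'A-Class'), (2, 'B', 'B-Class')]
print(pair_by_next_sibling(DESC_DATA))  # [('A', 'A-Class'), ('B', 'B-Class')]
```

For a fragment like Domain / "select value" Production / Project Team / Grueber, pair_by_position returns only (1, 'Domain', 'Grueber'), while pair_by_next_sibling returns both correct pairs.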
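Comments:

Hi Sayan, when I check it against the larger real data, it sometimes fails. For example, when the labels and values look like Domain, Production, Project Team, Asdsh jsdja kajdhjahdja Grueber: for the label Domain, instead of the value Production it takes the value of the next label, i.e. Asdsh jsdja kajdhjahdja Grueber, which should belong to Project Team. Basically, I need a way to modify let $values := /root/span[@class eq "desc_value"] so that it also includes let $values := /root/span[@class eq "select desc_value"].

Answer 2:

You need to split your string recursively by some pattern (e.g. '/span> <span'). Use the REGEXP_REPLACE() function to extract the required columns, and then apply a pivot: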
WITH t(desc_data) AS
(
SELECT '<span class ="label">A</span><br> <span class ="value">A-Class</span> <span class ="label">B</span><br> <span class ="value">B-Class</span> <span class ="label">C</span><br> <span class ="value">C-Class</span>'
FROM dual
), t2 AS
(
SELECT SUBSTR(desc_data,1,CASE WHEN INSTR(desc_data,'/span> <span',1,level) > 0
THEN INSTR(desc_data,'/span> <span',1,level) + 5
ELSE LENGTH(desc_data)
END
) AS desc_data2
FROM t
CONNECT BY level <= REGEXP_COUNT(desc_data,'/span> <span') + 1
)
SELECT *
FROM
(
SELECT REGEXP_REPLACE(desc_data2,'(.*"label">)(\S+)(</span>.*)','\2') AS label,
REGEXP_REPLACE(desc_data2,'(.*"value">)(\S+)(</span>.*)','\2') AS value
FROM t2 )
PIVOT ( MAX(VALUE) FOR LABEL IN ('A' AS "A", 'B' AS "B", 'C' AS "C") );
A B C
------- ------- -------
A-Class B-Class C-Class
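The CONNECT BY step keeps a growing prefix of the string, and the greedy leading .* in each REGEXP_REPLACE then picks the last label and the last value in that prefix. The Python sketch below mirrors both steps plus the MAX(VALUE) ... PIVOT so the trick can be checked outside Oracle; the function names prefixes and pivot and the text parameter are hypothetical. text defaults to the answer's \S+; [^<]+ is shown only as one possible alternative.

```python
import re

# Sample value from this answer (three label/value pairs).
DESC_DATA = ('<span class ="label">A</span><br> <span class ="value">A-Class</span> '
             '<span class ="label">B</span><br> <span class ="value">B-Class</span> '
             '<span class ="label">C</span><br> <span class ="value">C-Class</span>')

SEP = "/span> <span"


def prefixes(s: str) -> list:
    """Mirror of the CONNECT BY step: level n keeps s up to and including the
    "/span>" of the n-th separator; the last level keeps the whole string."""
    out = []
    start = 0
    for _ in range(s.count(SEP) + 1):
        pos = s.find(SEP, start)
        if pos == -1:
            out.append(s)
        else:
            out.append(s[:pos + 6])  # len("/span>") == 6
            start = pos + 1
    return out


def pivot(s: str, text: str = r"\S+") -> dict:
    """Mirror of the two REGEXP_REPLACE calls (keep group 2) plus
    MAX(value) FOR label; text defaults to the answer's pattern."""
    label_re = f'(.*"label">)({text})(</span>.*)'
    value_re = f'(.*"value">)({text})(</span>.*)'
    row = {}
    for p in prefixes(s):
        label = re.sub(label_re, r"\2", p)
        value = re.sub(value_re, r"\2", p)
        row[label] = max(row.get(label, value), value)  # MAX(value) per label
    return row


print(pivot(DESC_DATA))  # {'A': 'A-Class', 'B': 'B-Class', 'C': 'C-Class'}
```

With the default \S+, a label such as Project Team is not matched at all, so REGEXP_REPLACE (like re.sub) returns the prefix unchanged; in a longer prefix the greedy .* would instead backtrack to an earlier label and report that one. Swapping in [^<]+ fixes this in the sketch, but it has not been tested against the Oracle query.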