SQL基础操作_5_字符串处理
Posted ShenLiang2025
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了SQL基础操作_5_字符串处理相关的知识,希望对你有一定的参考价值。
目录
7.6 处理字符串
注:数据集和表结构见 SQL基础操作_1_检索数据
7.6.1 生成自增值
需求:通过SQL生成一个1到1000条记录.
解决方法:通过CTE的递归来实现该需求.
SQLServer:
DECLARE @startINT, @endINT
SELECT @start=1, @end=1000
;WITH NumberSequence( Number) AS
(
SELECT @start ASNumber
UNION ALL
SELECT Number + 1
FROM NumberSequence
WHERE Number <@end
)
SELECT * FROM NumberSequence OPTION (MaxRecursion 1000)
执行结果:
Number |
1 |
2 |
3 |
... |
|
Oracle:
WITH t(num) AS (
SELECT 1 FROM DUAL
UNION ALL
SELECT t.num+1
FROM t WHERE t.num<100
)
SELECT * FROM t;
执行结果:
Num |
1 |
2 |
3 |
... |
|
Mysql(8.0及以上版本):
WITH RECURSIVE cte (num) AS
(
SELECT 1
UNION ALL
SELECT num+1 FROM cte WHERE num <100
)
SELECT * FROM cte;
执行结果:
Num |
1 |
2 |
3 |
|
… |
7.6.2 遍历字符串里的每个值
需求:打印出ename为’King’的名字里每一个字母,每个字母占一行.
解决方法:通过自增表和emp表先cross join(笛卡尔积),然后再通过ename的len(ename的长度)进行过滤,最终得到显示每个字母的结果.
SQLServer:
SELECT SUBSTRING (e.ENAME,seq.pos,1) AS ename_Split
FROM (SELECT ENAME FROM emp WHERE ename= 'KING' ) e,
(SELECT number AS pos FROM master.[dbo].[spt_values] WHERE type = 'P' AND number>0)seq
WHERE seq.pos <= LEN(e.ename)
执行结果:
ename_Split |
K |
I |
N |
G |
注:
1: 这里master.[dbo].[spt_values]是一张特殊的系统视图,里面存了从0到2047总2048条自增序列.
2: 如果不明白,可以分段来看.
Step1:
SELECT e.*,seq.* FROM
(SELECT ENAME FROM emp WHERE ename= 'KING' ) e,
(SELECT number AS pos FROM master.[dbo].[spt_values] WHERE type = 'P' AND number>0)seq
执行结果:
ENAME | pos |
KING | 1 |
KING | 2 |
KING | 3 |
| |
… | … |
Sept2:
SELECT SUBSTRING(e.ENAME,seq.pos,1)AS ename_Split
FROM .. e, ..seq
WHERE seq.pos<=LEN(e.ename)
这里通过SUBSTRNG函数,每次的开始位置不断调整,每次仅取一个字符,再通过LEN函数过滤.所以得到最终结果.如果不熟悉SUBSTRING的语法,这里简单介绍下.
SUBSTRING ( expression, start, length )
1) 参数expression是要截取的原始字符串,比如这里的” KING”
2) 参数start是要截取的位置,比如从第2个位置开始,那应该从” I”往后数.
3) 参数length是要截取的长度,沿用上一行的例子,如果长度定义为2,则最终截取字符串是”IN”
Oracle:
WITH t(num) AS (
SELECT 1 FROM DUAL
UNION ALL
SELECT t.num+1
FROM t WHEREt.num<100
)
SELECT SUBSTR(e.ENAME,seq.num,1) AS ename_Split FROM
(SELECT ENAME FROM emp WHERE ename ='KING' ) e,
(SELECT num FROM t)seq
WHERE seq.num <= LENGTH(e.ename)
Mysql 8.0:
WITH RECURSIVE cte (num)AS
(
SELECT 1
UNION ALL
SELECT num+1 FROM cte WHERE num <100
)
SELECT SUBSTRING(e.ENAME,seq.num,1) AS ename_Split FROM
(SELECT ENAME FROM emp WHERE ename ='KING' ) e,
(SELECT num FROM cte)seq
WHERE seq.num <= LENGTH(e.ename)
7.6.3 处理含引号的字符串
需求:往dept表里插入dname为Test’s,loc为Beijing,deptno为100的数据.
解决方法:这里有位引号是特殊符号,所以需要特殊处理,比如如果双引号包裹起来.
Mysql:
INSERT INTO dept VALUES(100,'Test\\'s','Beijing');
Sql Server:
BEGIN TRAN
SET IDENTITY_INSERTdeptON;
GO
INSERT INTO dept(deptno,dname,loc) VALUES (100,'Test''s','Beijing');
SELECT * FROM dept WHERE deptno=100;
SET IDENTITY_INSERTdeptOFF;
GO
ROLLBACK TRAN
执行结果:
deptno | dname | loc |
100 | Test's | Beijing |
7.6.4 计算某个字符出现的次数
需求:查询emp表emptno是7499的用户的job里S出现的次数.
解决方法:这里length(len)结合replace函数算出字符串出现的次数.
Sql Server:
SELECT empno,job,(LEN(JOB) - LEN(REPLACE(JOB,'S','')))/LEN('S') AS StrFreq
FROM emp WHERE empno=7499;
empno | job | StrFreq |
7499 | SALESMAN | 2 |
Mysql:
SELECT empno,job,ROUND((LENGTH(JOB) - LENGTH(REPLACE(JOB,'S','')))/LENGTH('S')) AS StrFreq
FROM emp WHERE empno=7499;
注:这里除以LENGTH('S')是为了考虑传入的字符串是2位以及以上的情况,比如’SS’.
7.6.5 字符串里过滤不需要的字符
需求:过滤tmp_v视图里含数字的部分. 其中tmp_v视图的data字段的定义是emp表的ename字段拼接空格和deptno字段。
解决方法:这里通过translate函数对含数字的部分进行替换.
SQL Server:
create view tmp_v
AS SELECT ename+' '+cast(deptno as varchar)as data
from emp
SELECT data,replace(dbo.translate(data,'0123456789','@@@@@@@@@@'),'@','') as ename FROM tmp_v
order by replace(dbo.translate(data,'0123456789','@@@@@@@@@@'),'@','') desc
data | ename |
WARD 30 | WARD |
TURNER 30 | TURNER |
SMITH 20 | SMITH |
SCOTT 20 | SCOTT |
MILLER 10 | MILLER |
MARTIN 30 | MARTIN |
KING 10 | KING |
JONES 20 | JONES |
JAMES 30 | JAMES |
FORD 20 | FORD |
CLARK 10 | CLARK |
BLAKE 30 | BLAKE |
ALLEN 30 | ALLEN |
ADAMS 20 | ADAMS |
注:这里需要参考之前章节里translate函数的实现.
Mysql:
create view tmp_v
AS SELECT CONCAT(ename,' ',deptno) as data
from emp
SELECT data,replace(translate(data,'0123456789','@@@@@@@@@@'),'@','') as ename
FROM tmp_v
order by replace(translate(data,'0123456789','@@@@@@@@@@'),'@','') desc
Oracle:
7.6.6 拆分字符串里的字符和数字
需求:过滤tmp_v视图里data字段拆分会原来的ename和deptno两个字段.
解决方法:这里通过translate、replace、repeate(replicate、rpad)函数对含数字的部分进行替换.
SQL Server:
SELECT data,replace(dbo.translate(data,'0123456789',REPLICATE('@',10)),'@','') as ename,
replace(dbo.translate(data,'ABCDEFGHIJKLMNOPQRSTUVWXYZ',REPLICATE('@',26)),'@','') as deptno
FROM tmp_v
order by replace(dbo.translate(data,'0123456789',REPLICATE('@',10)),'@','') desc
data | ename | deptno |
WARD 30 | WARD | 30 |
TURNER 30 | TURNER | 30 |
SMITH 20 | SMITH | 20 |
SCOTT 20 | SCOTT | 20 |
MILLER 10 | MILLER | 10 |
MARTIN 30 | MARTIN | 30 |
KING 10 | KING | 10 |
JONES 20 | JONES | 20 |
JAMES 30 | JAMES | 30 |
FORD 20 | FORD | 20 |
CLARK 10 | CLARK | 10 |
BLAKE 30 | BLAKE | 30 |
ALLEN 30 | ALLEN | 30 |
ADAMS 20 | ADAMS | 20 |
Mysql:
SELECT data,replace(translate(data,'0123456789',REPEAT('@',10)),'@','') as ename,
replace(translate(data,'ABCDEFGHIJKLMNOPQRSTUVWXYZ',REPEAT('@',26)),'@','') as deptno
FROM tmp_v
order by replace(translate(data,'0123456789',REPEAT('@',10)),'@','') desc
7.6.7 判断字符串是字符串数字型
需求:检索temp_strdata表的字段data是字符串数字类型的记录这里如果都是字符串或者数字的也符合条件.
解决方法:这里通过translate、replace、repeate(replicate、rpad)函数对含数字的部分进行替换.
SqlServer:
CREATE TABLE temp_str(data VARCHAR(1000));
INSERT INTO temp_str VALUES('SMITH20');
INSERT INTO temp_str VALUES('JONES30');
INSERT INTO temp_str VALUES('Jim#40');
INSERT INTO temp_str VALUES('50$Tom');
INSERT INTO temp_str VALUES('60:Mike');
INSERT INTO temp_str VALUES('70Cruz');
INSERT INTO temp_str VALUES('Jack');
INSERT INTO temp_str VALUES('J8oh0n');
SELECT data --,dbo.translate(UPPER(data),'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',REPLICATE('a',36)) as trans
FROM temp_str
WHERE dbo.translate(UPPER(data),'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',REPLICATE('a',36))=REPLICATE('a',LEN(data))
7.6.8 判断字符串含有汉字
需求:检索含有汉字的字符串.
解决方法:这里通过函数CHAR_LENGTH对比LENGTH进行对比来判断.
Mysql:
SELECT data,LENGTH(data) AS Len_data,CHAR_LENGTH(data) AS CharLen_Data,HEX(data) AS HexData FROM
(
SELECT 'Hello,World,SQL'AS data
UNION ALL
SELECT 'Data,Arithmetic' AS data
UNION ALL
SELECT 'Science中国' AS data
UNION ALL
SELECT '上S海H' AS data
)A
WHERE LENGTH(data) <> CHAR_LENGTH(data)
执行结果:
data | Len_data | CharLen_Data |
Science中国 | 13 | 9 |
上S海H | 8 | 4 |
注:
LENGTH() returnsthe length of the string measured in bytes.
CHAR_LENGTH() returns the length of the string measured incharacters.
LENGTH:是计算字节的长度.一个汉字是算三个字符,一个数字或字母算一个字符
CHAR_LENGTH:汉字、数字、字母都算是一个字符
或者通过字符串的十六进制并结合REGEXP来判断.
SELECT data,HEX(data) AS HexData FROM
(
SELECT 'Hello,World,SQL' AS data
UNION ALL
SELECT 'Data,Arithmetic'AS data
UNION ALL
SELECT 'Science中国' AS data
UNION ALL
SELECT '上S海H' AS data
)A
WHERE HEX(data) REGEXP'^(..)*(E[4-9])'
执行结果:
data | HexData |
Science中国 | 536369656E6365E4B8ADE59BBD |
上S海H | E4B88A53E6B5B748 |
SQL Server:
SELECT data FROM
(
SELECT 'Hello,World,SQL' AS data
UNION ALL
SELECT 'Data,Arithmetic' AS data
UNION ALL
SELECT 'Science中国' AS data
UNION ALL
SELECT '上S海H' AS data
)A
WHERE data LIKE '%[吖-座]%'
-- 或者利用PATINDEX函数进行判断
SELECT data FROM
( SELECT 'Hello,World,SQL' AS data
UNION ALL
SELECT 'Data,Arithmetic' AS data
UNION ALL
SELECT 'Science中国' AS data
UNION ALL
SELECT '上S海H' AS data
)A
WHERE PATINDEX ('%[吖-座]%',data) > 0
执行结果:
data |
Science中国 |
上S海H |
7.6.9 合并多行到一行
需求:将emp表里deptno相同的ename以逗号拼接在一起.
解决方法:这里通过字符串合并函数完成该效果.如group_concat
SQL Server:
SELECT DISTINCT
deptno
, STUFF((
SELECT N', '+ CAST([ename]ASVARCHAR(255))
FROM emp e1
WHERE e1.deptno= e2.deptno
FOR XML PATH ('')), 1, 2,'')AS StrCombine
FROM emp e2
执行结果:
Deptno | StrCombine |
10 | CLARK, KING, MILLER |
20 | SMITH, JONES, SCOTT, ADAMS, FORD |
30 | ALLEN, WARD, MARTIN, BLAKE, TURNER, JAMES |
或者借助CTE:
WITH x(deptno,cnt,list,empno,len)
AS(
SELECT deptno,COUNT(*) OVER(PARTITIONBY deptno),CAST(ename AS VARCHAR(100)),empno,1
FROM emp
UNION ALL
SELECT x.deptno,x.cnt,CAST(x.list+','+e.ename AS VARCHAR(100)),e.empno,x.len+1
FROM emp e,x
WHERE e.deptno= x.deptno
AND e.empno> x.empno
)
SELECT deptno,list AS StrCombine from X
WHERE len=cnt
ORDER BY 1
步骤分析:
Step1: 首先借助COUNT(*) OVER(PARTITION BY deptno)确定每个部门里有多少员工
Step2: 借助len,初始值为1,完成自增(递归里 x.len+1)
Step3: 借助x.list+','+e.ename完成ename的拼接
Step4: 借助e.deptno = x.deptno AND e.empno >x.empno确定要递归的结果集
Step5: 查询满足条件的记录,即按deptno分组的empno数和自增序号相同的记录.
分步查看结果:
1) 查看构建的递归数据.这里以depto=10的为例:
SELECT * FROM x WHERE deptno= 10
deptno | cnt | list | empno | len |
10 | 3 | CLARK | 7782 | 1 |
10 | 3 | KING | 7839 | 1 |
10 | 3 | MILLER | 7934 | 1 |
10 | 3 | KING,MILLER | 7934 | 2 |
10 | 3 | CLARK,KING | 7839 | 2 |
10 | 3 | CLARK,MILLER | 7934 | 2 |
10 | 3 | CLARK,KING,MILLER | 7934 | 3 |
2) 不难发现,这里高亮处的数据是我们想要的,所以通过如下方式获取最终结果:
SELECT deptno,list AS StrCombine FROM X
WHERE len=cnt
ORDER BY 1
延展阅读:如果仅想获得某一个分组下的字符串合并,也可以按照如下方法:
DECLARE @combinedString VARCHAR(MAX)
SELECT @combinedString = COALESCE(@combinedString+', ','')+ ename
FROM emp
WHERE deptno=10
SELECT @combinedString as StringValue
Mysql:
SELECT deptno,group_concat(ename) AS StrCombine FROM emp
GROUP BY deptno
ORDER BY emp.deptno,ename
7.6.10 对字符串重新按字母排序重新组合
需求:将emp表里ename按照字母顺序重新组合生成新的字符.
解决方法:这里通过字符串合并函数或者结合substring和row_number完成该效果.
SqlServer:
WITH x(ename,ename_Split) AS
(
SELECT TOP 100000 ename,SUBSTRING(e.ENAME,seq.pos,1) AS ename_Split
FROM (SELECT ENAME FROM emp) e,
(SELECT number AS pos FROM master.[dbo].[spt_values] WHERE type= 'P' AND number>0) seq
WHERE seq.pos <= LEN(e.ename)
ORDER BY ename, ename_Split
)
SELECT DISTINCT
ename
, STUFF((
SELECT N''+ CAST(ename_SplitASVARCHAR(255))
FROM x e1
WHERE e1.ename= e2.ename
FOR XML PATH ('')),1,0,'')AS StrByAlph
FROM x e2
注: 如果想在CTE里使用[]排序,需要在查询里指定TOP.
消息 1033,级别 15,状态 1,第 67 行
除非另外还指定了 TOP、OFFSET 或 FORXML,否则,ORDER BY 子句在视图、内联函数、派生表、子查询和公用表表达式中无效。
执行结果:
ename | StrByAlph |
ADAMS | AADMS |
ALLEN | AELLN |
BLAKE | ABEKL |
CLARK | ACKLR |
FORD | DFOR |
JAMES | AEJMS |
JONES | EJNOS |
KING | GIKN |
MARTIN | AIMNRT |
MILLER | EILLMR |
SCOTT | COSTT |
SMITH | HIMST |
TURNER | ENRRTU |
WARD | ADRW |
|
|
Mysql:
SELECT ename,group_concat(SUBSTRING(e.ENAME,seq.num,1) ORDER BY SUBSTRING(e.ENAME,seq.num,1) separator '') AS StrByAlph
FROM (SELECT ENAME FROM emp) e,
(SELECT i AS num FROM tb_incr)seq
WHERE seq.num <= LENGTH(e.ename)
GROUP BY ename
ORDER BY ename,SUBSTRING(e.ENAME,seq.num,1)
注: 这里借助group_concat函数里的ORDER BY关键字,对已经排序的字母进行合并.
这里tb_incr是个自增表,存放从1开始步长为1的自增序列
Oracle:
7.6.11 判断一个字符是否是数字
需求:将临时表里判断data字段里哪些是数字.
解决方法:这里通过函数isnumberic或者regexp完成该效果.
Mysql:
DELIMITER $$
DROP FUNCTION IF EXISTS `IsNum` $$
CREATE FUNCTION `IsNum`(str VARCHAR(25)) RETURNS INT
BEGIN
DECLARE iResult INT DEFAULT 0;
IF ISNULL(str) THEN return 0;END IF;-- NULL 字符串
IF str='' THEN return 0;END IF;-- 空字符串
SELECT str REGEXP '^[0-9]*$' INTO iResult;
IF iResult =1 THEN
RETURN 1;
ELSE
RETURN 0;
END IF;
END $$
DELIMITER ;
或者使用正则表达式:
SELECT data FROM
(
SELECT '63' AS data
UNION ALL
SELECT '36(' AS data
UNION ALL
SELECT '3(6' AS data
UNION ALL
SELECT '(36' AS data
UNION ALL
SELECT '36$' AS data
UNION ALL
SELECT '' AS data
UNION ALL
SELECT NULL AS data
)A
-- WHERE IsNum(data) = 1
WHERE data REGEXP '^[0-9]*$'=1 AND data IS NOT NULL AND data<> '';
或者直接通过函数IsNum(data) = 1来判断,见注释部分.
SqlServer:
SELECT data FROM
(
SELECT '63' AS data
UNION ALL
SELECT '36(' AS data
UNION ALL
SELECT '3(6' AS data
UNION ALL
SELECT '(36' AS data
UNION ALL
SELECT '36$'AS data
UNION ALL
SELECT '' AS data
UNION ALL
SELECT NULL AS data
)A
WHERE ISNUMERIC(data)= 1
执行结果:
Data |
63 |
7.6.12 按照指定的位置截取字符
需求:按照逗号拆分字符串,取拆分出来的第二个子串.
解决方法:这里需要自定义函数结合substring截取字符串,以达到该效果.
SQL Server:
CREATE FUNCTION strSplitIndex
( @str VARCHAR(1024), --要分割的字符串
@split VARCHAR(10), --分隔符
@index INT --要取元素的位置
)
RETURNS VARCHAR(1024)
AS
BEGIN
DECLARE @location INT
DECLARE @start INT
DECLARE @next INT
DECLARE @seed INT
SET @str=LTRIM(RTRIM(@str))
SET @start=1
SET @next=1
SET @seed=LEN(@split)
SET @location=CHARINDEX(@split,@str)
WHILE @location<>0and @index>@next
BEGIN
SET @start=@location+@seed
SET @location=CHARINDEX(@split,@str,@start)
SET @next=@next+1
END
IF @location =0 SELECT @location =LEN(@str)+1
RETURN SUBSTRING(@str,@start,@location-@start)
END
GO
SELECT dbo.strSplitIndex(data,',',1) AS StrSplit FROM
(
SELECT 'Hello,World,SQL' AS data
UNION ALL
SELECT 'Data,Arithmetic' AS data
UNION ALL
SELECT 'Science' AS data
)A
执行结果:
StrSplit |
Hello |
Data |
Science |
或者借助parsename函数:
SELECT PARSENAME(REPLACE(data,',','.'),2) AS StrSplit FROM
(
SELECT 'Hello,World,SQL' AS data
UNION ALL
SELECT 'Data,Arithmetic' AS data
UNION ALL
SELECT 'Science' AS data
)A
WHERE PARSENAME(REPLACE(data,',','.'),2)IS NOT NULL
执行结果:
StrSplit |
Hello |
Data |
Mysql:
SELECT data,SUBSTRING_INDEX(SUBSTRING_INDEX(data,',',seq.num),',',-1) AS sub,seq.num AS subStrPos
FROM
(SELECT 'Hello,World,SQL' AS data
UNION ALL
SELECT 'Data,Arithmetic' AS data
UNION ALL
SELECT 'Science' AS data) e,
(SELECT ias num FROM tb_incr)seq
WHERE seq.num <= LENGTH(e.data)- LENGTH(REPLACE(e.data,',',''))+1
AND seq.num=2
ORDER BY data,seq.num
执行结果:
data | sub | subStrPos |
Data,Arithmetic | Arithmetic | 2 |
Hello,World,SQL | World | 2 |
步骤解析:
Step1: 首先借助自增表将data字段里的数据按照逗号的数目切分,如果有2个逗号,则会切分成3部分
Step2: 借助SUBSTRING_INDEX函数截取逗号所在位置的子串,这里鉴于SUBSTRING_INDEX的第三个参数的意义是子串累加,所以又套了个SUBSTRING_INDEX,第三个参数传-1,即从右边截取.
Step3:借助自增表的num,取指定分割位置的数据,这里是2.
注: SUBSTRING_INDEX函数执行示例见下:
SELECT SUBSTRING_INDEX('Hello,World,SQL',',',1) AS SUBSTRING,1 AS pos
UNION ALL
SELECT SUBSTRING_INDEX('Hello,World,SQL',',',2) AS SUBSTRING,2 AS pos
UNION ALL
SELECT SUBSTRING_INDEX('Hello,World,SQL',',',3) AS SUBSTRING,3 AS pos
执行结果:
SUBSTRING | pos |
Hello | 1 |
Hello,World | 2 |
Hello,World,SQL | 3 |
Oracle:
7.6.13 按照指定的分隔符截取字符返回表形式
需求:按照逗号拆分字符串,并指定返回的格式是表.
解决方法:这里需要自定义函数结合substring截取字符串,以达到该效果.
SQLServer:
CREATE FUNCTION strSplitTable(@strNVARCHAR(2000),@splitNVARCHAR(2))
RETURNS @t TABLE(SubStr VARCHAR(1000))
AS
BEGIN
DECLARE @tmpSubStr VARCHAR(1000),@getIndexINT
SET @getIndex=CHARINDEX(',',@str)
WHILE(@getIndex<>0)
BEGIN
SET @tmpSubStr=CONVERT(VARCHAR(1000),SUBSTRING(@str,1,@getIndex-1))
INSERT INTO @t(SubStr)VALUES(@tmpSubStr)
SET @str=STUFF(@str,1,@getIndex,'')
SET @getIndex=CHARINDEX(',',@str)
END
INSERT INTO @t(SubStr)VALUES(@str)
RETURN
END
GO
SELECT * FROM strSplitTable('Hello,World,SQL',',')
执行结果:
SubStr |
Hello |
World |
SQL |
注: 自Sql Server 2016已新增系统函数STRING_SPLIT,测试示例见下:
SELECT A.value value_A,B.value value_B
FROM
( SELECT value FROM STRING_SPLIT('A$B$C','$')
)A
LEFT JOIN
(SELECT value FROM STRING_SPLIT('A,B',',')
)B
ON A.value = B.value
执行结果:
value_A | value_B |
A | A |
B | B |
C | NULL |
这里仅支持单个字符的分隔符.
消息 214,级别 16,状态 11,第 12 行
Procedure expects parameter 'separator' of type 'nchar(1)/nvarchar(1)'.
Mysql:
未见该方法.
以上是关于SQL基础操作_5_字符串处理的主要内容,如果未能解决你的问题,请参考以下文章