How can we truncate text after a space in Hadoop?

Posted 2023-03-23

【Posted】: 2019-11-19 23:34:35 【Question】:

I have a column, say column_1, with the following values:

abc 12edf
hbnm 847
47sf hg41

I need the following output:

abc
hbnm
47sf

PS: I only have read-only access to the database.

【Answer 1】:

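Use regexp_extract(col,'^(.*?)\\s',1) to extract everything from the start of the string up to the first whitespace character (capture group 1).

'^(.*?)\\s' means:

^ - anchor for the start of the string
(.*?) - any characters, as few as possible (lazy match), captured as group 1
\\s - a whitespace character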

Demo:
with your_table as (--use your_table instead of this
select stack (3,
'abc  12edf',
'hbnm 847',
'47sf hg41'
) as str
)

select regexp_extract (str,'^(.*?)\\s',1) as result_str
  from your_table s;

Result:

abc
hbnm
47sf
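
Note that '^(.*?)\\s' only matches when the value actually contains whitespace. As far as I know, Hive's regexp_extract returns an empty string when the pattern does not match, so a single-word value would come back empty. Below is a minimal sketch of a variant that keeps such values by capturing the leading run of non-whitespace characters with '^(\\S*)'. The sample row 'single' is made up for illustration and is not part of the question's data.

```sql
with your_table as ( -- hypothetical sample data; use your own table instead
select stack (4,
'abc  12edf',
'hbnm 847',
'47sf hg41',
'single'       -- assumed extra row with no whitespace at all
) as str
)

-- ^(\\S*) captures zero or more non-whitespace characters from the start,
-- so it stops at the first whitespace or at the end of the string
select str,
       regexp_extract(str, '^(\\S*)', 1) as result_str
  from your_table;
```

With this pattern the row 'single' should come back unchanged, while the other rows are cut at the first whitespace as before.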

Another possible solution is to use split:

select split (str,' ')[0] as result_str
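
The second argument of split is treated as a regular expression, so the same idea can be extended to tabs or repeated spaces. The following is only a sketch under that assumption; the tab-separated sample row is invented for illustration.

```sql
with your_table as ( -- hypothetical sample data; use your own table instead
select stack (3,
'abc  12edf',
'hbnm 847',
'47sf\thg41'   -- assumed row separated by a tab instead of a space
) as str
)

-- split on one or more whitespace characters and keep the first element
select str,
       split(str, '\\s+')[0] as result_str
  from your_table;
```

A value without any whitespace stays in element 0 as a whole, so it is returned as-is. A value that starts with whitespace would yield an empty first element.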

There is also a solution using instr + substr:

select substr(str,1,instr(str,' ')-1)
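
One caveat with this form: instr returns 0 when the string contains no space, so the length argument becomes -1 and the original value is not returned. A possible guard, sketched here with a made-up row 'nospace', is to fall back to the full string in that case.

```sql
with your_table as ( -- hypothetical sample data; use your own table instead
select stack (3,
'abc 12edf',
'hbnm 847',
'nospace'      -- assumed row that contains no space
) as str
)

select str,
       case
         -- a space exists: keep everything before its first occurrence
         when instr(str, ' ') > 0 then substr(str, 1, instr(str, ' ') - 1)
         -- no space: return the value unchanged
         else str
       end as result_str
  from your_table;
```

All of these are plain select queries, so they should work with read-only access to the database.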
