How to filter records in Hive using Spark

Posted 2023-04-18

【Title】: How to filter records in Hive using Spark 【Posted】: 2017-06-29 11:00:14 【Question】:

Why doesn't the string comparison match?

My input is:

+-------+
|      y|
+-------+
| ""no""|
| ""no""|
| ""no""|
|""yes""|
| ""no""|
| ""no""|
| ""no""|
| ""no""|
|""yes""|
| ""no""|
| ""no""|
| ""no""|
| ""no""|
|""yes""|
| ""no""|
| ""no""|
+-------+

My query is:

sqlContext.sql("select count(y) from dummy where y='yes'").show()

The output is:

+---+
|_c0|
+---+
|  0|
+---+

y is declared as a string type in the DDL.

【Comments】:

I should have used .replaceAll("\"\"", "") earlier :D

【Answer 1】:

You should try this:

sqlContext.sql("select count(y) from dummy where y='\"\"yes\"\"'").show()

Note that your data contains ""yes"" (with the doubled quotes stored as part of the value), not just yes.
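The mismatch can be reproduced without Spark. The following is a minimal sketch in plain Scala, assuming the 16 rows shown in the question are the whole table and that each value really contains two literal double quotes on each side; the object name `QuoteMismatch` and its fields are hypothetical names used only for illustration.

```scala
object QuoteMismatch {
  // Stored forms of the two values, doubled quotes included (assumption based on the show() output).
  val yes: String = "\"\"yes\"\""
  val no: String = "\"\"no\"\""

  // The 16 rows of column y, in the order shown in the question.
  val rows: Seq[String] =
    Seq(no, no, no, yes, no, no, no, no, yes, no, no, no, no, yes, no, no)

  // Exact comparison against the bare word never matches a quoted value.
  val bareMatches: Int = rows.count(_ == "yes")

  // Comparison against the stored form, quotes included, does match.
  val quotedMatches: Int = rows.count(_ == yes)

  def main(args: Array[String]): Unit = {
    println(s"y = 'yes' matches $bareMatches rows")        // 0
    println(s"y = '$yes' matches $quotedMatches rows")     // 3
  }
}
```

This mirrors what the `where y='yes'` predicate does: an equality test on the full string, so the surrounding quote characters count.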

You still need to clean up the data :)

Or do this:

sqlContext.sql("select count(y) from dummy where y like '%yes%'").show()
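Keep in mind that `like '%yes%'` matches any value that merely contains yes, so it is looser than an equality test. If the goal is to clean the column once and then compare exactly, one possible approach is Spark's built-in `regexp_replace` function. The sketch below assumes Spark 1.5 or later (where `regexp_replace` is available in `org.apache.spark.sql.functions`) and a table named `dummy` as in the question; the object `CleanDummy` and its helper names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, regexp_replace}

object CleanDummy {
  // Hypothetical helper: returns the dummy table with every double quote
  // character removed from column y (withColumn replaces the existing y).
  def withCleanY(sqlContext: SQLContext): DataFrame =
    sqlContext.table("dummy")
      .withColumn("y", regexp_replace(col("y"), "\"", ""))

  // Hypothetical helper: counts the rows whose cleaned y is exactly "yes".
  def countYes(sqlContext: SQLContext): Long =
    withCleanY(sqlContext).filter(col("y") === "yes").count()
}
```

Calling `CleanDummy.countYes(sqlContext)` from a Spark shell should then count only exact matches. Whether stripping every quote character is appropriate depends on the data; if some values legitimately contain quotes, a narrower pattern would be needed.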

【Discussion】:

You can accept the answer or upvote it if it solves the problem in your case.
