Create table as select percentage subquery in Impala DB
Posted: 2020-07-27 13:27:53
Question: I am new to Impala and need to create a table from the result set of a SELECT. The SQL is run from Java over JDBC. See the query below:
create table if not exists my_temp_table as select
41 as rule_id,49 as record_id,
(select count(1) as val from dirty_table where msg regexp '^[1]([3-9])[0-9]{9}$' )/(select count(1) from dirty_table);
I need to create the table my_temp_table and insert the data into it, all in a single SQL statement. However, the statement fails with the following error:
[HY000][500051] [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ParseException: Syntax error
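After checking, I learned that Impala does not support subqueries in the SELECT clause; subqueries can only be used in the FROM or WHERE clauses. See the Impala documentation: https://impala.apache.org/docs/build/html/topics/impala_subqueries.html.
So how can I solve this problem?
My ideas: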
- Rewrite the SQL so that it executes. I tried WITH, as in the SQL below. It works on its own, but not together with CREATE TABLE ... AS ...
WITH q1 AS (
select count(1) as val from dirty_table where msg regexp '^[1]([3-9])[0-9]{9}$'
),
q2 AS (
select count(1) val2 from dirty_table
)
SELECT 100 * q1.val / q2.val2 result
FROM q1, q2
- Or is there a statement like BEGIN ... END, as in MySQL or Oracle, so that I can run this SQL on its own?
Answer 1:
Based on your example, I would try the approaches below, which I believe work correctly. I checked the solution with Impala.
CREATE TABLE dirty_table (
id INT,
msg STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
[localhost.localdomain:21000] > SELECT * FROM dirty_table;
Query: SELECT * FROM dirty_table
Query submitted at: 2020-07-28 17:05:24 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=5441d6a46ce61e7b:8e49432600000000
+----+-------------+
| id | msg |
+----+-------------+
| 1 | 13321512121 |
| 2 | 13121212121 |
| 3 | 03121212121 |
| 4 | 13321512121 |
| 5 | 13121212121 |
| 6 | 03121212121 |
| 7 | 13121212121 |
+----+-------------+
Fetched 7 row(s) in 0.14s
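As a quick sanity check of what the percentage should come out to, the filter can be replayed on the seven sample rows outside Impala. This is only a sketch in Python: it assumes the pattern's quantifier is `{9}` (a `1`, one digit from 3 to 9, then nine more digits), and the names `MESSAGES` and `matching_fraction` are made up for illustration. Under that assumption 5 of the 7 rows match.

```python
import re

# msg values copied from the SELECT * FROM dirty_table output above.
MESSAGES = [
    "13321512121", "13121212121", "03121212121", "13321512121",
    "13121212121", "03121212121", "13121212121",
]

# Assumption: the quantifier is {9}, so only 11-character values can match.
PATTERN = re.compile(r"^[1]([3-9])[0-9]{9}$")


def matching_fraction(values):
    """Return the fraction of values that match PATTERN."""
    matched = sum(1 for value in values if PATTERN.search(value))
    return matched / len(values)


fraction = matching_fraction(MESSAGES)
print(fraction)
```

Python's `re` module is not the regex engine Impala uses, so this only approximates the `regexp` operator for a simple pattern like this one.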
First example
CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
(SELECT COUNT(1) AS val2 FROM dirty_table) b;
[localhost.localdomain:21000] > CREATE TABLE IF NOT EXISTS my_temp_table AS
> SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
> FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
> (SELECT COUNT(1) AS val2 FROM dirty_table) b;
Query: CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT 41 AS rule_id, 49 AS record_id, val1 / val2 AS result
FROM (SELECT COUNT(1) AS val1 FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$' ) a,
(SELECT COUNT(1) AS val2 FROM dirty_table) b
+-------------------+
| summary |
+-------------------+
| Inserted 0 row(s) |
+-------------------+
Fetched 1 row(s) in 0.21s
[localhost.localdomain:21000] > invalidate metadata;
[localhost.localdomain:21000] > SELECT * FROM my_temp_table;
Query: select * from my_temp_table
Query submitted at: 2020-07-28 17:03:44 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=47370bf793a09b:29c4dfa000000000
+---------+-----------+--------------------+
| rule_id | record_id | result |
+---------+-----------+--------------------+
| 41 | 49 | 0.7142857142857143 |
+---------+-----------+--------------------+
Fetched 1 row(s) in 0.13s
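The same rewrite (both counts moved into derived tables in the FROM clause, then divided in the select list) can be tried without a cluster. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Impala, so it says nothing about Impala's parser or the JDBC driver. Two details are assumptions specific to SQLite: a `regexp` function has to be registered by hand, and the division is written as `1.0 * val1 / val2` because SQLite divides two integers as integers.

```python
import re
import sqlite3

ROWS = [
    (1, "13321512121"), (2, "13121212121"), (3, "03121212121"),
    (4, "13321512121"), (5, "13121212121"), (6, "03121212121"),
    (7, "13121212121"),
]

conn = sqlite3.connect(":memory:")
# SQLite evaluates "X REGEXP Y" by calling a user function regexp(Y, X).
conn.create_function(
    "regexp", 2,
    lambda pattern, value: re.search(pattern, value) is not None,
)
conn.execute("CREATE TABLE dirty_table (id INTEGER, msg TEXT)")
conn.executemany("INSERT INTO dirty_table VALUES (?, ?)", ROWS)

# Both counts live in derived tables; the select list only references them.
conn.execute("""
    CREATE TABLE IF NOT EXISTS my_temp_table AS
    SELECT 41 AS rule_id, 49 AS record_id, 1.0 * val1 / val2 AS result
    FROM (SELECT COUNT(1) AS val1 FROM dirty_table
          WHERE msg REGEXP '^[1]([3-9])[0-9]{9}$') a,
         (SELECT COUNT(1) AS val2 FROM dirty_table) b
""")

row = conn.execute(
    "SELECT rule_id, record_id, result FROM my_temp_table"
).fetchone()
print(row)
```

The whole CREATE TABLE ... AS SELECT is passed to `execute` as one string here, which mirrors running it as a single statement.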
Second example
DROP TABLE my_temp_table;
CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT result FROM
(WITH q1 AS (
SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
),
q2 AS (
SELECT COUNT(1) val2 FROM dirty_table
)
SELECT 100 * q1.val / q2.val2 AS result
FROM q1, q2) t;
[localhost.localdomain:21000] > CREATE TABLE IF NOT EXISTS my_temp_table AS
> SELECT result FROM
> (WITH q1 AS (
> SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
> ),
> q2 AS (
> SELECT COUNT(1) val2 FROM dirty_table
> )
> SELECT 100 * q1.val / q2.val2 AS result
> FROM q1, q2) t;
Query: CREATE TABLE IF NOT EXISTS my_temp_table AS
SELECT result FROM
(WITH q1 AS (
SELECT COUNT(1) AS val FROM dirty_table WHERE msg regexp '^[1]([3-9])[0-9]{9}$'
),
q2 AS (
SELECT COUNT(1) val2 FROM dirty_table
)
SELECT 100 * q1.val / q2.val2 AS result
FROM q1, q2) t
+-------------------+
| summary |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
Fetched 1 row(s) in 0.40s
[localhost.localdomain:21000] > invalidate metadata;
[localhost.localdomain:21000] > SELECT * FROM my_temp_table;
Query: SELECT * FROM my_temp_table
Query submitted at: 2020-07-28 17:08:17 (Coordinator: http://localhost.localdomain:25000)
Query progress can be monitored at: http://localhost.localdomain:25000/query_plan?query_id=3447684ef59d0c4:f70779200000000
+-------------------+
| result |
+-------------------+
| 71.42857142857143 |
+-------------------+
Fetched 1 row(s) in 0.74s
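The second form, where the WITH clause is wrapped in a derived table so that CREATE TABLE ... AS is followed directly by SELECT, can be exercised the same way. Again this is a hedged sketch that uses SQLite through Python's sqlite3 module as a substitute for Impala; the hand-registered `regexp` function and the `100.0` literal (to avoid SQLite's integer division) are assumptions of the sketch, not part of the Impala query.

```python
import re
import sqlite3

ROWS = [
    (1, "13321512121"), (2, "13121212121"), (3, "03121212121"),
    (4, "13321512121"), (5, "13121212121"), (6, "03121212121"),
    (7, "13121212121"),
]

conn = sqlite3.connect(":memory:")
# SQLite evaluates "X REGEXP Y" by calling a user function regexp(Y, X).
conn.create_function(
    "regexp", 2,
    lambda pattern, value: re.search(pattern, value) is not None,
)
conn.execute("CREATE TABLE dirty_table (id INTEGER, msg TEXT)")
conn.executemany("INSERT INTO dirty_table VALUES (?, ?)", ROWS)

# The WITH clause sits inside the derived table t, not directly after AS.
conn.execute("""
    CREATE TABLE IF NOT EXISTS my_temp_table AS
    SELECT result FROM
    (WITH q1 AS (
         SELECT COUNT(1) AS val FROM dirty_table
         WHERE msg REGEXP '^[1]([3-9])[0-9]{9}$'
     ),
     q2 AS (
         SELECT COUNT(1) val2 FROM dirty_table
     )
     SELECT 100.0 * q1.val / q2.val2 AS result
     FROM q1, q2) t
""")

percentage = conn.execute("SELECT result FROM my_temp_table").fetchone()[0]
print(percentage)
```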
Comments:
Hi @Chema, as with the other answer, both SQL statements only work as a plain SELECT; once CREATE TABLE is added, the SQL fails.
Adding the exception log (both report the same error): [HY000][500051] [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ParseException: Syntax error in line 1: CREATE TABLE IF NOT EXISTS my_temp_table AS ^ Encountered: EOF Expected: SELECT, VALUES, W ...
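Hi @KD Final, I have changed the solution; please take a look now.
Hi @KD Final, I checked the solution with the Cloudera distribution and Impala, and it works fine. I updated the post with a step-by-step solution. Maybe you are facing some other problem. Regards.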
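This general approach is right: put the subqueries in the FROM clause and then reference them in the select list. In the upcoming Impala 4.0 (and other Cloudera releases of Impala), we do support select-list subqueries. Internally, they are rewritten into queries exactly like this one.
Answer 2: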
I think a conditional average does what you want simply and efficiently, with a single table scan:
select avg(case when msg regexp '^[1]([3-9])[0-9]{9}$' then 100.0 else 0 end) result
from dirty_table
You can turn this into a create table statement:
create table my_temp_table as
select avg(case when msg regexp '^[1]([3-9])[0-9]{9}$' then 100.0 else 0 end) result
from dirty_table
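Why the conditional average gives the same number as dividing two counts can be seen with a few lines of arithmetic: averaging 100.0 for matching rows and 0 for the rest is 100 times the matching count divided by the row count. The Python sketch below illustrates this on the sample `msg` values from the first answer; the function names are hypothetical and the `{9}` quantifier in the pattern is an assumption.

```python
import re

# msg values from the sample dirty_table shown in the first answer.
MESSAGES = [
    "13321512121", "13121212121", "03121212121", "13321512121",
    "13121212121", "03121212121", "13121212121",
]
PATTERN = re.compile(r"^[1]([3-9])[0-9]{9}$")


def conditional_average(values):
    """Mirror avg(case when ... then 100.0 else 0 end): one pass over the rows."""
    scores = [100.0 if PATTERN.search(value) else 0.0 for value in values]
    return sum(scores) / len(scores)


def ratio_of_counts(values):
    """Mirror the two-subquery form: 100 * matching count / total count."""
    matched = sum(1 for value in values if PATTERN.search(value))
    return 100.0 * matched / len(values)


average_result = conditional_average(MESSAGES)
ratio_result = ratio_of_counts(MESSAGES)
print(average_result, ratio_result)
```

The single-pass form is what lets the SQL version read `dirty_table` once instead of twice.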
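Comments:
Hi @GMB, I tested your SQL: the first SELECT statement works, but the second one, with CREATE TABLE, does not. Cloudera Impala has many limitations that are not detailed in the official documentation.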
Adding the exception log: [HY000][500051] [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ParseException: Syntax error in line 1: create table if not exists my_temp_table as ^ Encountered: EOF Expected: SELECT, VALUES, W ...
@KDFinal: It looks like Impala does not support if not exists in create table ... I changed the query.
I tested if not exists in Impala and it works, using this simple SQL: create table if not exists my_temp_table as select * from dirty_table; but it fails when there is a subquery in the SELECT clause.