With statements inside an Insert statement HIVE EMR AWS
【Posted】: 2019-04-02 19:01:07
【Question】: Hive does not recognize the WITH statement inside my INSERT command. How can I get Hive to accept it?
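I created external Hive tables to hold all the data referenced in this query. That part works fine and the data is available. Below is the actual query that inserts its output into the churn_data_out table.
To get this result set into the table, I use an INSERT command followed by a chain of WITH clauses that build the output data. As soon as it runs, though, Hive rejects the WITH statement.
The WITH clauses cascade into one another, and the final output selects from the revenue CTE. None of that is really relevant, as long as we can figure out how to get Hive to accept the WITH statement.
I'd really appreciate any ideas! Thanks.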
FAILED: ParseException line 2:0 cannot recognize input near 'WITH' 'customers' 'AS' in statement
INSERT OVERWRITE TABLE churn_data_out partition (month) (
WITH customers AS (
SELECT c.cust_nm,
aef.cust_key,
min(date (SUBSTR(cast(cal.brdcsts_yr_mo_nbr AS VARCHAR),1,4) || '-' || SUBSTR(cast(cal.brdcst_yr_mo_nbr AS VARCHAR),5,6) || '-01')) AS first_payment,
max(date (SUBSTR(cast(cal.brdcst_yr_mo_nbr AS VARCHAR),1,4) || '-' || SUBSTR(cast(cal.brdcst_yr_mo_nbr AS VARCHAR),5,6) || '-01')) AS last_payment
FROM am_ad_event_fact_in as aef
INNER JOIN am_calendar_dim cal
ON date_parse(aef.ad_evnt_start_dt, '%Y-%m-%d') = cal.clndr_dt
AND cal.BRDCST_YR_NBR >= 2015
INNER JOIN am_eda_customer_dim c
ON (aef.cust_key = c.cust_key)
GROUP BY 1,2),
months AS (
SELECT month
FROM (SELECT sequence(date '2010-01-01', current_date, interval '1' month)
) AS x (i)
CROSS JOIN UNNEST(i) AS t (month)
),
athenasux AS (
SELECT *
FROM (customers as c
INNER JOIN months as month
ON (c.first_payment <= month))),
revenue AS (
SELECT a.*,
row_number() over (partition by a.cust_nm order by month) AS months_as_customer,
max(case when aef.prio_cd >= 40 then 1 else 0 end) p40plus,
sum(case when aef.spot_rate_nbr is null then 0 else aef.spot_rate_nbr end) as rev,
count(aef.ad_evnt_key) as spots,
count(distinct aef.ord_nbr) as num_orders,
count(distinct syscode) num_syscodes
FROM athenasux as a
LEFT JOIN am_ad_event_fact_in as aef
ON (aef.cust_key = a.cust_key
AND date_parse(aef.ad_evnt_start_dt, '%Y-%m-%d') = a.month)
GROUP BY a.cust_nm, a.cust_key, a.month, a.first_payment, a.last_payment
)
SELECT *,
sum(rev) over (partition by cust_key, cust_nm order by month
rows between unbounded preceding and current row) rev_rt,
sum(num_orders) over (partition by cust_key, cust_nm order by month
rows between unbounded preceding and current row) num_orders_rt,
sum(spots) over (partition by cust_key, cust_nm order by month
rows between unbounded preceding and current row) spots_rt,
sum(num_syscodes) over (partition by cust_key, cust_nm order by month
rows between unbounded preceding and current row) num_syscodes_rt
FROM revenue
);
【Answer 1】: Syntax-wise, the INSERT belongs after all of the CTEs and immediately before the final SELECT.
with cte1 as (...)
,cte2 as (...)
INSERT INTO ...
SELECT ....
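Applied to the query in the question, the rearrangement might look roughly like the sketch below. It is deliberately abbreviated: the CTE bodies are simplified stand-ins rather than the original logic, the parentheses that wrapped the whole query after INSERT are dropped, and the final column list is hypothetical because the schema of churn_data_out is not shown. The two SET lines assume the month partition is filled dynamically, in which case Hive expects the partition column to be the last column of the SELECT.

```sql
-- Assumption: every partition value is dynamic, so nonstrict mode is enabled.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The CTEs come first (simplified, hypothetical stand-ins for the originals).
WITH customers AS (
  SELECT c.cust_nm,
         c.cust_key
  FROM am_eda_customer_dim c
),
revenue AS (
  SELECT aef.cust_key,
         -- Assumes ad_evnt_start_dt is a 'yyyy-MM-dd' string; keeps 'yyyy-MM'.
         substr(aef.ad_evnt_start_dt, 1, 7)   AS month,
         sum(coalesce(aef.spot_rate_nbr, 0))  AS rev
  FROM am_ad_event_fact_in aef
  GROUP BY aef.cust_key, substr(aef.ad_evnt_start_dt, 1, 7)
)
-- The INSERT sits after the last CTE and directly before the final SELECT,
-- with no parentheses around the query.
INSERT OVERWRITE TABLE churn_data_out PARTITION (month)
SELECT cu.cust_nm,
       r.cust_key,
       r.rev,
       r.month          -- dynamic partition column goes last
FROM revenue r
JOIN customers cu
  ON cu.cust_key = r.cust_key;
```

Note that this only addresses the placement of INSERT. The original query also relies on functions such as date_parse, sequence, and CROSS JOIN UNNEST, which come from Presto/Athena and may need Hive equivalents before the statement runs.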
【Comments】:
Thanks. Moving the INSERT statement down to the final SELECT got me some exciting new errors. I'll keep you posted.