With statements inside an Insert statement HIVE EMR AWS

【Title】: With statements inside an Insert statement HIVE EMR AWS
【Posted】: 2019-04-02 19:01:07
【Question】:

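Hive does not recognize the WITH statements inside my INSERT command. How can I get Hive to accept this?

I created external Hive tables to hold all of the data referenced in this query. That all works, and the data is available. Below is the actual query, which inserts its output into the churn_data_out table.

To get this output set into the table, I start with an INSERT command and then build the output data through a series of WITH clauses. However, as soon as it runs, Hive rejects the WITH statement.

The WITH clauses cascade into one another, and the final output is selected from the revenue CTE. None of that is really relevant, as long as we can figure out how to get Hive to accept the WITH statements.

I would really appreciate any ideas! Thanks.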

 FAILED: ParseException line 2:0 cannot recognize input near 'WITH' 'customers' 'AS' in statement


INSERT OVERWRITE TABLE churn_data_out partition (month) (
WITH customers AS (
  SELECT c.cust_nm,
         aef.cust_key,
         min(date (SUBSTR(cast(cal.brdcsts_yr_mo_nbr AS VARCHAR),1,4) || '-' || SUBSTR(cast(cal.brdcst_yr_mo_nbr AS VARCHAR),5,6) || '-01')) AS first_payment,
         max(date (SUBSTR(cast(cal.brdcst_yr_mo_nbr AS VARCHAR),1,4) || '-' || SUBSTR(cast(cal.brdcst_yr_mo_nbr AS VARCHAR),5,6) || '-01')) AS last_payment
  FROM am_ad_event_fact_in as aef
INNER JOIN am_calendar_dim cal
    ON date_parse(aef.ad_evnt_start_dt, '%Y-%m-%d') = cal.clndr_dt
        AND cal.BRDCST_YR_NBR >= 2015
INNER JOIN am_eda_customer_dim c
    ON (aef.cust_key = c.cust_key)
  GROUP BY 1,2),

  months AS (
    SELECT month
FROM (SELECT sequence(date '2010-01-01', current_date, interval '1' month)
) AS x (i)
CROSS JOIN UNNEST(i) AS t (month)
),

athenasux AS (
  SELECT *
  FROM (customers as c
    INNER JOIN months as month
              ON (c.first_payment <= month))),


revenue AS (
    SELECT a.*,
    row_number() over (partition by a.cust_nm order by month) AS months_as_customer,
    max(case when aef.prio_cd >= 40 then 1 else 0 end) p40plus,
    sum(case when aef.spot_rate_nbr is null then 0 else aef.spot_rate_nbr end) as rev,
    count(aef.ad_evnt_key) as spots,
    count(distinct aef.ord_nbr) as num_orders,
    count(distinct syscode) num_syscodes
    FROM athenasux as a
    LEFT JOIN am_ad_event_fact_in as aef
    ON (aef.cust_key = a.cust_key
    AND date_parse(aef.ad_evnt_start_dt, '%Y-%m-%d') = a.month)
    GROUP BY a.cust_nm, a.cust_key, a.month, a.first_payment, a.last_payment
  )


SELECT *,
    sum(rev) over (partition by cust_key, cust_nm order by month
        rows between unbounded preceding and current row) rev_rt,
    sum(num_orders) over (partition by cust_key, cust_nm order by month
        rows between unbounded preceding and current row) num_orders_rt,
    sum(spots) over (partition by cust_key, cust_nm order by month
        rows between unbounded preceding and current row) spots_rt,
    sum(num_syscodes) over (partition by cust_key, cust_nm order by month
        rows between unbounded preceding and current row) num_syscodes_rt
FROM revenue
);

【Comments】:

【Answer 1】:

Syntax-wise, the INSERT should come after all of the CTEs and immediately before the final SELECT.

with cte1 as (...)
,cte2 as (...)
INSERT INTO ...
SELECT ....
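
As a slightly fuller illustration of the pattern above, here is a minimal HiveQL sketch with a dynamic-partition insert. The tables `demo_payments` and `demo_out` and their columns are hypothetical stand-ins, not the tables from the question. The two `SET` lines assume the cluster does not already allow fully dynamic partitions.

```sql
-- Hypothetical demo tables (not from the question).
CREATE TABLE IF NOT EXISTS demo_payments (
  cust_key  INT,
  amount    DOUBLE,
  pay_month STRING
);

CREATE TABLE IF NOT EXISTS demo_out (
  cust_key INT,
  rev      DOUBLE
)
PARTITIONED BY (pay_month STRING);

-- Assumption: dynamic partitioning is not yet enabled in this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The WITH clause comes first; INSERT sits between the last CTE
-- and the final SELECT, with no parentheses wrapping the query.
WITH revenue AS (
  SELECT cust_key,
         pay_month,
         SUM(amount) AS rev
  FROM demo_payments
  GROUP BY cust_key, pay_month
)
INSERT OVERWRITE TABLE demo_out PARTITION (pay_month)
SELECT cust_key,
       rev,
       pay_month   -- the dynamic partition column must be last in the SELECT list
FROM revenue;
```

Applied to the query in the question, this would mean starting at `WITH customers AS (`, moving `INSERT OVERWRITE TABLE churn_data_out partition (month)` to just before the final `SELECT *`, and dropping the outer parentheses. Separately, the CTE bodies in the question use Presto/Athena-style constructs such as `date_parse` and `CROSS JOIN UNNEST`, which are not Hive built-ins as far as I know, so they would likely need Hive equivalents once the parse error is resolved.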

【Comments】:

Thanks. Moving the INSERT statement to just before the final SELECT statement produced exciting new errors. I'll keep you posted.
