org.apache.spark.sql.AnalysisException: expression 't2.`sum_click_passed`' is neither present in the group by, nor is it an aggregate function

【Posted】:2021-09-18 08:43:09

【Question】:

For example:

SELECT
    bucket,
    repeat_all_click,
    sum_click_passed,
    sum_imp_passed,
    sum_charge,
    sum_click_passed as acp,
    sum_roi_cnt / sum_click_passed as shop_cvr,
    sum_roi_amt / sum_charge as shop_roi,
    sum_roi_pay_cnt / sum_click_passed as pay_cvr,
    sum_roi_pay_amt / sum_charge as pay_roi,
    1.0 * sum_cpv_all / sum_spv_all as imp_rate
FROM
    (
        (
            SELECT
                request_id,
                IF (
                    all_click_cnt = 1,
                    '= 1',
                    IF (
                        all_click_cnt > 1,
                        '> 1',
                        '= 0'
                    )
                ) as repeat_all_click
            FROM
                a
            WHERE
                partition_date BETWEEN '2021-09-08'
                AND '2021-09-17'
                AND channel = 'HS'
                AND slot_id = 5
                AND (
                    rerank_algo = 'algo1'
                    OR rerank_algo = 'algo2'
                )
            GROUP BY
                1,
                2
        ) t1
        JOIN (
            SELECT
                request_id,
                sum(click_passed) as sum_click_passed,
                sum(imp_passed) as sum_imp_passed,
                sum(charge) as sum_charge,
                sum(roi_cnt) as sum_roi_cnt,
                sum(roi_amt) as sum_roi_amt,
                sum(roi_pay_cnt) as sum_roi_pay_cnt,
                sum(roi_pay_amt) as sum_roi_pay_amt,
                sum(cpv_all) as sum_cpv_all,
                sum(spv_all) as sum_spv_all
            FROM
                b
            WHERE
                partition_date BETWEEN '2021-09-14'
                AND '2021-09-17'
                AND slotid = 5
            GROUP BY
                request_id
        ) t2 ON t1.request_id = t2.request_id
        JOIN (
            SELECT
                requestid AS request_id,
                IF (strategy_path LIKE '%4-54-2612%', 'EXP', 'BASE') AS bucket
            FROM
                c
            WHERE
                (
                    dt BETWEEN '20210914'
                    AND '20210917'
                    AND channel = 'S'
                    AND (
                        slot_ids LIKE '%50011%'
                        OR slot_ids LIKE '%50020%'
                    )
                    AND (
                        strategy_path LIKE '%54-4%'
                        OR strategy_path LIKE '%54-2%'
                    )
                )
            GROUP BY
                1,
                2
        ) t3 ON t2.request_id = t3.request_id
    )
GROUP BY
    1,
    2
ORDER BY
    1,
    2

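User class threw exception: org.apache.spark.sql.AnalysisException: expression 't2.`sum_click_passed`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
Sort [bucket#152 ASC NULLS FIRST, repeat_all_click#141 ASC NULLS FIRST], true
+- Aggregate [bucket#152, repeat_all_click#141], [bucket#152, repeat_all_click#141,

I'm not very familiar with HiveQL, but I expected this to be valid SQL. Unfortunately it fails with the error above, and I don't know how to fix it properly, because I thought sum(click_passed) as sum_click_passed already is an aggregate function.

Can anyone help me? Thanks in advance.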
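
The confusion above can be narrowed down with a much smaller query. The following is only a sketch: the CTE name per_request and the inline rows are hypothetical stand-ins, not the real tables a, b, or c. It illustrates that an alias such as sum_click_passed is an aggregate only inside the subquery that computes it; one level up it is an ordinary column, so an outer GROUP BY has to aggregate it again (or, as the error message suggests, add it to the GROUP BY or wrap it in first()).

```sql
-- Hypothetical inline data; column names mirror the question, values are made up.
WITH b(request_id, bucket, click_passed) AS (
    VALUES
        (1, 'EXP', 2),
        (1, 'EXP', 3),
        (2, 'EXP', 4),
        (3, 'BASE', 5)
),
per_request AS (
    SELECT
        request_id,
        bucket,
        sum(click_passed) AS sum_click_passed  -- an aggregate at THIS level only
    FROM b
    GROUP BY request_id, bucket
)
-- Spark SQL rejects the commented-out statement below with the same
-- AnalysisException: at this level sum_click_passed is a plain column.
--   SELECT bucket, sum_click_passed FROM per_request GROUP BY bucket;

-- Re-aggregating the per-request sums at the outer level is accepted.
SELECT
    bucket,
    sum(sum_click_passed) AS sum_click_passed
FROM per_request
GROUP BY bucket
ORDER BY bucket;
```

With these made-up rows the result has one row per bucket: BASE with 5 and EXP with 9.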

【Answer 1】:

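I think the syntax of the HQL statement is wrong. Remove the extra parentheses: the one right after the first FROM and the one right before the last GROUP BY. The structure should be: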

SELECT ..
FROM 
          (SELECT... FROM T1)T1
JOIN (SELECT... )T2 ON ...
JOIN (SELECT... )T3 ON ...
GROUP BY...
ORDER BY...

Please use the query below.

SELECT
    bucket,
    repeat_all_click,
    sum_click_passed,
    sum_imp_passed,
    sum_charge,
    sum_click_passed as acp,
    sum_roi_cnt / sum_click_passed as shop_cvr,
    sum_roi_amt / sum_charge as shop_roi,
    sum_roi_pay_cnt / sum_click_passed as pay_cvr,
    sum_roi_pay_amt / sum_charge as pay_roi,
    1.0 * sum_cpv_all / sum_spv_all as imp_rate
FROM
    --( removed/commented out
        (
            SELECT
                request_id,
                IF (
                    all_click_cnt = 1,
                    '= 1',
                    IF (
                        all_click_cnt > 1,
                        '> 1',
                        '= 0'
                    )
                ) as repeat_all_click
            FROM
                a
            WHERE
                partition_date BETWEEN '2021-09-08'
                AND '2021-09-17'
                AND channel = 'HS'
                AND slot_id = 5
                AND (
                    rerank_algo = 'algo1'
                    OR rerank_algo = 'algo2'
                )
            GROUP BY
                1,
                2
        ) t1
        JOIN (
            SELECT
                request_id,
                sum(click_passed) as sum_click_passed,
                sum(imp_passed) as sum_imp_passed,
                sum(charge) as sum_charge,
                sum(roi_cnt) as sum_roi_cnt,
                sum(roi_amt) as sum_roi_amt,
                sum(roi_pay_cnt) as sum_roi_pay_cnt,
                sum(roi_pay_amt) as sum_roi_pay_amt,
                sum(cpv_all) as sum_cpv_all,
                sum(spv_all) as sum_spv_all
            FROM
                b
            WHERE
                partition_date BETWEEN '2021-09-14'
                AND '2021-09-17'
                AND slotid = 5
            GROUP BY
                request_id
        ) t2 ON t1.request_id = t2.request_id
        JOIN (
            SELECT
                requestid AS request_id,
                IF (strategy_path LIKE '%4-54-2612%', 'EXP', 'BASE') AS bucket
            FROM
                c
            WHERE
                (
                    dt BETWEEN '20210914'
                    AND '20210917'
                    AND channel = 'S'
                    AND (
                        slot_ids LIKE '%50011%'
                        OR slot_ids LIKE '%50020%'
                    )
                    AND (
                        strategy_path LIKE '%54-4%'
                        OR strategy_path LIKE '%54-2%'
                    )
                )
            GROUP BY
                1,
                2
        ) t3 ON t2.request_id = t3.request_id
   --) removed
GROUP BY
    1,
    2
ORDER BY
    1,
    2
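
Note that the error message points at the outer Aggregate node, which groups by bucket and repeat_all_click while the outer SELECT lists the columns of t2 without any aggregate. Removing the parentheses alone may therefore not be enough. The following is a minimal sketch of the outer level only, assuming the intent is one total per (bucket, repeat_all_click) group and that the ratios should be ratios of sums. The three CTEs are hypothetical stand-ins for the subqueries t1, t2, and t3, reduced to a few columns and made-up rows so the sketch runs on its own.

```sql
WITH
-- Hypothetical stand-ins for subqueries t1, t2, t3; the rows are made up.
t1(request_id, repeat_all_click) AS (
    VALUES (1, '= 1'), (2, '= 1'), (3, '> 1')
),
t2(request_id, sum_click_passed, sum_charge, sum_roi_amt) AS (
    VALUES (1, 2, 10.0, 30.0), (2, 3, 10.0, 10.0), (3, 5, 20.0, 10.0)
),
t3(request_id, bucket) AS (
    VALUES (1, 'EXP'), (2, 'EXP'), (3, 'BASE')
)
SELECT
    t3.bucket,
    t1.repeat_all_click,
    -- Every column that is not a grouping key is wrapped in an aggregate.
    sum(t2.sum_click_passed) AS sum_click_passed,
    sum(t2.sum_charge) AS sum_charge,
    sum(t2.sum_roi_amt) / sum(t2.sum_charge) AS shop_roi
FROM t1
JOIN t2 ON t1.request_id = t2.request_id
JOIN t3 ON t2.request_id = t3.request_id
GROUP BY
    1,
    2
ORDER BY
    1,
    2;
```

The remaining columns of the original query (sum_imp_passed, shop_cvr, pay_cvr, pay_roi, imp_rate) would follow the same pattern. If one row per request is wanted instead of totals, dropping the outer GROUP BY is the alternative.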
