Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

Posted: 2021-10-07 15:23:36

[Question]

I have a table referrals:

id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
  3 |             2 | c         | t         | agent     |           3
  5 |             3 | e         | f         | customer  |           5
  4 |             1 | d         | t         | agent     |           4
  2 |             1 | b         | f         | agent     |           2
  1 |             1 | a         | t         | agent     |           1

and a table activations:

id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
  2 |             2 |           3 |           3.0 |            3 | a
  4 |             1 |           1 |           6.0 |            5 | b
  5 |             4 |           4 |           3.0 |            6 | c
  1 |             1 |           2 |           2.0 |            2 | b
  3 |             1 |           2 |           5.0 |            4 | b
  6 |             1 |           2 |           7.0 |            8 | a
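To reproduce the question locally, the sample data above can be loaded into temporary tables. This is a minimal sketch: the column types are assumptions (the question shows only the values), and referred_at and activated_at are kept as plain integers exactly as shown.

```sql
-- Sample data from the question. Column types are assumed, not given.
create temp table referrals (
    id            integer primary key,
    user_id_owner integer not null,
    firstname     text,
    is_active     boolean,
    user_type     text,
    referred_at   integer
);

insert into referrals values
    (3, 2, 'c', true,  'agent',    3),
    (5, 3, 'e', false, 'customer', 5),
    (4, 1, 'd', true,  'agent',    4),
    (2, 1, 'b', false, 'agent',    2),
    (1, 1, 'a', true,  'agent',    1);

create temp table activations (
    id            integer primary key,
    user_id_owner integer not null,
    referral_id   integer,
    amount_earned numeric(10, 1),
    activated_at  integer,
    app_id        text
);

insert into activations values
    (2, 2, 3, 3.0, 3, 'a'),
    (4, 1, 1, 6.0, 5, 'b'),
    (5, 4, 4, 3.0, 6, 'c'),
    (1, 1, 2, 2.0, 2, 'b'),
    (3, 1, 2, 5.0, 4, 'b'),
    (6, 1, 2, 7.0, 8, 'a');
```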

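I'm trying to generate another table from the two tables that has only unique values of referrals.id and that returns, as one of its columns, best_selling_app_count: the activation count of each referral's best-selling app.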

Here is the query I ran:

with agents 
    as 
    (select 
    referrals.id, 
    referral_id, 
    amount_earned, 
    referred_at, 
    activated_at, 
    activations.app_id 
    from referrals 
    left outer join activations 
    on (referrals.id = activations.referral_id) 
    where referrals.user_id_owner = 1), 
    distinct_referrals_by_id 
    as 
    (select 
    id, 
    count(referral_id) as activations_count, 
    sum(coalesce(amount_earned, 0)) as amount_earned, 
    referred_at, 
    max(activated_at) as last_activated_at 
    from 
    agents 
    group by id, referred_at), 
    distinct_referrals_by_app_id 
    as 
    (select id, app_id as best_selling_app,
    count(app_id) as best_selling_app_count 
    from agents 
    group by id, app_id ) 
    select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank 
    from distinct_referrals_by_id 
    inner join distinct_referrals_by_app_id 
    on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);

This is the result I get:

id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
  2 |                 3 |          14.0 |           2 |                 8 |  2 | b                |                      2 |                     1
  1 |                 1 |           6.0 |           1 |                 5 |  1 | b                |                      1 |                     2
  2 |                 3 |          14.0 |           2 |                 8 |  2 | a                |                      1 |                     2
  4 |                 1 |           3.0 |           4 |                 6 |  4 | c                |                      1 |                     2

The problem with this result is that id 2 appears twice. I need only unique values in the id column.
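The duplicate comes from the grain of distinct_referrals_by_app_id: it groups by (id, app_id), so it returns one row per referral and app, and referral 2 has activations for two apps (b twice, a once). Joining that CTE to one that has a single row per id therefore still yields one row per (id, app_id). A minimal sketch that isolates this step; the VALUES list is a stand-in for the agents CTE and holds the (id, app_id) pairs that agents produces for user_id_owner = 1.

```sql
-- Stand-in for the "agents" CTE: (id, app_id) pairs for referrals owned by user 1.
with agents (id, app_id) as (
    values (1, 'b'), (2, 'b'), (2, 'b'), (2, 'a'), (4, 'c')
)
-- Same grouping as distinct_referrals_by_app_id: one row per (id, app_id),
-- so referral 2 shows up once for app b and once for app a.
select id, app_id, count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc;
-- returns: (1, b, 1), (2, b, 2), (2, a, 1), (4, c, 1)
```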

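I tried a workaround using DISTINCT ON, which gives the expected result, but I'm worried that the query result may not be reliable and consistent. Here is the workaround query: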

with agents 
    as 
    (select 
    referrals.id, 
    referral_id, 
    amount_earned, 
    referred_at, 
    activated_at, 
    activations.app_id 
    from referrals 
    left outer join activations 
    on (referrals.id = activations.referral_id) 
    where referrals.user_id_owner = 1), 
    distinct_referrals_by_id 
    as 
    (select 
    id, 
    count(referral_id) as activations_count, 
    sum(coalesce(amount_earned, 0)) as amount_earned, 
    referred_at, 
    max(activated_at) as last_activated_at 
    from 
    agents 
    group by id, referred_at), 
    distinct_referrals_by_app_id 
    as 
    (select 
    distinct on (id) id, app_id as best_selling_app,
    count(app_id) as best_selling_app_count 
    from agents 
    group by id, app_id 
    order by id, best_selling_app_count desc) 
    select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank 
    from distinct_referrals_by_id 
    inner join distinct_referrals_by_app_id 
    on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
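On the reliability concern: DISTINCT ON keeps the first row of each group in ORDER BY order, and PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions. The result is deterministic as long as the remaining ORDER BY keys single out one row per group; with only best_selling_app_count desc, the chosen app is unpredictable when two apps tie for the same referral. A sketch that adds app_id as a tie-breaker (alphabetical order is an arbitrary assumption; use whatever rule fits the meaning of "best selling"), with a VALUES list standing in for the activations table:

```sql
-- Stand-in for activations: (referral_id, app_id) for every activation in the question.
with activations (referral_id, app_id) as (
    values (3, 'a'), (1, 'b'), (4, 'c'), (2, 'b'), (2, 'b'), (2, 'a')
)
select distinct on (referral_id)
       referral_id,
       app_id   as best_selling_app,
       count(*) as best_selling_app_count
from activations
group by referral_id, app_id
-- DISTINCT ON keeps the first row per referral_id in this order:
-- highest count first, then the alphabetically smallest app_id on a tie.
order by referral_id, best_selling_app_count desc, app_id;
-- returns: (1, b, 1), (2, b, 2), (3, a, 1), (4, c, 1)
```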

I need advice on how best to achieve this.

[Answer 1]

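"I'm trying to generate another table from the two tables that has only unique values of referrals.id and that returns the count of each app as one of the columns, as best_selling_app_count."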

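Your question is really complicated, with a very complicated SQL query. However, the sentence quoted above looks like the actual question. If so, you can use: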

select r.*, 
       a.app_id as most_common_app_id,
       a.cnt as most_common_app_id_count
from referrals r left join
     (select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
      from activations a
      group by a.referral_id, a.app_id
      order by a.referral_id, count(*) desc
     ) a
     on a.referral_id = r.id;

You haven't explained the other columns in the result set.
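A sketch that combines this answer with the other columns from the question might look like the following. It keeps the question's user_id_owner = 1 filter, breaks ties on app_id, and ranks referrals without activations last; all three are assumptions, and the VALUES lists (reduced to the columns used) stand in for the real tables. Because totals and best_app each have exactly one row per referral_id, joining both to referrals cannot duplicate an id.

```sql
-- Inline stand-ins for the real tables (only the columns this query needs).
with referrals (id, user_id_owner, referred_at) as (
    values (3, 2, 3), (5, 3, 5), (4, 1, 4), (2, 1, 2), (1, 1, 1)
),
activations (referral_id, amount_earned, activated_at, app_id) as (
    values (3, 3.0, 3, 'a'), (1, 6.0, 5, 'b'), (4, 3.0, 6, 'c'),
           (2, 2.0, 2, 'b'), (2, 5.0, 4, 'b'), (2, 7.0, 8, 'a')
),
-- one row per referral: totals across all of its activations
totals as (
    select referral_id,
           count(*)           as activations_count,
           sum(amount_earned) as amount_earned,
           max(activated_at)  as last_activated_at
    from activations
    group by referral_id
),
-- one row per referral: its most frequently activated app (ties -> smallest app_id)
best_app as (
    select distinct on (referral_id)
           referral_id,
           app_id   as best_selling_app,
           count(*) as best_selling_app_count
    from activations
    group by referral_id, app_id
    order by referral_id, best_selling_app_count desc, app_id
)
select r.id,
       coalesce(t.activations_count, 0) as activations_count,
       coalesce(t.amount_earned, 0)     as amount_earned,
       r.referred_at,
       t.last_activated_at,
       b.best_selling_app,
       b.best_selling_app_count,
       -- DESC sorts NULLs first by default in PostgreSQL, so push them last explicitly
       dense_rank() over (order by b.best_selling_app_count desc nulls last) as best_selling_app_rank
from referrals r
left join totals   t on t.referral_id = r.id
left join best_app b on b.referral_id = r.id
where r.user_id_owner = 1
order by r.id;
```

Against the sample data this returns ids 1, 2 and 4 once each; referral 2 gets best_selling_app b with a count of 2 and rank 1, and the other two get rank 2.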
