Redshift - 超过百分位的第一个值

Posted

技术标签:

【中文标题】Redshift - 超过百分位的第一个值【英文标题】:Redshift - First value over a percentile 【发布时间】:2019-07-22 09:06:12 【问题描述】:

这几天想弄明白,但我不知所措。

我有不同组的会话数,我想获得达到 80% 的组的 ID。当然,某些组不太可能以某种方式对齐,以使一个组恰好以 80% 结束。

我要查找的是我是否能够返回所有低于 80% 的行,然后只返回第一个 结束的行或等于

这里是示例数据:

+-----------+--------+--------------+----------------------+-------------+
| locale_id | orders | locale_total | locale_running_total | pc_of_total |
+-----------+--------+--------------+----------------------+-------------+
|         1 |     68 |           92 |                   68 | 73.91304348 |
|         1 |      9 |           92 |                   77 | 83.69565217 |
|         1 |      5 |           92 |                   82 | 89.13043478 |
|         1 |      4 |           92 |                   86 | 93.47826087 |
|         1 |      1 |           92 |                   92 | 100         |
|         1 |      1 |           92 |                   89 | 96.73913043 |
|         1 |      1 |           92 |                   88 | 95.65217391 |
|         1 |      1 |           92 |                   91 | 98.91304348 |
|         1 |      1 |           92 |                   90 | 97.82608696 |
|         1 |      1 |           92 |                   87 | 94.56521739 |
|         2 |    130 |          188 |                  130 | 69.14893617 |
|         2 |     18 |          188 |                  148 | 78.72340426 |
|         2 |      9 |          188 |                  157 | 83.5106383  |
|         2 |      9 |          188 |                  166 | 88.29787234 |
|         2 |      5 |          188 |                  171 | 90.95744681 |
|         2 |      4 |          188 |                  175 | 93.08510638 |
|         2 |      3 |          188 |                  178 | 94.68085106 |
|         2 |      3 |          188 |                  181 | 96.27659574 |
|         2 |      2 |          188 |                  183 | 97.34042553 |
|         2 |      2 |          188 |                  185 | 98.40425532 |
|         2 |      1 |          188 |                  188 | 100         |
|         2 |      1 |          188 |                  186 | 98.93617021 |
|         2 |      1 |          188 |                  187 | 99.46808511 |
|         3 |   3878 |         6489 |                 3878 | 59.7626753  |
|         3 |   1823 |         6489 |                 5701 | 87.85637232 |
|         3 |    206 |         6489 |                 5907 | 91.0309755  |
|         3 |    131 |         6489 |                 6038 | 93.04977654 |
|         3 |     82 |         6489 |                 6120 | 94.31345354 |
|         3 |     69 |         6489 |                 6189 | 95.37679149 |
|         3 |     69 |         6489 |                 6258 | 96.44012945 |
|         3 |     50 |         6489 |                 6308 | 97.2106642  |
|         3 |     34 |         6489 |                 6342 | 97.73462783 |
|         3 |     26 |         6489 |                 6368 | 98.1353059  |
|         3 |     21 |         6489 |                 6389 | 98.4589305  |
|         3 |     18 |         6489 |                 6407 | 98.73632301 |
|         3 |     17 |         6489 |                 6424 | 98.99830482 |
|         3 |     10 |         6489 |                 6434 | 99.15241177 |
|         3 |      9 |         6489 |                 6452 | 99.42980428 |
|         3 |      9 |         6489 |                 6443 | 99.29110803 |
|         3 |      8 |         6489 |                 6460 | 99.55308984 |
|         3 |      6 |         6489 |                 6472 | 99.73801818 |
|         3 |      6 |         6489 |                 6466 | 99.64555401 |
|         3 |      5 |         6489 |                 6477 | 99.81507166 |
|         3 |      4 |         6489 |                 6481 | 99.87671444 |
|         3 |      4 |         6489 |                 6485 | 99.93835722 |
|         3 |      3 |         6489 |                 6488 | 99.9845893  |
|         3 |      1 |         6489 |                 6489 | 100         |
|         4 |    779 |         1636 |                  779 | 47.61613692 |
|         4 |    257 |         1636 |                 1036 | 63.32518337 |
|         4 |    102 |         1636 |                 1138 | 69.5599022  |
|         4 |     97 |         1636 |                 1235 | 75.48899756 |
|         4 |     89 |         1636 |                 1324 | 80.92909535 |
|         4 |     72 |         1636 |                 1396 | 85.33007335 |
|         4 |     47 |         1636 |                 1443 | 88.20293399 |
|         4 |     31 |         1636 |                 1474 | 90.09779951 |
|         4 |     26 |         1636 |                 1500 | 91.68704156 |
|         4 |     23 |         1636 |                 1523 | 93.09290954 |
|         4 |     21 |         1636 |                 1544 | 94.37652812 |
|         4 |     17 |         1636 |                 1561 | 95.41564792 |
|         4 |     12 |         1636 |                 1573 | 96.14914425 |
|         4 |      9 |         1636 |                 1582 | 96.6992665  |
|         4 |      8 |         1636 |                 1590 | 97.18826406 |
|         4 |      8 |         1636 |                 1598 | 97.67726161 |
|         4 |      6 |         1636 |                 1604 | 98.04400978 |
|         4 |      6 |         1636 |                 1610 | 98.41075795 |
|         4 |      5 |         1636 |                 1615 | 98.71638142 |
|         4 |      4 |         1636 |                 1623 | 99.20537897 |
|         4 |      4 |         1636 |                 1619 | 98.9608802  |
|         4 |      3 |         1636 |                 1629 | 99.57212714 |
|         4 |      3 |         1636 |                 1626 | 99.38875306 |
|         4 |      2 |         1636 |                 1631 | 99.69437653 |
|         4 |      1 |         1636 |                 1632 | 99.75550122 |
|         4 |      1 |         1636 |                 1634 | 99.87775061 |
|         4 |      1 |         1636 |                 1633 | 99.81662592 |
|         4 |      1 |         1636 |                 1636 | 100         |
|         4 |      1 |         1636 |                 1635 | 99.93887531 |
|         5 |    130 |          215 |                  130 | 60.46511628 |
|         5 |     37 |          215 |                  167 | 77.6744186  |
|         5 |     14 |          215 |                  181 | 84.18604651 |
|         5 |     11 |          215 |                  192 | 89.30232558 |
|         5 |      5 |          215 |                  197 | 91.62790698 |
|         5 |      4 |          215 |                  201 | 93.48837209 |
|         5 |      4 |          215 |                  205 | 95.34883721 |
|         5 |      3 |          215 |                  208 | 96.74418605 |
|         5 |      2 |          215 |                  210 | 97.6744186  |
|         5 |      2 |          215 |                  212 | 98.60465116 |
|         5 |      1 |          215 |                  215 | 100         |
|         5 |      1 |          215 |                  213 | 99.06976744 |
|         5 |      1 |          215 |                  214 | 99.53488372 |
|         6 |    242 |          682 |                  242 | 35.48387097 |
|         6 |    180 |          682 |                  422 | 61.87683284 |
|         6 |    132 |          682 |                  554 | 81.23167155 |
|         6 |     58 |          682 |                  612 | 89.73607038 |
|         6 |     21 |          682 |                  633 | 92.81524927 |
|         6 |     14 |          682 |                  647 | 94.86803519 |
|         6 |     12 |          682 |                  659 | 96.62756598 |
|         6 |     10 |          682 |                  669 | 98.09384164 |
|         6 |      5 |          682 |                  674 | 98.82697947 |
|         6 |      2 |          682 |                  676 | 99.1202346  |
|         6 |      2 |          682 |                  678 | 99.41348974 |
|         6 |      1 |          682 |                  679 | 99.5601173  |
|         6 |      1 |          682 |                  680 | 99.70674487 |
|         6 |      1 |          682 |                  682 | 100         |
|         6 |      1 |          682 |                  681 | 99.85337243 |
|         7 |    200 |          456 |                  200 | 43.85964912 |
|         7 |    168 |          456 |                  368 | 80.70175439 |
|         7 |     30 |          456 |                  398 | 87.28070175 |
|         7 |     17 |          456 |                  415 | 91.00877193 |
|         7 |      9 |          456 |                  424 | 92.98245614 |
|         7 |      5 |          456 |                  429 | 94.07894737 |
|         7 |      4 |          456 |                  433 | 94.95614035 |
|         7 |      4 |          456 |                  441 | 96.71052632 |
|         7 |      4 |          456 |                  437 | 95.83333333 |
|         7 |      3 |          456 |                  444 | 97.36842105 |
|         7 |      3 |          456 |                  453 | 99.34210526 |
|         7 |      3 |          456 |                  450 | 98.68421053 |
|         7 |      3 |          456 |                  447 | 98.02631579 |
|         7 |      2 |          456 |                  455 | 99.78070175 |
|         7 |      1 |          456 |                  456 | 100         |
+-----------+--------+--------------+----------------------+-------------+

给我上面结果的查询在这里...我尝试了 CTE,四舍五入 pc_of_total,但似乎没有什么能做我想做的事..

SELECT
    *,
    -- Calculate the total of all orders within a locale
    SUM(orders) OVER (PARTITION BY locale_id) AS "locale_total",

    -- Create running sum based on orders
    SUM(orders) OVER (PARTITION BY locale_id ORDER BY orders DESC ROWS UNBOUNDED PRECEDING) AS "locale_running_total",

    -- Calculating percentile, running pc of total
    (locale_running_total / locale_total::NUMERIC) * 100 AS "pc_of_total"
FROM (
    SELECT
        locale_id,
        SUM(orders) AS "orders"
    FROM table
    GROUP BY
        locale_id
) d

我想要的输出是

+-----------+--------+--------------+----------------------+-------------+
| locale_id | orders | locale_total | locale_running_total | pc_of_total |
+-----------+--------+--------------+----------------------+-------------+
|         1 |     68 |           92 |                   68 | 73.91304348 |
|         1 |      9 |           92 |                   77 | 83.69565217 |
|         2 |    130 |          188 |                  130 | 69.14893617 |
|         2 |     18 |          188 |                  148 | 78.72340426 |
|         2 |      9 |          188 |                  157 | 83.5106383  |
|         3 |   3878 |         6489 |                 3878 | 59.7626753  |
|         3 |   1823 |         6489 |                 5701 | 87.85637232 |
|         4 |    779 |         1636 |                  779 | 47.61613692 |
|         4 |    257 |         1636 |                 1036 | 63.32518337 |
|         4 |    102 |         1636 |                 1138 | 69.5599022  |
|         4 |     97 |         1636 |                 1235 | 75.48899756 |
|         4 |     89 |         1636 |                 1324 | 80.92909535 |
|         5 |    130 |          215 |                  130 | 60.46511628 |
|         5 |     37 |          215 |                  167 | 77.6744186  |
|         5 |     14 |          215 |                  181 | 84.18604651 |
|         6 |    242 |          682 |                  242 | 35.48387097 |
|         6 |    180 |          682 |                  422 | 61.87683284 |
|         6 |    132 |          682 |                  554 | 81.23167155 |
|         7 |    200 |          456 |                  200 | 43.85964912 |
|         7 |    168 |          456 |                  368 | 80.70175439 |
+-----------+--------+--------------+----------------------+-------------+

运行 Redshift 1.0.8727

【问题讨论】:

【参考方案1】:

首先,您可以将窗口函数与聚合混合,这在一定程度上简化了查询。

然后,你可以通过一个简单的比较来得到你想要的:

SELECT d.*
FROM (SELECT locale_id, SUM(orders) AS orders,
             SUM(SUM(orders)) OVER (PARTITION BY locale_id) AS locale_total,
             SUM(SUM(orders)) OVER (PARTITION BY locale_id
                                    ORDER BY SUM(orders) DESC
                                    ROWS UNBOUNDED PRECEDING
                                   ) as locale_running_total
      FROM table
      GROUP BY locale_id
     ) d
WHERE (locale_running_total - orders) < 0.8 * locale_total;

请注意,这会从运行总数中减去 orders 以进行比较。这样,它会得到第一个超过 80% 的值。

【讨论】:

以上是关于Redshift - 超过百分位的第一个值的主要内容,如果未能解决你的问题,请参考以下文章

使用 MySQL 计算百分比值

按百分位数将类似 sql 的查询的结果分组:在 Redshift / postgresql

如何创建基于百分位的指标图表?

Redshift:在没有任何会话ID的情况下查找会话中的第一个和最后一个事件

用 Redshift 中的第一个非空跟随值填充缺失值

ggplot中基于​​百分位的颜色代码点