Redshift - First value over a percentile
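Posted: 2019-07-22 09:06:12
Question: I've been trying to figure this out for a few days, but I'm at a loss.
I have session counts for different groups, and I want to get the IDs of the groups that together reach 80%. Of course, the groups are unlikely to line up so that one of them ends at exactly 80%.
What I'm looking for is a way to return all the rows below 80%, plus only the first row that ends at or above 80%.
Here is the sample data: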
+-----------+--------+--------------+----------------------+-------------+
| locale_id | orders | locale_total | locale_running_total | pc_of_total |
+-----------+--------+--------------+----------------------+-------------+
| 1 | 68 | 92 | 68 | 73.91304348 |
| 1 | 9 | 92 | 77 | 83.69565217 |
| 1 | 5 | 92 | 82 | 89.13043478 |
| 1 | 4 | 92 | 86 | 93.47826087 |
| 1 | 1 | 92 | 92 | 100 |
| 1 | 1 | 92 | 89 | 96.73913043 |
| 1 | 1 | 92 | 88 | 95.65217391 |
| 1 | 1 | 92 | 91 | 98.91304348 |
| 1 | 1 | 92 | 90 | 97.82608696 |
| 1 | 1 | 92 | 87 | 94.56521739 |
| 2 | 130 | 188 | 130 | 69.14893617 |
| 2 | 18 | 188 | 148 | 78.72340426 |
| 2 | 9 | 188 | 157 | 83.5106383 |
| 2 | 9 | 188 | 166 | 88.29787234 |
| 2 | 5 | 188 | 171 | 90.95744681 |
| 2 | 4 | 188 | 175 | 93.08510638 |
| 2 | 3 | 188 | 178 | 94.68085106 |
| 2 | 3 | 188 | 181 | 96.27659574 |
| 2 | 2 | 188 | 183 | 97.34042553 |
| 2 | 2 | 188 | 185 | 98.40425532 |
| 2 | 1 | 188 | 188 | 100 |
| 2 | 1 | 188 | 186 | 98.93617021 |
| 2 | 1 | 188 | 187 | 99.46808511 |
| 3 | 3878 | 6489 | 3878 | 59.7626753 |
| 3 | 1823 | 6489 | 5701 | 87.85637232 |
| 3 | 206 | 6489 | 5907 | 91.0309755 |
| 3 | 131 | 6489 | 6038 | 93.04977654 |
| 3 | 82 | 6489 | 6120 | 94.31345354 |
| 3 | 69 | 6489 | 6189 | 95.37679149 |
| 3 | 69 | 6489 | 6258 | 96.44012945 |
| 3 | 50 | 6489 | 6308 | 97.2106642 |
| 3 | 34 | 6489 | 6342 | 97.73462783 |
| 3 | 26 | 6489 | 6368 | 98.1353059 |
| 3 | 21 | 6489 | 6389 | 98.4589305 |
| 3 | 18 | 6489 | 6407 | 98.73632301 |
| 3 | 17 | 6489 | 6424 | 98.99830482 |
| 3 | 10 | 6489 | 6434 | 99.15241177 |
| 3 | 9 | 6489 | 6452 | 99.42980428 |
| 3 | 9 | 6489 | 6443 | 99.29110803 |
| 3 | 8 | 6489 | 6460 | 99.55308984 |
| 3 | 6 | 6489 | 6472 | 99.73801818 |
| 3 | 6 | 6489 | 6466 | 99.64555401 |
| 3 | 5 | 6489 | 6477 | 99.81507166 |
| 3 | 4 | 6489 | 6481 | 99.87671444 |
| 3 | 4 | 6489 | 6485 | 99.93835722 |
| 3 | 3 | 6489 | 6488 | 99.9845893 |
| 3 | 1 | 6489 | 6489 | 100 |
| 4 | 779 | 1636 | 779 | 47.61613692 |
| 4 | 257 | 1636 | 1036 | 63.32518337 |
| 4 | 102 | 1636 | 1138 | 69.5599022 |
| 4 | 97 | 1636 | 1235 | 75.48899756 |
| 4 | 89 | 1636 | 1324 | 80.92909535 |
| 4 | 72 | 1636 | 1396 | 85.33007335 |
| 4 | 47 | 1636 | 1443 | 88.20293399 |
| 4 | 31 | 1636 | 1474 | 90.09779951 |
| 4 | 26 | 1636 | 1500 | 91.68704156 |
| 4 | 23 | 1636 | 1523 | 93.09290954 |
| 4 | 21 | 1636 | 1544 | 94.37652812 |
| 4 | 17 | 1636 | 1561 | 95.41564792 |
| 4 | 12 | 1636 | 1573 | 96.14914425 |
| 4 | 9 | 1636 | 1582 | 96.6992665 |
| 4 | 8 | 1636 | 1590 | 97.18826406 |
| 4 | 8 | 1636 | 1598 | 97.67726161 |
| 4 | 6 | 1636 | 1604 | 98.04400978 |
| 4 | 6 | 1636 | 1610 | 98.41075795 |
| 4 | 5 | 1636 | 1615 | 98.71638142 |
| 4 | 4 | 1636 | 1623 | 99.20537897 |
| 4 | 4 | 1636 | 1619 | 98.9608802 |
| 4 | 3 | 1636 | 1629 | 99.57212714 |
| 4 | 3 | 1636 | 1626 | 99.38875306 |
| 4 | 2 | 1636 | 1631 | 99.69437653 |
| 4 | 1 | 1636 | 1632 | 99.75550122 |
| 4 | 1 | 1636 | 1634 | 99.87775061 |
| 4 | 1 | 1636 | 1633 | 99.81662592 |
| 4 | 1 | 1636 | 1636 | 100 |
| 4 | 1 | 1636 | 1635 | 99.93887531 |
| 5 | 130 | 215 | 130 | 60.46511628 |
| 5 | 37 | 215 | 167 | 77.6744186 |
| 5 | 14 | 215 | 181 | 84.18604651 |
| 5 | 11 | 215 | 192 | 89.30232558 |
| 5 | 5 | 215 | 197 | 91.62790698 |
| 5 | 4 | 215 | 201 | 93.48837209 |
| 5 | 4 | 215 | 205 | 95.34883721 |
| 5 | 3 | 215 | 208 | 96.74418605 |
| 5 | 2 | 215 | 210 | 97.6744186 |
| 5 | 2 | 215 | 212 | 98.60465116 |
| 5 | 1 | 215 | 215 | 100 |
| 5 | 1 | 215 | 213 | 99.06976744 |
| 5 | 1 | 215 | 214 | 99.53488372 |
| 6 | 242 | 682 | 242 | 35.48387097 |
| 6 | 180 | 682 | 422 | 61.87683284 |
| 6 | 132 | 682 | 554 | 81.23167155 |
| 6 | 58 | 682 | 612 | 89.73607038 |
| 6 | 21 | 682 | 633 | 92.81524927 |
| 6 | 14 | 682 | 647 | 94.86803519 |
| 6 | 12 | 682 | 659 | 96.62756598 |
| 6 | 10 | 682 | 669 | 98.09384164 |
| 6 | 5 | 682 | 674 | 98.82697947 |
| 6 | 2 | 682 | 676 | 99.1202346 |
| 6 | 2 | 682 | 678 | 99.41348974 |
| 6 | 1 | 682 | 679 | 99.5601173 |
| 6 | 1 | 682 | 680 | 99.70674487 |
| 6 | 1 | 682 | 682 | 100 |
| 6 | 1 | 682 | 681 | 99.85337243 |
| 7 | 200 | 456 | 200 | 43.85964912 |
| 7 | 168 | 456 | 368 | 80.70175439 |
| 7 | 30 | 456 | 398 | 87.28070175 |
| 7 | 17 | 456 | 415 | 91.00877193 |
| 7 | 9 | 456 | 424 | 92.98245614 |
| 7 | 5 | 456 | 429 | 94.07894737 |
| 7 | 4 | 456 | 433 | 94.95614035 |
| 7 | 4 | 456 | 441 | 96.71052632 |
| 7 | 4 | 456 | 437 | 95.83333333 |
| 7 | 3 | 456 | 444 | 97.36842105 |
| 7 | 3 | 456 | 453 | 99.34210526 |
| 7 | 3 | 456 | 450 | 98.68421053 |
| 7 | 3 | 456 | 447 | 98.02631579 |
| 7 | 2 | 456 | 455 | 99.78070175 |
| 7 | 1 | 456 | 456 | 100 |
+-----------+--------+--------------+----------------------+-------------+
The query that gives me the results above is shown below. I have tried CTEs and rounding pc_of_total, but nothing seems to do what I want.
SELECT
*,
-- Calculate the total of all orders within a locale
SUM(orders) OVER (PARTITION BY locale_id) AS "locale_total",
-- Create running sum based on orders
SUM(orders) OVER (PARTITION BY locale_id ORDER BY orders DESC ROWS UNBOUNDED PRECEDING) AS "locale_running_total",
-- Calculating percentile, running pc of total
(locale_running_total / locale_total::NUMERIC) * 100 AS "pc_of_total"
FROM (
SELECT
locale_id,
SUM(orders) AS "orders"
FROM table
GROUP BY
locale_id
) d
The output I want is:
+-----------+--------+--------------+----------------------+-------------+
| locale_id | orders | locale_total | locale_running_total | pc_of_total |
+-----------+--------+--------------+----------------------+-------------+
| 1 | 68 | 92 | 68 | 73.91304348 |
| 1 | 9 | 92 | 77 | 83.69565217 |
| 2 | 130 | 188 | 130 | 69.14893617 |
| 2 | 18 | 188 | 148 | 78.72340426 |
| 2 | 9 | 188 | 157 | 83.5106383 |
| 3 | 3878 | 6489 | 3878 | 59.7626753 |
| 3 | 1823 | 6489 | 5701 | 87.85637232 |
| 4 | 779 | 1636 | 779 | 47.61613692 |
| 4 | 257 | 1636 | 1036 | 63.32518337 |
| 4 | 102 | 1636 | 1138 | 69.5599022 |
| 4 | 97 | 1636 | 1235 | 75.48899756 |
| 4 | 89 | 1636 | 1324 | 80.92909535 |
| 5 | 130 | 215 | 130 | 60.46511628 |
| 5 | 37 | 215 | 167 | 77.6744186 |
| 5 | 14 | 215 | 181 | 84.18604651 |
| 6 | 242 | 682 | 242 | 35.48387097 |
| 6 | 180 | 682 | 422 | 61.87683284 |
| 6 | 132 | 682 | 554 | 81.23167155 |
| 7 | 200 | 456 | 200 | 43.85964912 |
| 7 | 168 | 456 | 368 | 80.70175439 |
+-----------+--------+--------------+----------------------+-------------+
Running Redshift 1.0.8727
Answer 1: First, you can mix window functions with aggregation, which simplifies the query somewhat.
Then you can get what you want with a simple comparison:
SELECT d.*
FROM (SELECT locale_id, SUM(orders) AS orders,
SUM(SUM(orders)) OVER (PARTITION BY locale_id) AS locale_total,
SUM(SUM(orders)) OVER (PARTITION BY locale_id
ORDER BY SUM(orders) DESC
ROWS UNBOUNDED PRECEDING
) as locale_running_total
FROM table
GROUP BY locale_id
) d
WHERE (locale_running_total - orders) < 0.8 * locale_total;
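Note that this subtracts orders from the running total for the comparison, so each row is tested against the running total as it stood before that row was added. That way, the result keeps every row below 80% plus the first row that reaches or exceeds 80%.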
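To see why subtracting orders works, the same comparison can be traced outside the database. The following is a minimal Python sketch, not part of the original answer: the function name rows_up_to_threshold is hypothetical, and the input is the orders column for locale_id = 1 from the sample data. It mirrors ORDER BY orders DESC, the running sum, and the WHERE test.

```python
from itertools import accumulate


def rows_up_to_threshold(orders, threshold=0.8):
    """Hypothetical helper: keep (orders, running_total) pairs up to and
    including the first row whose running total reaches the threshold."""
    ordered = sorted(orders, reverse=True)  # mirrors ORDER BY orders DESC
    total = sum(ordered)                    # plays the role of locale_total
    kept = []
    for value, running in zip(ordered, accumulate(ordered)):
        # running - value is the running total *before* this row,
        # which is the same test as the WHERE clause in the answer.
        if running - value < threshold * total:
            kept.append((value, running))
    return kept


# orders for locale_id = 1 in the sample data (total 92, 80% is 73.6)
locale_1 = [68, 9, 5, 4, 1, 1, 1, 1, 1, 1]
print(rows_up_to_threshold(locale_1))  # [(68, 68), (9, 77)]
```

The row with 9 orders is kept because the running total before it (68) is still under 73.6. The row with 5 orders is dropped because the running total before it (77) is already over. This matches the two rows for locale_id 1 in the desired output.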