How to find the number of purchases over time intervals in SQL
[Posted]: 2015-02-15 20:40:18
[Question]: I am using Redshift (Postgres) and Pandas for my work. I am trying to count user actions; let's say purchases, to make it easier to understand. I have a table, purchases, containing the following data:
user_id, timestamp, price
1, 2015-02-01, 200
1, 2015-02-02, 50
1, 2015-02-10, 75
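In the end, I want the number of purchases per user within certain time ranges, for example:
userid, 28-14_days, 14-7_days, 7
This is what I have so far. I am aware that I don't have an upper limit on the dates: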
SELECT DISTINCT x_days.user_id,
       SUM(x_days.purchases) AS x_num,
       SUM(y_days.purchases) AS y_num,
       x_days.x_date, y_days.y_date
FROM (
    SELECT purchases.user_id,
           COUNT(purchases.user_id) AS purchases,
           DATE(purchases.timestamp) AS x_date
    FROM purchases
    WHERE purchases.timestamp > (current_date - INTERVAL '%(x_days_ago)s day')
      AND purchases.max_value > 200
    GROUP BY DATE(purchases.timestamp), purchases.user_id
) AS x_days
JOIN (
    SELECT purchases.user_id,
           COUNT(purchases.user_id) AS purchases,
           DATE(purchases.timestamp) AS y_date
    FROM purchases
    WHERE purchases.timestamp > (current_date - INTERVAL '%(y_days_ago)s day')
      AND purchases.max_value > 200
    GROUP BY DATE(purchases.timestamp), purchases.user_id
) AS y_days
  ON x_days.user_id = y_days.user_id
GROUP BY x_days.user_id, x_days.x_date, y_days.y_date
params={'x_days_ago': x_days_ago, 'y_days_ago': y_days_ago}
where these are set in Python/pandas:
x_days_ago = 14
y_days_ago = 7
But this did not go exactly as planned:
user_id x_num y_num x_date y_date
0 5451772 1 1 2015-02-10 2015-02-10
1 5026678 1 1 2015-02-09 2015-02-09
2 6337993 2 1 2015-02-14 2015-02-13
3 6204432 1 3 2015-02-10 2015-02-11
4 3417539 1 1 2015-02-11 2015-02-11
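Even though I have no upper date bound (so x effectively covers 14 days ago until now and y covers 7 days ago until now, which means the two ranges overlap), y is higher than x in some cases.
Can anyone help me fix this or suggest a better approach?
Thanks!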
[Answer 1]: This may not be the most efficient answer, but you can generate each sum with a sub-select:
WITH
  -- purchases per user per calendar day
  summed AS (
    SELECT user_id, day, COUNT(1) AS purchases
    FROM (SELECT user_id, DATE(timestamp) AS day FROM purchases) AS _
    GROUP BY user_id, day
  ),
  -- one row per user
  users AS (SELECT DISTINCT user_id FROM purchases)
SELECT user_id,
       -- purchases in the last 7 days (NULL if there are none)
       (SELECT SUM(purchases) FROM summed
         WHERE summed.user_id = users.user_id
           AND day >= DATE(NOW() - interval ' 7 days')) AS days_7,
       -- purchases in the last 14 days (includes the last 7)
       (SELECT SUM(purchases) FROM summed
         WHERE summed.user_id = users.user_id
           AND day >= DATE(NOW() - interval '14 days')) AS days_14
FROM users;
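(This was tested in Postgres, not in Redshift; but the Redshift documentation indicates that WITH and DISTINCT are both supported.) I would have liked to use window functions here to get rolling sums, but that is somewhat cumbersome without generate_series().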
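For the non-overlapping buckets the question asks for (28-14 days, 14-7 days, last 7 days), one alternative worth considering is conditional aggregation: compute each purchase's age in days once, then count it into exactly one bucket with SUM(CASE ...). The sketch below is not the query from the answer above. It runs against an in-memory SQLite database from Python only so that it is self-contained. The sample rows, the fixed reference date, the column name ts, the helper bucket_counts, and the choice of half-open bucket boundaries are all assumptions made for illustration. The julianday date arithmetic is SQLite-specific and would have to be swapped for the Redshift equivalent.

```python
import sqlite3

# Assumed sample data modeled on the question's table. The reference date is
# fixed so the result does not depend on the day the script is run.
REFERENCE_DATE = "2015-02-15"
ROWS = [
    (1, "2015-02-01", 200),
    (1, "2015-02-02", 50),
    (1, "2015-02-10", 75),
    (2, "2015-01-20", 30),
    (2, "2015-02-14", 10),
]

# Each purchase falls into at most one bucket, so the counts never overlap.
# Bucket boundaries (assumption): (14, 28], (7, 14], [0, 7] days old.
QUERY = """
SELECT user_id,
       SUM(CASE WHEN age > 14 AND age <= 28 THEN 1 ELSE 0 END) AS days_28_14,
       SUM(CASE WHEN age > 7  AND age <= 14 THEN 1 ELSE 0 END) AS days_14_7,
       SUM(CASE WHEN age >= 0 AND age <= 7  THEN 1 ELSE 0 END) AS days_7
FROM (
    -- age of each purchase in whole days (SQLite-specific date arithmetic)
    SELECT user_id,
           CAST(julianday(:ref) - julianday(ts) AS INTEGER) AS age
    FROM purchases
) AS aged
GROUP BY user_id
ORDER BY user_id
"""


def bucket_counts(rows, reference_date):
    """Load rows into a throwaway SQLite table and return per-user bucket counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (user_id INTEGER, ts TEXT, price INTEGER)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"ref": reference_date}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in bucket_counts(ROWS, REFERENCE_DATE):
        print(row)
```

With these sample rows and a reference date of 2015-02-15, user 1's purchases are 14, 13 and 5 days old, so the row for user 1 is (1, 0, 2, 1). Because every user is grouped in a single pass, a bucket with no purchases yields 0 rather than NULL.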
[Comments]:
Thanks @solidsnack, but according to docs.aws.amazon.com/redshift/latest/dg/… Redshift does not support WITH.
It says WITH is not supported in combination with INSERT, DELETE and UPDATE. See the documentation for WITH under SELECT: docs.aws.amazon.com/redshift/latest/dg/r_WITH_clause.html "The WITH clause is an optional clause that precedes the SELECT list in a query."