How to find the total play time of each week for the given date in Python?

Posted: 2021-07-17 07:22:18

【Question】:

I have a dataframe as shown below.

import pandas as pd

k = {'user_id': [1,1,1,1,1,2,2,2,3,3,3,3,3,4,4,4,5,5],
     'created': ['2/09/2021','2/10/2021','2/16/2021','2/17/2021','3/09/2021','3/10/2021','3/18/2021','3/19/2021',
                 '2/19/2021','2/20/2021','2/26/2021','2/27/2021','3/09/2021','2/10/2021','2/18/2021','3/19/2021',
                 '3/24/2021','3/30/2021'],
     'stop_time': [11,12,13,14,15,25,26,27,6,7,8,9,10,11,12,13,25,26],
     'play_time': [10,11,12,13,14,24,25,26,5,6,7,8,9,10,11,13,24,25]}

df = pd.DataFrame(data=k)

# parse the month/day/year strings into datetimes
df['created'] = pd.to_datetime(df['created'], format='%m/%d/%Y')
# play time of each record
df['total_play_time'] = df['stop_time'] - df['play_time']

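Now we need to use the first date of each user_id as the start date of that user's first week. For example, '2/09/2021' is the week 1 start date for user_id 1 and '3/10/2021' is the week 1 start date for user_id 2.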

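We need to sum the total play time of each user_id per week, and the weekly sums should continue up to the current date (for example, if the report is run today, it must give a sum for every week up to today), producing a result like the following: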

ID  week1  week2  week3  week4  week5  week6  week7  week8  week9  week10  week11  week12
1   3      2      0      0      0      0      0      0      0      0       0       0
2   1      2      0      0      0      0      0
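
To make the requirement above concrete, here is a minimal sketch of one way to read it. It is not taken from the thread: the function name `weekly_play_time` and the `report_date` parameter are hypothetical, the sample frame holds only the rows of user_id 1 and 2, and the number of week columns is assumed to be counted from the earliest start date of any user.

```python
import pandas as pd

def weekly_play_time(df, report_date):
    """Sum total_play_time per user and week; week 1 starts on each user's first date.

    Hypothetical helper: report_date stands in for "today" in the question.
    """
    out = df.copy()
    # each user's earliest 'created' date marks the start of their week 1
    first_day = out.groupby("user_id")["created"].transform("min")
    # days 0-6 -> week 1, days 7-13 -> week 2, and so on
    out["week"] = (out["created"] - first_day).dt.days // 7 + 1
    wide = out.groupby(["user_id", "week"])["total_play_time"].sum().unstack(fill_value=0)
    # assumption: show every week from the earliest start date up to report_date
    n_weeks = (pd.Timestamp(report_date) - first_day.min()).days // 7 + 1
    wide = wide.reindex(columns=range(1, n_weeks + 1), fill_value=0)
    wide.columns = [f"week{w}" for w in wide.columns]
    return wide

# rows of user_id 1 and 2 from the question (stop_time - play_time is 1 for each)
sample = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 2, 2, 2],
    "created": pd.to_datetime(
        ["2/09/2021", "2/10/2021", "2/16/2021", "2/17/2021", "3/09/2021",
         "3/10/2021", "3/18/2021", "3/19/2021"], format="%m/%d/%Y"),
    "total_play_time": [1, 1, 1, 1, 1, 1, 1, 1],
})
wide = weekly_play_time(sample, "2021-03-31")
print(wide)
```

With this reading, weeks a user has not reached yet are shown as 0 rather than left blank, and records dated after `report_date` would be dropped by the `reindex` step.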

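【Comments】:

Please take the intro tour, visit the help center, and read how to ask a good question to learn how this site works and to help you improve your current and future questions, which can help you get better answers. "Show me how to solve this coding problem?" is off-topic for Stack Overflow. You are expected to make an honest attempt at the solution, and then ask a specific question about your implementation. Stack Overflow is not intended to replace existing tutorials and documentation.

【Answer 1】: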
import numpy as np
import pandas as pd

# Get a list of unique ids
user_ids = df["user_id"].unique()

# Get the start date of each user
start_dates = [min(df[df["user_id"]==usr]["created"]) for usr in user_ids]

# We will subtract the start date to have a common baseline for all users.
# Start from a timedelta-typed column so the assignments below keep Timedelta values
df["time_since_start"] = pd.Timedelta(0)
for i, usr in enumerate(user_ids):
    df.loc[df["user_id"]==usr,"time_since_start"] = df.loc[df["user_id"]==usr,"created"] - start_dates[i]
# we got Timedelta objects, but they are more useful as plain numbers (nanoseconds)
df['t'] = [x.value for x in df["time_since_start"]]

# get the longest time since start over all users, to make our bins
max_time = df["time_since_start"].max()
# convert it from nanoseconds to weeks (8.64e13 ns per day), rounding up
max_weeks = int(np.ceil(max_time.value/8.64e+13/7))

# make the bins and add corresponding readable labels
bins = [pd.Timedelta(weeks = wk).value for wk in range(max_weeks+1)]
labels = ["week " + str(wk+1) for wk in range(max_weeks)]

# bin the data and aggregate the result
# (pd.cut bins are right-closed by default, so t == 0, each user's first day, falls outside every bin)
df["bin"] = pd.cut(df['t'], bins, labels = labels)
# observed=False keeps the empty weeks in the result
df.groupby(['user_id','bin'], observed=False)['total_play_time'].sum()

Output:
user_id  bin   
1        week 1    2
         week 2    1
         week 3    0
         week 4    1
         week 5    0
         week 6    0
2        week 1    0
         week 2    2
         week 3    0
         week 4    0
         week 5    0
         week 6    0
3        week 1    2
         week 2    1
         week 3    1
         week 4    0
         week 5    0
         week 6    0
4        week 1    0
         week 2    1
         week 3    0
         week 4    0
         week 5    0
         week 6    0
5        week 1    1
         week 2    0
         week 3    0
         week 4    0
         week 5    0
         week 6    0
Name: total_play_time, dtype: int64

If you really need it, you can reshape the dataframe into wide format.
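
A minimal sketch of that reshaping step, assuming the long result is a Series indexed by (user_id, bin) like the groupby output above. The names `long_result` and `wide_result` are made up for illustration, and the values are copied from the first three weeks of users 1 and 2 in that output.

```python
import pandas as pd

# a small long-format result shaped like the groupby output
idx = pd.MultiIndex.from_product(
    [[1, 2], ["week 1", "week 2", "week 3"]], names=["user_id", "bin"])
long_result = pd.Series([2, 1, 0, 0, 2, 0], index=idx, name="total_play_time")

# move the 'bin' level from the row index into the columns: one row per user
wide_result = long_result.unstack("bin")
print(wide_result)
```

Applied to the full result, the same `unstack("bin")` call should give one column per week label.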

【Comments】:

I get the following error when running your code.

AttributeError                            Traceback (most recent call last)
      4     df.loc[df["user_id"]==usr,"time_since_start"] = df.loc[df["user_id"]==usr,"created"] - start_dates[i]
      5 # we got a Timedelta object, but its more useful as a float
----> 6 df['t'] = [x.value for x in df["time_since_start"]]
AttributeError: 'int' object has no attribute 'value'

I could not reproduce this error by copy-pasting the code from your question followed by my code. pandas==1.2.4, numpy==1.20.2, python: 3.7.10, platform: ubuntu 20.04
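
One possible reading of the AttributeError quoted above is that the column held plain ints instead of Timedelta objects in the commenter's environment. This is not confirmed in the thread. Under that assumption, a sketch that sidesteps per-element `.value` calls is to build the offset column in one vectorized subtraction and read it through the `.dt` accessor. The three sample rows and the `days` column name are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "created": pd.to_datetime(
        ["2/09/2021", "2/16/2021", "3/10/2021"], format="%m/%d/%Y"),
})

# subtracting two datetime columns gives a timedelta64 column directly
start = df.groupby("user_id")["created"].transform("min")
df["time_since_start"] = df["created"] - start

# whole days as plain integers, without touching individual Timedelta objects
df["days"] = df["time_since_start"].dt.days
print(df)
```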
