groupby aggregation on day level of date time column in pandas

Posted 2023-03-12

Asked: 2020-07-18 08:21:03

Question:

I have a dataframe as shown below. It contains doctor appointment data.

  Doctor     Appointment              Show
  A          2020-01-18 12:00:00      Yes
  A          2020-01-18 12:30:00      Yes
  A          2020-01-18 13:00:00      No
  A          2020-01-18 13:30:00      Yes
  B          2020-01-18 12:00:00      Yes
  B          2020-01-18 12:30:00      Yes
  B          2020-01-18 13:00:00      No
  B          2020-01-18 13:30:00      Yes
  B          2020-01-18 16:00:00      No
  B          2020-01-18 16:30:00      Yes
  A          2020-01-19 12:00:00      Yes
  A          2020-01-19 12:30:00      Yes
  A          2020-01-19 13:00:00      No
  A          2020-01-19 13:30:00      Yes
  A          2020-01-19 14:00:00      Yes
  A          2020-01-19 14:30:00      No
  A          2020-01-19 16:00:00      No
  A          2020-01-19 16:30:00      Yes
  B          2020-01-19 12:00:00      Yes
  B          2020-01-19 12:30:00      Yes
  B          2020-01-19 13:00:00      No
  B          2020-01-19 13:30:00      Yes
  B          2020-01-19 14:00:00      No
  B          2020-01-19 14:30:00      Yes
  B          2020-01-19 15:00:00      No
  B          2020-01-18 15:30:00      Yes

From the above dataframe, I would like to create a function in pandas that outputs the following.

I tried the following:

def Doctor_date_summary(doctor, date):
   Number of slots = df.groupby([doctor, date] ).sum()

Expected output:

Doctor_date_summary(Doctor, date)
If Doctor = A, date = 2020-01-19

Number of slots = 8
Number of show up = 5
show up percentage = 62.5

Number of show up = the number of Yes values in the Show column for that doctor on that date = 5

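Comments:

One question: do you need to count over all the data and then select by date and doctor, as in my answer? Or do you only need to select some values and count them, as in the other answer?

Only need to select some values and count them, as in the other one. Not all of the data, only the selected rows.

Answer 1: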

You can create each mask separately inside the function, then chain them with bitwise AND (&) and use sum to count:

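import pandas as pd

# Parse the column so that the .dt accessor is available.
df['Appointment'] = pd.to_datetime(df['Appointment'])

def Doctor_date_summary(doctor, date):
    m1 = df['Doctor'] == doctor                      # rows for this doctor
    m2 = df['Appointment'].dt.normalize() == date    # rows on this date (time part reset to midnight)
    m3 = df['Show'] == 'Yes'                         # rows where the patient showed up
    show_up = (m1 & m2 & m3).sum()
    no = (m1 & m2).sum()                             # total number of slots
    return show_up, no

up, no = Doctor_date_summary('A', '2020-01-19')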

Finally, print the result using f-strings (`no` holds the number of slots, `up` the number of show-ups):

print(f"Number of slots = {no}")
print(f"Number of show up = {up}")
print(f"show up percentage = {up/no*100}")
Number of slots = 8
Number of show up = 5
show up percentage = 62.5
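
To see why summing the combined mask gives a count, here is a minimal self-contained sketch. It rebuilds only Doctor A's rows for 2020-01-19 from the question plus one extra row for Doctor B. The frame name `sample` and the mask names are assumptions introduced for illustration. It relies on `True` counting as 1 when a boolean Series is summed.

```python
import pandas as pd

# Doctor A's eight slots on 2020-01-19 from the question, plus one row for B.
sample = pd.DataFrame({
    "Doctor": ["A"] * 8 + ["B"],
    "Appointment": pd.to_datetime([
        "2020-01-19 12:00:00", "2020-01-19 12:30:00", "2020-01-19 13:00:00",
        "2020-01-19 13:30:00", "2020-01-19 14:00:00", "2020-01-19 14:30:00",
        "2020-01-19 16:00:00", "2020-01-19 16:30:00", "2020-01-19 12:00:00",
    ]),
    "Show": ["Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes"],
})

is_doctor = sample["Doctor"] == "A"
# normalize() resets the time part to midnight, so only the date is compared.
is_day = sample["Appointment"].dt.normalize() == pd.Timestamp("2020-01-19")
showed = sample["Show"] == "Yes"

# Each mask is a boolean Series; & combines them row by row.
slots = int((is_doctor & is_day).sum())
show_ups = int((is_doctor & is_day & showed).sum())
print(slots, show_ups, show_ups / slots * 100)
```

The extra row for Doctor B is excluded by `is_doctor`, which shows that each mask narrows the selection independently before the sum is taken.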

Answer 2:

You can first create a day column by flooring each timestamp to the day:

df['day'] = df['Appointment'].dt.floor('d')

Then you can use boolean indexing:

def Doctor_date_summary(Doctor, date):
    number_of_show_up = np.sum((df['Doctor']==Doctor) & (df['day']==date) & (df['Show']=='Yes'))
    number_of_slots = np.sum((df['Doctor']==Doctor) & (df['day']==date))

    return number_of_show_up, number_of_slots, 100*number_of_show_up/number_of_slots

Finally:

number_of_show_up, number_of_slots, percentage = Doctor_date_summary('A', '2020-01-19')

print("Number of slots = {}".format(number_of_slots))
print("Number of show up = {}".format(number_of_show_up))
print("show up percentage = {:.1f}".format(percentage))

Number of slots = 8
Number of show up = 5
show up percentage = 62.5
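
Since the question title asks about groupby, the following is a hedged sketch of the "compute everything, then select" approach raised in the comments. It is not taken from either answer. It assumes a small subset of the question's data, and the names `summary`, `showed`, `slots`, `show_ups` and `show_up_pct` are introduced here for illustration only.

```python
import pandas as pd

# A small subset of the question's data (assumed sample, not the full table).
df = pd.DataFrame({
    "Doctor": ["A", "A", "A", "A", "B", "B"],
    "Appointment": pd.to_datetime([
        "2020-01-18 12:00:00", "2020-01-18 12:30:00",
        "2020-01-18 13:00:00", "2020-01-18 13:30:00",
        "2020-01-18 12:00:00", "2020-01-18 13:00:00",
    ]),
    "Show": ["Yes", "Yes", "No", "Yes", "Yes", "No"],
})

# Group by doctor and by the calendar day of the appointment.
day = df["Appointment"].dt.normalize().rename("day")
summary = (
    df.assign(showed=df["Show"].eq("Yes"))
      .groupby(["Doctor", day])["showed"]
      .agg(slots="size", show_ups="sum")   # rows per group, and count of True
)
summary["show_up_pct"] = summary["show_ups"] / summary["slots"] * 100

# Look up one doctor/day pair from the precomputed table.
row = summary.loc[("A", pd.Timestamp("2020-01-18"))]
print(row["slots"], row["show_ups"], row["show_up_pct"])
```

This variant does the aggregation once for every doctor and day, which may be preferable when many pairs are queried. The mask-based functions above only touch the rows for the requested pair.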
