Groupby and difference in Python/R

Posted: 2021-10-04 20:57:57

Question:

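I have a dataset like the following:

I want to group by the Agent column and get, for each agent, the difference between the maximum and minimum resolved time (for example, for Adnan Shaikh the output would be 01:58:22).

How can I do this in Python or R?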

Comments:

What have you tried so far?

Is Resolved.time always monotonically increasing for each agent?

Answer 1:

For Python:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    "Agent": ["Adnan Shaikh", "Adnan Shaikh", "Adnan Shaikh",
              "Akshay Padaya", "Akshay Padaya", "Akshay Padaya",
              "Akshay Padaya"],
    "Resolved.time": ["2021-07-28 12:11",
                      "2021-07-28 12:23",
                      "2021-07-28 13:06",
                      "2021-07-28 10:44",
                      "2021-07-28 12:45",
                      "2021-07-28 13:05",
                      np.nan]})
# Parse the strings as timestamps; the missing value becomes NaT
df["Resolved.time"] = pd.to_datetime(df["Resolved.time"], format="%Y-%m-%d %H:%M")

# For each agent, subtract the earliest resolved time from the latest
# (Series.max/min skip NaT, so the missing row is ignored)
result = df.groupby("Agent").agg(
    Resolved_time=("Resolved.time", lambda x: x.max() - x.min())
).reset_index()

The result looks like this:

           Agent   Resolved_time
0   Adnan Shaikh 0 days 00:55:00
1  Akshay Padaya 0 days 02:21:00
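A comment on the question asks whether Resolved.time always increases within each agent. That only matters if you take the last row minus the first row. The max-minus-min approach above does not depend on row order. The sketch below illustrates the difference under some assumptions. It uses hypothetical agents "A" and "B" with the same timestamps as the answer, except that B's rows are deliberately out of order. It compares the two spans and checks ordering with Series.is_monotonic_increasing.

```python
import pandas as pd

# Hypothetical data: agent "B" has its rows deliberately out of time order.
df = pd.DataFrame({
    "Agent": ["A", "A", "A", "B", "B", "B"],
    "Resolved.time": pd.to_datetime([
        "2021-07-28 12:11", "2021-07-28 12:23", "2021-07-28 13:06",
        "2021-07-28 12:45", "2021-07-28 10:44", "2021-07-28 13:05",
    ]),
})

g = df.groupby("Agent")["Resolved.time"]

# Order-independent span: latest minus earliest timestamp per agent.
span_minmax = g.max() - g.min()

# Order-dependent span: last row minus first row per agent.
# It equals max - min only when times never decrease within the group.
span_lastfirst = g.last() - g.first()

# True if an agent's times are non-decreasing in row order.
monotonic = g.apply(lambda s: s.is_monotonic_increasing)

print(pd.DataFrame({
    "max_minus_min": span_minmax,
    "last_minus_first": span_lastfirst,
    "monotonic": monotonic,
}))
```

For "B" the two spans disagree: 2 hours 21 minutes versus 20 minutes. Last-minus-first is therefore only safe after sorting by time, or when the data is known to be ordered.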

Answer 2:

In R, something like:

library(tidyverse)

df <- tibble(agent = c("Adnan Shaikh", "Adnan Shaikh", "Adnan Shaikh", "Akshay Padaya", "Akshay Padaya", "Akshay Padaya", "Akshay Padaya"),
             Resolved.time = lubridate::ymd_hm(c("2021-07-28 12:11", "2021-07-28 12:23", "2021-07-28 13:06", "2021-07-28 10:44", "2021-07-28 12:45", "2021-07-28 13:05", NA)))

df %>% 
  na.omit() %>%   # drop the row with a missing time
  group_by(agent) %>% 
  # latest minus earliest time per agent, as seconds, then as a period
  mutate(result = as.numeric(max(Resolved.time) - min(Resolved.time), units = "secs"),
         result = lubridate::seconds_to_period(result))

This gives:

# A tibble: 6 x 3
# Groups:   agent [2]
  agent         Resolved.time       result   
  <chr>         <dttm>              <Period> 
1 Adnan Shaikh  2021-07-28 12:11:00 55M 0S   
2 Adnan Shaikh  2021-07-28 12:23:00 55M 0S   
3 Adnan Shaikh  2021-07-28 13:06:00 55M 0S   
4 Akshay Padaya 2021-07-28 10:44:00 2H 21M 0S
5 Akshay Padaya 2021-07-28 12:45:00 2H 21M 0S
6 Akshay Padaya 2021-07-28 13:05:00 2H 21M 0S
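Both answers print the span in their own notation: 0 days 00:55:00 in pandas and 55M 0S in lubridate. The question uses the HH:MM:SS form instead (e.g. 01:58:22).

The pandas sketch below mirrors the R answer's group_by() + mutate() with groupby().transform(), which writes each agent's span onto every row. It then formats the span as HH:MM:SS with a hypothetical helper, to_hhmmss. It assumes the spans are non-negative (always true for max minus min) and prints hours above 24 without wrapping them into days.

```python
import pandas as pd

# Same sample data as the answers above.
df = pd.DataFrame({
    "Agent": ["Adnan Shaikh"] * 3 + ["Akshay Padaya"] * 4,
    "Resolved.time": pd.to_datetime([
        "2021-07-28 12:11", "2021-07-28 12:23", "2021-07-28 13:06",
        "2021-07-28 10:44", "2021-07-28 12:45", "2021-07-28 13:05", None,
    ]),
})
# Drop rows with a missing time, like na.omit() in the R answer.
df = df.dropna(subset=["Resolved.time"])


def to_hhmmss(td: pd.Timedelta) -> str:
    """Hypothetical helper: format a non-negative Timedelta as HH:MM:SS.

    Hours are not wrapped at 24, so a 25-hour span prints as 25:00:00.
    """
    total = int(td.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# transform() returns one value per original row, so each agent's
# latest-minus-earliest span is repeated on all of that agent's rows.
g = df.groupby("Agent")["Resolved.time"]
span = g.transform("max") - g.transform("min")
df["result"] = span.map(to_hhmmss)
print(df)
```

For one row per agent instead, apply the same helper to the result of the aggregated groupby().agg(...) in the Python answer.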
