Create eventplot from pandas long format
[Posted]: 2021-10-13 13:04:07
[Question]: I am having trouble converting a pandas DataFrame with multiple observations into the right format for an event plot. The 'creator' column should serve as the label that distinguishes the datasets.
import pandas as pd
from matplotlib import pyplot as plt

data = {
    "creator": [1, 2, 1, 1, 2],
    "creationdate": ["2019-03-13 16:43:55", "2019-03-13 16:43:55", "2019-03-15 15:52:05",
                     "2019-03-16 15:52:05", "2019-03-17 15:52:05"],
}
df = pd.DataFrame(data)
df["creationdate"] = pd.to_datetime(df["creationdate"])
# df
#    creator        creationdate
# 0        1 2019-03-13 16:43:55
# 1        2 2019-03-13 16:43:55
# 2        1 2019-03-15 15:52:05
# 3        1 2019-03-16 15:52:05
# 4        2 2019-03-17 15:52:05

# Group by creator
grouped = df.groupby("creator")

# How can the data now be reshaped to actually display the plot?
# ...

# Passing the GroupBy object directly raises:
# TypeError: Invalid comparison between dtype=datetime64[ns] and int
fig = plt.eventplot(grouped)
plt.show()
I tried iterating over the grouped object to extract the individual groups, but that seems overly complicated and unnecessary.
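import numpy as np

# The two groups have different lengths (3 and 2), so dtype=object is needed to build the ragged array
data = np.array([grouped.get_group(1)["creationdate"].to_numpy(),
                 grouped.get_group(2)["creationdate"].to_numpy()], dtype=object)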
[Answer 1]: Use enumerate on the groupby object to index into a list of colors.
import pandas as pd
import matplotlib.pyplot as plt

# load data
data = {'creator': [1, 2, 1, 1, 2],
        'creationdate': ['2019-03-13 16:43:55', '2019-03-13 16:43:55', '2019-03-15 15:52:05',
                         '2019-03-16 15:52:05', '2019-03-17 15:52:05']}
df = pd.DataFrame(data)

# convert column to a datetime dtype
df['creationdate'] = pd.to_datetime(df['creationdate'])

# create the fig / axes
fig, ax = plt.subplots(figsize=(10, 4))

# iterate through each group and plot; i indexes the color list
colors = ['blue', 'red']
for i, (label, group) in enumerate(df.groupby('creator')):
    ax.eventplot('creationdate', colors=colors[i], data=group, label=label)

ax.legend(title='Creator', bbox_to_anchor=(1, 1.02), loc='upper left')
ax.set(xlabel='Datetime', ylabel='Value', title='Eventplot')
plt.show()
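The loop above draws every creator on the same baseline. As a possible alternative, eventplot also accepts a list of sequences and draws one row per sequence, so the long-format frame can be reshaped into one array per creator and plotted in a single call. The sketch below assumes that approach. The names positions_by_creator, labels and positions are illustrative, not from the original answer, and the timestamps are converted with matplotlib.dates.date2num so the example does not depend on how eventplot handles datetime values.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "creator": [1, 2, 1, 1, 2],
    "creationdate": pd.to_datetime([
        "2019-03-13 16:43:55", "2019-03-13 16:43:55", "2019-03-15 15:52:05",
        "2019-03-16 15:52:05", "2019-03-17 15:52:05",
    ]),
})

# Reshape long format into one array of date numbers per creator
# (positions_by_creator is an illustrative name)
positions_by_creator = {
    creator: mdates.date2num(group["creationdate"].to_numpy())
    for creator, group in df.groupby("creator")
}
labels = list(positions_by_creator)
positions = list(positions_by_creator.values())

fig, ax = plt.subplots(figsize=(10, 4))

# One call: each inner array becomes its own row, offset along the y axis
collections = ax.eventplot(
    positions,
    colors=["blue", "red"],
    lineoffsets=list(range(len(positions))),
    linelengths=0.8,
)

# Label each row with its creator and format the x axis as dates
ax.set_yticks(list(range(len(labels))))
ax.set_yticklabels([str(label) for label in labels])
ax.xaxis_date()
ax.set(xlabel="Datetime", ylabel="Creator", title="Eventplot")
```

Call plt.show() or fig.savefig(...) afterwards to render the figure. The colors list has to contain one entry per creator, so it would need to grow if more creators appear in the data.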
[Answer 2]: You can use for key in grouped.groups.keys(): to iterate quickly over each group key.
import pandas as pd
from matplotlib import pyplot as plt

data = {
    "creator": [1, 2, 1, 1, 2],
    "creationdate": ["2019-03-13 16:43:55", "2019-03-13 16:43:55", "2019-03-15 15:52:05",
                     "2019-03-16 15:52:05", "2019-03-17 15:52:05"],
}
df = pd.DataFrame(data)
df["creationdate"] = pd.to_datetime(df["creationdate"])
# df
#    creator        creationdate
# 0        1 2019-03-13 16:43:55
# 1        2 2019-03-13 16:43:55
# 2        1 2019-03-15 15:52:05
# 3        1 2019-03-16 15:52:05
# 4        2 2019-03-17 15:52:05

# Group by creator
grouped = df.groupby("creator")
print(df)

# Plot the timestamps of each group separately
for key in grouped.groups.keys():
    # Color them differently
    if key == 1:
        color = "blue"
    elif key == 2:
        color = "red"
    pos = grouped.get_group(key)["creationdate"].values
    # eventplot returns a list of EventCollection objects, not a Figure
    fig = plt.eventplot(pos, colors=color)
plt.show()
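One caveat with the if/elif chain above: a creator other than 1 or 2 would either reuse the previous color or hit an undefined variable. A minimal sketch of one way around that, assuming a dictionary lookup with a default, is shown below. The names color_by_creator, fallback_color and used_colors are hypothetical, and the row offset per group is an addition so the groups do not overlap.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "creator": [1, 2, 1, 1, 2],
    "creationdate": pd.to_datetime([
        "2019-03-13 16:43:55", "2019-03-13 16:43:55", "2019-03-15 15:52:05",
        "2019-03-16 15:52:05", "2019-03-17 15:52:05",
    ]),
})

color_by_creator = {1: "blue", 2: "red"}  # hypothetical mapping, extend as needed
fallback_color = "gray"                   # hypothetical default for unmapped creators

fig, ax = plt.subplots()
used_colors = {}
for row, (key, group) in enumerate(df.groupby("creator")):
    # Look the color up instead of branching on each key
    color = color_by_creator.get(key, fallback_color)
    used_colors[int(key)] = color
    # Convert to matplotlib date numbers and draw this group on its own row
    pos = mdates.date2num(group["creationdate"].to_numpy())
    ax.eventplot(pos, colors=color, lineoffsets=row)

ax.xaxis_date()  # show the x axis as dates
```

As before, plt.show() or fig.savefig(...) renders the result.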