以日期为 X 轴的 Seaborn 条形图
Posted
技术标签:
【中文标题】以日期为 X 轴的 Seaborn 条形图【英文标题】:Seaborn Bar Plot with Dates as X-Axis 【发布时间】:2020-11-20 18:41:33 【问题描述】:我尝试创建时间序列数据集的条形图,但未成功。我尝试将日期转换为 Pandas 日期时间对象、时间戳对象、原始字符串、浮点数和整数。无论我做什么,都会收到以下错误:TypeError: float() argument must be a string or a number, not 'Timestamp'
以下是一些产生错误的最小示例:
-
这里,'Date' 对象的类型是
import matplotlib.pylab as plt
import matplotlib.dates as mdates
import seaborn as sns
def main():
path = 'Data/AQ+RX Counts.csv'
df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
weekly_df = df.resample('W').mean().reset_index()
weekly_df['count'] = df['count'].resample('W').sum().reset_index()
sns.barplot(x = 'Date', y='count', data = weekly_df)
plt.show()
main()
-
然后我尝试让日期浮动,打算将它们格式化回之后的日期,但这仍然不起作用:
dates = mdates.datestr2num(weekly_df.Date.astype(str))
weekly_df['n_dates'] = dates
sns.barplot(x = 'n_dates', y='count', data = weekly_df)
plt.show()
-
我也尝试将它们设为整数,但无济于事:
dates = mdates.datestr2num(weekly_df.Date.astype(str))
dates = dates.astype(int)
dates = pd.Series(dates)
weekly_df['n_dates'] = dates
sns.barplot(x = 'n_dates', y='count', data = weekly_df)
plt.show()
我尝试了许多其他变体,但都产生了相同的错误。我什至将它与其他代码进行了比较,并验证了所有类型都是相同的,并且比较代码工作正常。我完全不知道从这里去哪里。
数据框:
Date,WSA,WSV,WDV,WSM,SGT,T2M,T10M,DELTA_T,PBAR,SRAD,RH,PM25,AQI,count
2015-01-01,1.0708333333333335,0.8750000000000001,132.95833333333334,3.4708333333333337,35.39166666666667,30.72916666666667,30.625,-0.11666666666666667,738.8249999999998,72.66666666666667,99.75416666666666,24.80833333333333,73.30793131580873,0.0
2015-01-02,1.1086956521739129,0.9391304347826086,148.47826086956522,3.734782608695653,32.46521739130434,34.39130434782609,34.27826086956521,-0.11739130434782602,738.3478260869565,61.39130434782609,100.01304347826084,23.500000000000004,64.15072523318715,4.0
2015-01-03,1.0173913043478258,0.7173913043478259,168.04347826086956,3.773913043478261,42.71739130434783,36.24782608695652,36.160869565217396,-0.09565217391304348,739.4434782608695,49.60869565217392,100.76956521739132,20.460869565217394,55.65271063058384,0.0
2015-01-04,1.0,0.6,159.95833333333334,3.85,49.15,38.8875,38.66666666666666,-0.225,741.5000000000001,31.54166666666667,101.47916666666669,13.012499999999998,46.835258118800965,0.0
2015-01-05,1.0333333333333334,0.4416666666666667,137.0,4.0,57.56666666666666,42.99583333333333,42.94583333333333,-0.04999999999999995,742.5333333333333,44.58333333333334,101.00416666666666,16.654166666666665,52.420271225456766,4.0
2015-01-06,0.7818181818181817,0.5590909090909091,114.72727272727272,3.654545454545455,42.86818181818182,40.7409090909091,41.09545454545454,0.36818181818181817,740.9045454545453,48.27272727272727,100.57727272727274,21.954545454545453,67.31833852518514,6.0
2015-01-07,0.9739130434782608,0.8304347826086954,110.82608695652172,3.956521739130436,30.817391304347833,40.36521739130435,40.59565217391304,0.22173913043478266,739.8652173913043,60.04347826086956,100.19565217391305,24.456521739130434,72.3472505968891,6.0
2015-01-08,0.9833333333333336,0.8250000000000001,156.5,4.208333333333333,32.67083333333333,41.520833333333336,41.36666666666667,-0.12916666666666668,736.35,69.58333333333333,99.95833333333331,22.274999999999995,65.77072473472253,10.0
2015-01-09,0.9583333333333331,0.7291666666666669,133.70833333333334,3.3791666666666664,39.645833333333336,42.279166666666654,42.15833333333333,-0.11666666666666665,735.2041666666665,60.41666666666666,100.04166666666669,19.370833333333334,59.08512936837911,10.0
2015-01-10,0.9666666666666668,0.7583333333333336,164.5,3.675,37.34583333333333,42.96250000000001,42.775,-0.2,734.2875,41.5,100.12083333333337,14.658333333333335,49.31465266245389,0.0
【问题讨论】:
【参考方案1】: 函数中的转换是将'count'
从浮点数转换为日期时间数据类型。
使用发布的示例数据
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
path = 'data/test.csv'
df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
# display(df)
WSA WSV WDV WSM SGT T2M T10M DELTA_T PBAR SRAD RH PM25 AQI count
Date
2015-01-01 1.070833 0.875000 132.958333 3.470833 35.391667 30.729167 30.625000 -0.116667 738.825000 72.666667 99.754167 24.808333 73.307931 0.0
2015-01-02 1.108696 0.939130 148.478261 3.734783 32.465217 34.391304 34.278261 -0.117391 738.347826 61.391304 100.013043 23.500000 64.150725 4.0
2015-01-03 1.017391 0.717391 168.043478 3.773913 42.717391 36.247826 36.160870 -0.095652 739.443478 49.608696 100.769565 20.460870 55.652711 0.0
2015-01-04 1.000000 0.600000 159.958333 3.850000 49.150000 38.887500 38.666667 -0.225000 741.500000 31.541667 101.479167 13.012500 46.835258 0.0
2015-01-05 1.033333 0.441667 137.000000 4.000000 57.566667 42.995833 42.945833 -0.050000 742.533333 44.583333 101.004167 16.654167 52.420271 4.0
2015-01-06 0.781818 0.559091 114.727273 3.654545 42.868182 40.740909 41.095455 0.368182 740.904545 48.272727 100.577273 21.954545 67.318339 6.0
2015-01-07 0.973913 0.830435 110.826087 3.956522 30.817391 40.365217 40.595652 0.221739 739.865217 60.043478 100.195652 24.456522 72.347251 6.0
2015-01-08 0.983333 0.825000 156.500000 4.208333 32.670833 41.520833 41.366667 -0.129167 736.350000 69.583333 99.958333 22.275000 65.770725 10.0
2015-01-09 0.958333 0.729167 133.708333 3.379167 39.645833 42.279167 42.158333 -0.116667 735.204167 60.416667 100.041667 19.370833 59.085129 10.0
2015-01-10 0.966667 0.758333 164.500000 3.675000 37.345833 42.962500 42.775000 -0.200000 734.287500 41.500000 100.120833 14.658333 49.314653 0.0
# resample mean
dfr = df.resample('W').mean()
# add the resampled sum to dfr
dfr['mean'] = df['count'].resample('W').sum()
# reset index
dfr = dfr.reset_index()
# display(dfr)
Date WSA WSV WDV WSM SGT T2M T10M DELTA_T PBAR SRAD RH PM25 AQI count mean
0 2015-01-04 1.049230 0.782880 152.359601 3.707382 39.931069 35.063949 34.932699 -0.138678 739.529076 53.802083 100.503986 20.445426 59.986656 1.0 4.0
1 2015-01-11 0.949566 0.690615 136.210282 3.812261 40.152457 41.810743 41.822823 0.015681 738.190794 54.066590 100.316321 19.894900 61.042728 6.0 36.0
# plot dfr
fig, ax = plt.subplots(figsize=(16, 10))
fig = sns.barplot(x='Date', y='count', data=dfr)
# configure the xaxis ticks from datetime to date
x_dates = dfr.Date.dt.strftime('%Y-%m-%d').sort_values().unique()
ax.set_xticklabels(labels=x_dates, rotation=90, ha='right')
plt.show()
【讨论】:
是的,我的问题是在重新采样计数之前重新采样 df 时重置索引,这以某种方式使程序丢失了我的“计数”变量,然后被 NaN 和随后的时间戳索引覆盖在进一步重采样时。感谢您的帮助!以上是关于以日期为 X 轴的 Seaborn 条形图的主要内容,如果未能解决你的问题,请参考以下文章
修改 x 轴下方凌乱和重叠的日期标签的最优雅方法? (Seaborn,条形图)