Seaborn Bar Plot with Dates as X-Axis

【Title】: Seaborn Bar Plot with Dates as X-Axis 【Posted】: 2020-11-20 18:41:33 【Question】:

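I have been trying, without success, to create a bar plot of a time-series dataset. I tried converting the dates to pandas datetime objects, Timestamp objects, raw strings, floats, and integers. No matter what I do, I get the following error: TypeError: float() argument must be a string or a number, not 'Timestamp'. Here are some minimal examples that produce the error: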

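    Here the 'Date' objects are Timestamp values (going by the error message), so I can see why this does not work:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns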


def main():
    path = 'Data/AQ+RX Counts.csv'
    df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
    weekly_df = df.resample('W').mean().reset_index()
    weekly_df['count'] = df['count'].resample('W').sum().reset_index()
    sns.barplot(x = 'Date', y='count', data = weekly_df)
    plt.show()

main()
    Then I tried turning the dates into floats, intending to format them back into dates afterwards, but that still did not work:
    dates = mdates.datestr2num(weekly_df.Date.astype(str))
    weekly_df['n_dates'] = dates

    sns.barplot(x = 'n_dates', y='count', data = weekly_df)
    plt.show()
    I also tried making them integers, to no avail:
    dates = mdates.datestr2num(weekly_df.Date.astype(str))
    dates = dates.astype(int)
    dates = pd.Series(dates)
    weekly_df['n_dates'] = dates

    sns.barplot(x = 'n_dates', y='count', data = weekly_df)
    plt.show()

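I tried many other variations, and all of them produced the same error. I even compared this against other code, verified that all the types were identical, and the comparison code worked fine. I have no idea where to go from here.

The dataframe: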

Date,WSA,WSV,WDV,WSM,SGT,T2M,T10M,DELTA_T,PBAR,SRAD,RH,PM25,AQI,count
2015-01-01,1.0708333333333335,0.8750000000000001,132.95833333333334,3.4708333333333337,35.39166666666667,30.72916666666667,30.625,-0.11666666666666667,738.8249999999998,72.66666666666667,99.75416666666666,24.80833333333333,73.30793131580873,0.0
2015-01-02,1.1086956521739129,0.9391304347826086,148.47826086956522,3.734782608695653,32.46521739130434,34.39130434782609,34.27826086956521,-0.11739130434782602,738.3478260869565,61.39130434782609,100.01304347826084,23.500000000000004,64.15072523318715,4.0
2015-01-03,1.0173913043478258,0.7173913043478259,168.04347826086956,3.773913043478261,42.71739130434783,36.24782608695652,36.160869565217396,-0.09565217391304348,739.4434782608695,49.60869565217392,100.76956521739132,20.460869565217394,55.65271063058384,0.0
2015-01-04,1.0,0.6,159.95833333333334,3.85,49.15,38.8875,38.66666666666666,-0.225,741.5000000000001,31.54166666666667,101.47916666666669,13.012499999999998,46.835258118800965,0.0
2015-01-05,1.0333333333333334,0.4416666666666667,137.0,4.0,57.56666666666666,42.99583333333333,42.94583333333333,-0.04999999999999995,742.5333333333333,44.58333333333334,101.00416666666666,16.654166666666665,52.420271225456766,4.0
2015-01-06,0.7818181818181817,0.5590909090909091,114.72727272727272,3.654545454545455,42.86818181818182,40.7409090909091,41.09545454545454,0.36818181818181817,740.9045454545453,48.27272727272727,100.57727272727274,21.954545454545453,67.31833852518514,6.0
2015-01-07,0.9739130434782608,0.8304347826086954,110.82608695652172,3.956521739130436,30.817391304347833,40.36521739130435,40.59565217391304,0.22173913043478266,739.8652173913043,60.04347826086956,100.19565217391305,24.456521739130434,72.3472505968891,6.0
2015-01-08,0.9833333333333336,0.8250000000000001,156.5,4.208333333333333,32.67083333333333,41.520833333333336,41.36666666666667,-0.12916666666666668,736.35,69.58333333333333,99.95833333333331,22.274999999999995,65.77072473472253,10.0
2015-01-09,0.9583333333333331,0.7291666666666669,133.70833333333334,3.3791666666666664,39.645833333333336,42.279166666666654,42.15833333333333,-0.11666666666666665,735.2041666666665,60.41666666666666,100.04166666666669,19.370833333333334,59.08512936837911,10.0
2015-01-10,0.9666666666666668,0.7583333333333336,164.5,3.675,37.34583333333333,42.96250000000001,42.775,-0.2,734.2875,41.5,100.12083333333337,14.658333333333335,49.31465266245389,0.0

【Question comments】:

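【Answer 1】: The conversions in the function are turning 'count' from a float into a datetime dtype, which is why the plot then fails on Timestamp values. Using the posted sample data: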
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

path = 'data/test.csv'
df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])

# display(df)
                 WSA       WSV         WDV       WSM        SGT        T2M       T10M   DELTA_T        PBAR       SRAD          RH       PM25        AQI  count
Date                                                                                                                                                           
2015-01-01  1.070833  0.875000  132.958333  3.470833  35.391667  30.729167  30.625000 -0.116667  738.825000  72.666667   99.754167  24.808333  73.307931    0.0
2015-01-02  1.108696  0.939130  148.478261  3.734783  32.465217  34.391304  34.278261 -0.117391  738.347826  61.391304  100.013043  23.500000  64.150725    4.0
2015-01-03  1.017391  0.717391  168.043478  3.773913  42.717391  36.247826  36.160870 -0.095652  739.443478  49.608696  100.769565  20.460870  55.652711    0.0
2015-01-04  1.000000  0.600000  159.958333  3.850000  49.150000  38.887500  38.666667 -0.225000  741.500000  31.541667  101.479167  13.012500  46.835258    0.0
2015-01-05  1.033333  0.441667  137.000000  4.000000  57.566667  42.995833  42.945833 -0.050000  742.533333  44.583333  101.004167  16.654167  52.420271    4.0
2015-01-06  0.781818  0.559091  114.727273  3.654545  42.868182  40.740909  41.095455  0.368182  740.904545  48.272727  100.577273  21.954545  67.318339    6.0
2015-01-07  0.973913  0.830435  110.826087  3.956522  30.817391  40.365217  40.595652  0.221739  739.865217  60.043478  100.195652  24.456522  72.347251    6.0
2015-01-08  0.983333  0.825000  156.500000  4.208333  32.670833  41.520833  41.366667 -0.129167  736.350000  69.583333   99.958333  22.275000  65.770725   10.0
2015-01-09  0.958333  0.729167  133.708333  3.379167  39.645833  42.279167  42.158333 -0.116667  735.204167  60.416667  100.041667  19.370833  59.085129   10.0
2015-01-10  0.966667  0.758333  164.500000  3.675000  37.345833  42.962500  42.775000 -0.200000  734.287500  41.500000  100.120833  14.658333  49.314653    0.0

# weekly mean of every column
dfr = df.resample('W').mean()

# add the weekly sum of 'count' to dfr (the new column is named 'mean' but holds the sum)
dfr['mean'] = df['count'].resample('W').sum()

# reset index
dfr = dfr.reset_index()

# display(dfr)
        Date       WSA       WSV         WDV       WSM        SGT        T2M       T10M   DELTA_T        PBAR       SRAD          RH       PM25        AQI  count  mean
0 2015-01-04  1.049230  0.782880  152.359601  3.707382  39.931069  35.063949  34.932699 -0.138678  739.529076  53.802083  100.503986  20.445426  59.986656    1.0   4.0
1 2015-01-11  0.949566  0.690615  136.210282  3.812261  40.152457  41.810743  41.822823  0.015681  738.190794  54.066590  100.316321  19.894900  61.042728    6.0  36.0

# plot dfr on the axes created here
fig, ax = plt.subplots(figsize=(16, 10))
sns.barplot(x='Date', y='count', data=dfr, ax=ax)

# configure the xaxis ticks from datetime to date
x_dates = dfr.Date.dt.strftime('%Y-%m-%d').sort_values().unique()
ax.set_xticklabels(labels=x_dates, rotation=90, ha='right')

plt.show()
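
As a variation on the tick-label step, here is a minimal sketch that formats the dates into a string column before plotting, so the bar labels come straight from the data. The two-row frame and the column name 'week' are assumptions made up for illustration; the values mirror the weekly sums of the posted sample. Treat it as one possible approach, not the only one.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical weekly frame: the two week-ending dates and the weekly
# sums of 'count' from the posted sample data.
dfr = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04", "2015-01-11"]),
    "count": [4.0, 36.0],
})

# Format the dates as plain strings; 'week' is an illustrative column name.
dfr["week"] = dfr["Date"].dt.strftime("%Y-%m-%d")

fig, ax = plt.subplots(figsize=(8, 5))
# The string column is treated as categorical, one bar per week.
sns.barplot(x="week", y="count", data=dfr, ax=ax)

# Rotate the x tick labels so longer date strings do not overlap.
ax.tick_params(axis="x", labelrotation=90)

# Call plt.show() or fig.savefig(...) here to view the figure.
```

Because the labels are ordinary strings, nothing has to be relabelled after the plot is drawn, at the cost of losing a true datetime axis.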

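【Comments】:

Yes, my problem was resetting the index when resampling df before resampling the count. That somehow made the program lose my 'count' variable, which was then overwritten with NaN and, on further resampling, with the Timestamp index. Thanks for the help!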
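
The ordering issue described in this comment can be sketched with pandas alone. The frame below is a hypothetical stand-in that keeps only the 'count' column of the posted data. The sketch shows that calling reset_index() on the resampled Series yields a two-column DataFrame rather than a single column of sums, and that resetting the index once, after all resampling, keeps 'count' numeric.

```python
import pandas as pd

# Hypothetical stand-in for the posted data: the ten daily 'count' values.
df = pd.DataFrame(
    {"count": [0.0, 4.0, 0.0, 0.0, 4.0, 6.0, 6.0, 10.0, 10.0, 0.0]},
    index=pd.date_range("2015-01-01", periods=10, freq="D", name="Date"),
)

# reset_index() on a resampled Series returns a DataFrame with two
# columns (Date, count), so it is not a drop-in value for one column.
summed = df["count"].resample("W").sum().reset_index()
print(summed.dtypes)

# Safer ordering: resample while 'Date' is still the index, so the sums
# align on the weekly dates, then reset the index once at the end.
weekly_df = df.resample("W").mean()
weekly_df["count"] = df["count"].resample("W").sum()
weekly_df = weekly_df.reset_index()
print(weekly_df.dtypes)  # Date is datetime64, count stays float64
```

With this ordering the weekly sums are 4.0 and 36.0, matching the 'mean' column in the answer's output.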
