Any way to correctly aggregate time series data to make scatter plot with matplotlib/seaborn?

Posted: 2020-11-24 03:01:59

[Question]:

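I want to make a time series scatter plot for my time series data. The data has categorical columns, so it first needs to be aggregated by group to build the plotting data, and then plotted with seaborn or matplotlib. My data is product sales price time series data, and I want to see the price trend of each product owner at different market thresholds. I tried pandas.pivot_table and groupby to shape the plotting data, but couldn't get the plot I want.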

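Reproducible data

Here is the example product data I used. I want to see the price trend of the different protein types for each dealer with respect to threshold.

My attempt

This is my current attempt to aggregate my data into plotting data, but it doesn't give me the right plot. I suspect the way I aggregate the plotting data is incorrect. Can anyone point out how to get the desired plot correctly?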

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn

mydf = pd.read_csv('foo.csv')
mydf=mydf.drop(mydf.columns[0], axis=1)
mydf['expected_price'] = mydf['price']*76/mydf['threshold']

g = mydf.groupby(['dealer','protein_type'])
newdf= g.apply(lambda x: pd.Series([np.average(x['threshold'])])).unstack()

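But the attempt above doesn't work, because I want the data plotted as a daily time series of each dealer's market purchase price for the different protein_type values and the different threshold values. I don't know the best way to handle this time series. Can anyone suggest or correct how to do this?

I also tried pandas.pivot_table to aggregate my data, but it still doesn't produce usable plotting data.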

pv_df= pd.pivot_table(mydf, index=['date'], columns=['dealer', 'protein_type', 'threshold'],values=['price'])
pv_df= pv_df.fillna(0)
pv_df.groupby(['dealer', 'protein_type', 'threshold'])['price'].unstack().reset_index()

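But the attempt above still doesn't work. Also, the dates in my data are not continuous, so I assume I could plot a monthly time series line chart instead.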
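One possible way to get the monthly series mentioned here is sketched below. It is not from the original post: the miniature DataFrame and its values are made up for illustration, and the choice of the mean as the aggregate is an assumption. The idea is to keep the data in long form and group on a month key, rather than pivoting first, because pivot_table moves dealer, protein_type, and threshold into column levels, where a later groupby on those names can no longer find them.

```python
import pandas as pd

# Hypothetical miniature of the question's data (values are illustrative only)
df = pd.DataFrame({
    'date': pd.to_datetime(['2002-01-08', '2002-01-09', '2002-01-30',
                            '2002-02-01', '2002-02-05', '2002-02-06']),
    'dealer': ['Alpha', 'Alpha', 'Alpha', 'Alpha', 'German', 'German'],
    'protein_type': ['beef', 'beef', 'beef', 'beef', 'chicken', 'chicken'],
    'threshold': [85, 85, 85, 85, 50, 50],
    'price': [1.8, 2.0, 2.2, 1.9, 0.9, 1.1],
})

# Collapse the irregular daily dates to calendar months
df['month'] = df['date'].dt.to_period('M')

# Mean price per month and per (dealer, protein_type, threshold), kept in long form
monthly = df.groupby(['month', 'dealer', 'protein_type', 'threshold'],
                     as_index=False)['price'].mean()

# Convert the period back to a timestamp so plotting libraries treat it as a date
monthly['month'] = monthly['month'].dt.to_timestamp()
print(monthly)
```

The resulting frame has one row per month and group, which can be passed to a plotting call with x='month' and y='price'.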

My plotting attempt

Here is my attempt at making the plot:

def scatterplot(x_data, y_data, x_label, y_label, title):
    fig, ax = plt.subplots()
    ax.scatter(x_data, y_data, s = 30, color = '#539caf', alpha = 0.75)

    ax.set_title(title)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    fig.autofmt_xdate()

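Desired output

I want a line chart or scatter plot where the x-axis shows the monthly time series and the y-axis shows the price of each protein_type, at each threshold value, for each dealer along that monthly time series. Here is an example of the kind of line chart I want: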


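[Answer 1]:

Updated for threshold

Option 1

This option was implemented after seeing the result of Option 2. Those plots contain too much information to interpret and do not present the data clearly. To present the data clearly, each plot should contain only 3 dimensions of data (e.g. date, values, and cats) for one dealer, one threshold, and one protein_type.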
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import timedelta

# read the data in and parse the date column
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# calculate expected price
df['expected_price'] = df.price*76/df.threshold

# set threshold as a category
df.threshold = df.threshold.astype('category')

# set the index
df = df.set_index(['date', 'dealer', 'protein_type', 'threshold'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination', 'quantity']).stack().reset_index().rename(columns={'level_4': 'cats', 0: 'values'})

# plot
for pt in dfl.protein_type.unique():
    for t in dfl.threshold.unique():
        data = dfl[(dfl.protein_type == pt) & (dfl.threshold == t)]
        if not data.empty:
            utc = len(data.threshold.unique())
            f, axes = plt.subplots(nrows=utc, ncols= 2, figsize=(20, 4), squeeze=False)
            for j in range(utc):
                for i, d in enumerate(dfl.dealer.unique()):
                    data_d = data[data.dealer == d].sort_values(['cats', 'date']).reset_index(drop=True)
                    p = sns.scatterplot(x='date', y='values', data=data_d, hue='cats', ax=axes[j, i])
                    if not data_d.empty:
                        p.set_title(f'{d}\nThreshold: {t}\n{pt}')
                        p.set_xlim(data_d.date.min() - timedelta(days=60), data_d.date.max() + timedelta(days=60))
                    else:
                        p.set_title(f'{d}: No Data Available\nThreshold: {t}\n{pt}')
                    
            plt.show()

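The first four plots

Option 2

This produces 4 separate figures, with threshold as a category dtype. threshold must stay an int for the expected_price calculation and be converted afterwards. Note that my data doesn't have the extra unnamed column, so it still needs to be dropped, which is not shown in the code below.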
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read the data in and parse the date column
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# calculate expected price
df['expected_price'] = df.price*76/df.threshold

# set threshold as a category
df.threshold = df.threshold.astype('category')

# set the index
df = df.set_index(['date', 'dealer', 'protein_type', 'threshold'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination', 'quantity']).stack().reset_index().rename(columns={'level_4': 'cats', 0: 'values'})

# plot four plots with threshold
for d in dfl.dealer.unique():
    for pt in dfl.protein_type.unique():
        plt.figure(figsize=(13, 7))
        data = dfl[(dfl.protein_type == pt) & (dfl.dealer == d)]
        sns.lineplot(x='date', y='values', data=data, hue='threshold', style='cats')
        plt.yscale('log')
        plt.title(f'{d}: {pt}')
        plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)
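
The ordering constraint described for this option (compute expected_price while threshold is numeric, convert to category afterwards) can be seen in isolation in the minimal sketch below. The three rows are taken from the shape of the CSV sample, and the constant 76 comes from the question; everything else is illustrative.

```python
import pandas as pd

# Tiny illustrative frame; threshold starts out as an integer column
df = pd.DataFrame({'threshold': [50, 85, 50], 'price': [0.5, 1.8, 0.5]})

# 1) Do the arithmetic first, while threshold is still numeric
df['expected_price'] = df['price'] * 76 / df['threshold']

# 2) Only then convert, so each threshold is treated as a discrete level
df['threshold'] = df['threshold'].astype('category')

print(df.dtypes)
print(df)
```

Converting first would leave a categorical column on the right-hand side of the division, which is why the answer keeps the conversion as the second step.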

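Original answer, without threshold as a category

I don't understand what you're doing with: newdf = g.apply(lambda x: pd.Series([np.average(x['threshold'])])).unstack(). I don't think it is part of the main problem of plotting the data.
The dataframe first needs to be converted to long form, with 'destination' dropped.
There are too many dimensions to plot on a single figure: x='date', y='values', hue='cats', style='dealer'.
'protein_type' needs separate plots.
However, the data overlaps heavily when 'dealer' is included, making it hard to read, so 4 plots are needed.

DataFrame setup:

Note that my data doesn't have the extra unnamed column, so it still needs to be dropped, which is not shown in the code below. Use pandas.DataFrame.stack to convert the dataframe to long form.

Option 1: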

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read the data in
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])

# your calculation
df['expected_price'] = df['price']*76/df['threshold']

# set the index
df = df.set_index(['date', 'dealer', 'protein_type'])

# form the dataframe into a long form
dfl = df.drop(columns=['destination']).stack().reset_index().rename(columns={'level_3': 'cats', 0: 'values'})

# display(dfl.head())
        date            dealer protein_type            cats    values
0 2001-12-22  Alpha Food Corps      chicken       threshold     50.00
1 2001-12-22  Alpha Food Corps      chicken        quantity  39037.00
2 2001-12-22  Alpha Food Corps      chicken           price      0.50
3 2001-12-22  Alpha Food Corps      chicken  expected_price      0.76
4 2001-12-27  Alpha Food Corps         beef       threshold     85.00
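
For readers who want to try the wide-to-long step without the CSV file, here is a small self-contained sketch of the same stack idea on inline data. The two rows and the rounded expected_price values are illustrative. As a variation on the answer's code, the new index level and the value column are named explicitly with rename_axis and rename instead of relying on the auto-generated 'level_3' and 0 labels.

```python
import pandas as pd

# Two illustrative rows in wide form: one column per measured quantity
wide = pd.DataFrame({
    'date': pd.to_datetime(['2001-12-22', '2001-12-27']),
    'dealer': ['Alpha Food Corps', 'Alpha Food Corps'],
    'protein_type': ['chicken', 'beef'],
    'price': [0.5, 1.8],
    'expected_price': [0.76, 1.61],
})

# Move the identifying columns into the index, then stack the remaining
# columns into a single (name, value) pair per row
long_df = (wide.set_index(['date', 'dealer', 'protein_type'])
               .stack()
               .rename_axis(['date', 'dealer', 'protein_type', 'cats'])
               .rename('values')
               .reset_index())

print(long_df)
```

Each original row becomes one row per stacked column, so 'cats' can be passed to hue or style when plotting.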

Option 2: Rolling mean

pandas.DataFrame.groupby, then pandas.DataFrame.rolling with mean, then .stack
df = pd.read_csv('data/so_data/2020-08-03 63239708/mydf.csv', parse_dates=['date'])
df['expected_price'] = df['price']*76/df['threshold']
df = df.set_index('date')

# groupby aggregate rolling mean and stack
dfl = df.groupby(['dealer', 'protein_type'])[['expected_price', 'price']].rolling(7).mean().stack().reset_index().rename(columns={'level_3': 'cats', 0: 'values'})
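
Note that rolling(7) averages over 7 rows, not 7 days. Because the dates in this data are irregular and often repeat, a time-based window may be closer to what is wanted. The sketch below is an assumption about intent rather than part of the original answer, and it uses made-up values; it relies on pandas accepting an offset string such as '7D' when the index is a sorted DatetimeIndex.

```python
import pandas as pd

# Illustrative data with a gap between the second and third dates
df = pd.DataFrame({
    'date': pd.to_datetime(['2002-01-01', '2002-01-02', '2002-01-20', '2002-01-21']),
    'dealer': ['Alpha'] * 4,
    'protein_type': ['beef'] * 4,
    'price': [1.0, 3.0, 5.0, 7.0],
})

# A time-based window needs a sorted DatetimeIndex
df = df.set_index('date').sort_index()

# '7D' averages the rows whose dates fall within the 7 days ending at each row
rolled = (df.groupby(['dealer', 'protein_type'])['price']
            .rolling('7D')
            .mean()
            .reset_index(name='price_7d'))

print(rolled)
```

With a row-count window of 7, these four rows would all be NaN; with the 7-day window, the gap simply starts a new average.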

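Option 1: Two plots

The 'dealer' data is too similar to be differentiated (price collusion, anyone?)
from datetime import datetime  # needed for the plt.xlim bounds

for pt in dfl.protein_type.unique():
    plt.figure(figsize=(9, 5))
    data = dfl[dfl.protein_type == pt]
    sns.lineplot(x='date', y='values', data=data, hue='cats', style='dealer')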
    plt.xlim(datetime(2001, 11, 1), datetime(2004, 8, 1))
    plt.yscale('log')
    plt.title(pt)
    plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)

Even with only 'price' and 'expected_price', the 'dealer' lines cannot be told apart.

Option 2: Four plots

seaborn.FacetGrid

g = sns.FacetGrid(data=dfl, col='dealer', row='protein_type', hue='cats', height=5, aspect=1.5)
g.map(sns.lineplot, 'date', 'values').add_legend()
plt.yscale('log')
g.set_xticklabels(rotation=90)

Plot of the rolling mean data

Nested loop

This produces a column of 4 figures, selecting dealer first and then protein_type. Optionally, swap the order of dealer and protein_type.
from datetime import datetime  # needed for the plt.xlim bounds

for d in dfl.dealer.unique():
    for pt in dfl.protein_type.unique():
        plt.figure(figsize=(10, 5))
        data = dfl[(dfl.protein_type == pt) & (dfl.dealer == d)]
        sns.lineplot(x='date', y='values', data=data, hue='cats')
        plt.xlim(datetime(2001, 11, 1), datetime(2004, 8, 1))
        plt.yscale('log')
        plt.title(f'{d}: {pt}')
        plt.legend(bbox_to_anchor=(1.04,0.5), loc="center left", borderaxespad=0)
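
A hedged alternative to the nested loop is to iterate over groupby, which only yields the dealer/protein_type combinations that actually occur, so no empty figures are created. The sketch below is not the answer's code: it swaps sns.lineplot for plain matplotlib ax.plot (so duplicate dates are not averaged), draws all panels in one figure, and uses a made-up long-form frame with the same column names (date, dealer, protein_type, cats, values).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative long-form data in the same layout as dfl
dfl = pd.DataFrame({
    'date': pd.to_datetime(['2002-01-08', '2002-01-09', '2002-01-08',
                            '2002-01-09', '2002-02-05', '2002-02-06']),
    'dealer': ['Alpha'] * 4 + ['German'] * 2,
    'protein_type': ['beef'] * 4 + ['chicken'] * 2,
    'cats': ['price', 'price', 'expected_price', 'expected_price', 'price', 'price'],
    'values': [1.8, 2.1, 1.61, 1.77, 0.9, 0.9],
})

# One (key, sub-frame) pair per dealer/protein_type combination present in the data
groups = list(dfl.groupby(['dealer', 'protein_type']))

# squeeze=False keeps axes two-dimensional even when there is a single group
fig, axes = plt.subplots(nrows=len(groups), ncols=1,
                         figsize=(10, 4 * len(groups)), squeeze=False)

for ax, ((dealer, protein), data) in zip(axes[:, 0], groups):
    # One line per category (price, expected_price) within the panel
    for cat, sub in data.groupby('cats'):
        ax.plot(sub['date'], sub['values'], marker='o', label=cat)
    ax.set_yscale('log')
    ax.set_title(f'{dealer}: {protein}')
    ax.legend(bbox_to_anchor=(1.04, 0.5), loc='center left', borderaxespad=0)

fig.autofmt_xdate()
```

Calling plt.show() or fig.savefig(...) afterwards displays or stores the figure.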

CSV sample:

date,dealer,threshold,quantity,price,protein_type,destination
2001-12-22,Alpha Food Corps,50,39037,0.5,chicken,UK
2001-12-27,Alpha Food Corps,85,35432,1.8,beef,UK
2001-12-29,Alpha Food Corps,50,32142,0.5,chicken,UK
2001-12-30,Alpha Food Corps,85,34516,1.8,beef,UK
2002-01-02,Alpha Food Corps,85,39930,1.8,beef,UK
2002-01-04,Alpha Food Corps,85,40709,1.8,beef,UK
2002-01-08,Alpha Food Corps,94,37641,2.2,beef,UK
2002-01-08,Alpha Food Corps,85,37545,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,37564,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,37607,1.8,beef,UK
2002-01-08,Alpha Food Corps,85,41706,1.8,beef,UK
2002-01-08,Alpha Food Corps,90,41628,2.1,beef,UK
2002-01-08,Alpha Food Corps,65,35720,0.9,chicken,UK
2002-01-09,Alpha Food Corps,94,1581,2.2,beef,UK
2002-01-09,Alpha Food Corps,85,11426,1.8,beef,UK
2002-01-09,Alpha Food Corps,85,37489,1.8,beef,UK
2002-01-09,Alpha Food Corps,90,15630,2.1,beef,UK
2002-01-09,Alpha Food Corps,80,3136,1.6,beef,UK
2002-01-10,Alpha Food Corps,85,41919,1.8,beef,UK
2002-01-10,Alpha Food Corps,90,39932,2.1,beef,UK
2002-01-10,Alpha Food Corps,90,41665,2.1,beef,UK
2002-01-10,Alpha Food Corps,90,41860,2.1,beef,UK
2002-01-10,Alpha Food Corps,65,39879,0.9,chicken,UK
2002-01-10,Alpha Food Corps,65,39884,0.9,chicken,UK
2002-01-11,Alpha Food Corps,90,37613,2.1,beef,UK
2002-01-12,Alpha Food Corps,90,41855,2.1,beef,UK
2002-01-13,Alpha Food Corps,90,37585,2.1,beef,UK
2002-01-15,Alpha Food Corps,85,41618,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41721,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41869,1.8,beef,UK
2002-01-15,Alpha Food Corps,85,41990,1.8,beef,UK
2002-01-15,Alpha Food Corps,90,41744,2.1,beef,UK
2002-01-15,Alpha Food Corps,90,41936,2.1,beef,UK
2002-01-15,Alpha Food Corps,65,41684,1.0,chicken,UK
2002-01-15,Alpha Food Corps,65,41776,1.0,chicken,UK
2002-01-16,Alpha Food Corps,94,35891,2.2,beef,UK
2002-01-16,Alpha Food Corps,85,39985,1.8,beef,UK
2002-01-16,Alpha Food Corps,85,41754,1.8,beef,UK
2002-01-16,Alpha Food Corps,85,41811,1.8,beef,UK
2002-01-16,Alpha Food Corps,90,39838,2.1,beef,UK
2002-01-16,Alpha Food Corps,80,3244,1.7,beef,UK
2002-01-17,Alpha Food Corps,94,22245,2.2,beef,UK
2002-01-17,Alpha Food Corps,85,5186,1.8,beef,UK
2002-01-17,Alpha Food Corps,90,2016,2.1,beef,UK
2002-01-17,Alpha Food Corps,90,40875,2.1,beef,UK
2002-01-17,Alpha Food Corps,65,41440,1.0,chicken,UK
2002-01-18,Alpha Food Corps,94,12525,2.2,beef,UK
2002-01-18,Alpha Food Corps,94,31325,2.2,beef,UK
2002-01-18,Alpha Food Corps,85,15486,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,29992,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,39938,1.8,beef,UK
2002-01-18,Alpha Food Corps,85,41777,1.8,beef,UK
2002-01-18,Alpha Food Corps,90,9475,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,9960,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,41676,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,41816,2.1,beef,UK
2002-01-18,Alpha Food Corps,90,42036,2.1,beef,UK
2002-01-18,Alpha Food Corps,65,41673,1.0,chicken,UK
2002-01-19,Alpha Food Corps,85,19961,1.8,beef,UK
2002-01-19,Alpha Food Corps,90,19955,2.1,beef,UK
2002-01-19,Alpha Food Corps,90,40437,2.1,beef,UK
2002-01-19,Alpha Food Corps,65,41574,1.0,chicken,UK
2002-01-19,Alpha Food Corps,65,41700,1.0,chicken,UK
2002-01-20,Alpha Food Corps,94,23278,2.2,beef,UK
2002-01-20,Alpha Food Corps,85,9230,1.8,beef,UK
2002-01-20,Alpha Food Corps,85,38842,1.8,beef,UK
2002-01-20,Alpha Food Corps,90,9173,2.1,beef,UK
2002-01-20,Alpha Food Corps,90,38608,2.1,beef,UK
2002-01-20,Alpha Food Corps,50,39191,0.8,chicken,UK
2002-01-22,Alpha Food Corps,94,41741,2.2,beef,UK
2002-01-22,Alpha Food Corps,85,39879,1.8,beef,UK
2002-01-22,Alpha Food Corps,85,41683,1.8,beef,UK
2002-01-22,Alpha Food Corps,85,41958,1.8,beef,UK
2002-01-22,Alpha Food Corps,90,41833,2.1,beef,UK
2002-01-23,Alpha Food Corps,94,20294,2.2,beef,UK
2002-01-23,Alpha Food Corps,85,15553,1.8,beef,UK
2002-01-23,Alpha Food Corps,85,40753,1.8,beef,UK
2002-01-23,Alpha Food Corps,85,41740,1.8,beef,UK
2002-01-23,Alpha Food Corps,90,1892,2.1,beef,UK
2002-01-23,Alpha Food Corps,90,39850,2.1,beef,UK
2002-01-23,Alpha Food Corps,80,3231,1.7,beef,UK
2002-01-23,Alpha Food Corps,65,41415,1.1,chicken,UK
2002-01-24,Alpha Food Corps,90,35473,2.1,beef,UK
2002-01-24,Alpha Food Corps,90,41824,2.1,beef,UK
2002-01-24,Alpha Food Corps,65,41721,1.1,chicken,UK
2002-01-25,Alpha Food Corps,85,19983,1.8,beef,UK
2002-01-25,Alpha Food Corps,85,35823,1.8,beef,UK
2002-01-25,Alpha Food Corps,90,19949,2.1,beef,UK
2002-01-25,Alpha Food Corps,90,41800,2.1,beef,UK
2002-01-25,Alpha Food Corps,65,40990,1.1,chicken,UK
2002-01-26,Alpha Food Corps,90,39938,2.1,beef,UK
2002-01-26,Alpha Food Corps,90,40641,2.1,beef,UK
2002-01-26,Alpha Food Corps,90,41550,2.1,beef,UK
2002-01-27,Alpha Food Corps,94,16589,2.2,beef,UK
2002-01-27,Alpha Food Corps,85,11669,1.8,beef,UK
2002-01-27,Alpha Food Corps,90,24982,2.1,beef,UK
2002-01-27,Alpha Food Corps,65,29819,1.1,chicken,UK
2002-01-29,Alpha Food Corps,94,37516,2.2,beef,UK
2002-01-29,Alpha Food Corps,85,37378,1.8,beef,UK
2002-01-29,Alpha Food Corps,85,37535,1.8,beef,UK
2002-01-29,Alpha Food Corps,85,40174,1.8,beef,UK
2002-01-29,Alpha Food Corps,90,37831,2.1,beef,UK
2002-01-30,Alpha Food Corps,94,34435,2.2,beef,UK
2002-01-30,Alpha Food Corps,94,39640,2.2,beef,UK
2002-01-30,Alpha Food Corps,85,1619,1.8,beef,UK
2002-01-30,Alpha Food Corps,85,3058,1.8,beef,UK
2002-01-30,Alpha Food Corps,85,20929,1.8,beef,UK
2002-01-30,Alpha Food Corps,90,3641,2.1,beef,UK
2002-01-30,Alpha Food Corps,90,20974,2.1,beef,UK
2002-01-30,Alpha Food Corps,90,31160,2.1,beef,UK
2002-01-30,Alpha Food Corps,92,38189,2.3,beef,UK
2002-01-31,Alpha Food Corps,94,8804,2.2,beef,UK
2002-01-31,Alpha Food Corps,85,17398,1.8,beef,UK
2002-01-31,Alpha Food Corps,90,13963,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,37673,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,40330,2.1,beef,UK
2002-01-31,Alpha Food Corps,90,40511,2.2,beef,UK
2002-01-31,Alpha Food Corps,80,38290,1.9,beef,UK
2002-01-31,Alpha Food Corps,92,37193,2.3,beef,UK
2002-02-01,Alpha Food Corps,94,5011,2.2,beef,UK
2002-02-01,Alpha Food Corps,85,18783,1.8,beef,UK
2002-02-01,Alpha Food Corps,85,41827,1.8,beef,UK
2002-02-01,Alpha Food Corps,90,16394,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,23013,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,39923,2.1,beef,UK
2002-02-01,Alpha Food Corps,90,41417,2.1,beef,UK
2002-02-01,Alpha Food Corps,80,15592,1.7,beef,UK
2002-02-01,Alpha Food Corps,80,38364,1.9,beef,UK
2002-02-01,Alpha Food Corps,92,37605,2.3,beef,UK
2002-02-01,Alpha Food Corps,92,39234,2.3,beef,UK
2002-02-02,Alpha Food Corps,90,34578,2.1,beef,UK
2002-02-02,Alpha Food Corps,90,41661,2.1,beef,UK
2002-02-02,Alpha Food Corps,80,3157,1.7,beef,UK
2002-02-02,Alpha Food Corps,65,41272,1.2,chicken,UK
2002-02-02,Alpha Food Corps,65,41503,1.2,chicken,UK
2002-02-02,Alpha Food Corps,92,36207,2.3,beef,UK
2002-02-05,Alpha Food Corps,94,41559,2.2,beef,UK
2002-02-05,Alpha Food Corps,85,41549,1.8,beef,UK
2002-02-05,Alpha Food Corps,85,41753,1.8,beef,UK
2002-02-05,Alpha Food Corps,85,41908,1.8,beef,UK
2002-02-05,Alpha Food Corps,90,39813,2.1,beef,UK
2002-02-05,Alpha Food Corps,90,41526,2.1,beef,UK
2002-02-05,German Food Corps,80,36031,1.9,beef,UK
2002-02-05,German Food Corps,50,38538,0.9,chicken,UK
2002-02-05,Alpha Food Corps,50,38772,0.9,chicken,UK
2002-02-05,German Food Corps,50,39099,0.9,chicken,UK
2002-02-05,German Food Corps,50,39132,0.9,chicken,UK
2002-02-05,German Food Corps,50,39207,0.9,chicken,UK
2002-02-06,Alpha Food Corps,85,41947,1.8,beef,UK
2002-02-06,German Food Corps,80,37287,1.9,beef,UK
2002-02-06,Alpha Food Corps,89,43201,2.1,beef,UK
2002-02-06,German Food Corps,50,38553,0.9,chicken,UK
2002-02-06,German Food Corps,50,38837,0.9,chicken,UK
2002-02-06,Alpha Food Corps,50,38985,0.9,chicken,UK
2002-02-06,German Food Corps,65,40386,1.4,chicken,UK
2002-02-06,Alpha Food Corps,65,41851,1.2,chicken,UK
2002-02-06,Alpha Food Corps,92,38405,2.3,beef,UK
2002-02-06,German Food Corps,73,37731,1.5,chicken,UK
2002-02-07,Alpha Food Corps,85,41097,1.9,beef,UK
2002-02-07,Alpha Food Corps,90,39582,2.1,beef,UK
2002-02-07,German Food Corps,65,38832,1.4,chicken,UK
2002-02-07,German Food Corps,50,39269,0.9,chicken,UK
2002-02-07,German Food Corps,50,40129,0.9,chicken,UK
2002-02-07,German Food Corps,50,41124,0.8,chicken,UK
2002-02-07,German Food Corps,65,41739,1.2,chicken,UK
2002-02-08,Alpha Food Corps,85,20034,1.8,beef,UK
2002-02-08,German Food Corps,85,33503,1.9,beef,UK
2002-02-08,German Food Corps,85,40780,1.9,beef,UK
2002-02-08,Alpha Food Corps,90,19913,2.1,beef,UK
2002-02-08,Alpha Food Corps,90,36682,2.1,beef,UK
2002-02-08,Alpha Food Corps,90,41624,2.1,beef,UK
2002-02-08,German Food Corps,65,37503,1.4,chicken,UK
2002-02-08,German Food Corps,50,38973,0.9,chicken,UK
2002-02-08,German Food Corps,50,39069,0.9,chicken,UK
2002-02-08,German Food Corps,50,40697,0.9,chicken,UK
2002-02-08,German Food Corps,92,36103,2.3,beef,UK
2002-02-08,Alpha Food Corps,92,38278,2.3,beef,UK
2002-02-09,Alpha Food Corps,90,39842,2.1,beef,UK
2002-02-09,Alpha Food Corps,90,16553,2.3,beef,UK
2002-02-09,Alpha Food Corps,80,18739,1.9,beef,UK
2002-02-09,German Food Corps,80,36349,1.9,beef,UK
2002-02-09,German Food Corps,65,35238,1.4,chicken,UK
2002-02-09,German Food Corps,50,38391,0.9,chicken,UK
2002-02-09,Alpha Food Corps,50,38819,0.9,chicken,UK
2002-02-09,German Food Corps,50,41691,0.9,chicken,UK
2002-02-09,Alpha Food Corps,92,40245,2.3,beef,UK
2002-02-09,German Food Corps,73,37323,1.5,chicken,UK
2002-02-09,German Food Corps,90,40312,2.2,beef,UK
2002-02-10,Alpha Food Corps,90,42108,2.1,beef,UK
2002-02-10,German Food Corps,65,37831,1.4,chicken,UK
2002-02-11,Alpha Food Corps,50,38591,0.9,chicken,UK
2002-02-12,Alpha Food Corps,94,41559,2.3,beef,UK
2002-02-12,Alpha Food Corps,85,40968,1.8,beef,UK
2002-02-12,Alpha Food Corps,85,41985,1.8,beef,UK
2002-02-12,German Food Corps,50,38931,0.9,chicken,UK
2002-02-12,German Food Corps,50,38986,0.9,chicken,UK
2002-02-12,German Food Corps,92,39684,2.3,beef,UK
2002-02-12,German Food Corps,73,36619,1.5,chicken,UK
2002-02-13,Alpha Food Corps,85,41291,1.8,beef,UK
2002-02-13,Alpha Food Corps,85,41892,1.8,beef,UK


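[Answer 2]:

As far as I know, a line plot can only represent 4 dimensions:

x-axis, which you can use for date
y-axis, which you can use for price
hue, which you can use for threshold
style, which you can use for dealer

But you want to account for a 5th dimension: protein_type. For that, I suggest using subplots, as in the code below: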

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)

# convert 'date' type to datetime and sort values by threshold, then by date
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = 1,
                       figsize = (10, 10),
                       sharex = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # filter dataframe
    df_filtered = mydf[mydf['protein_type'] == protein_type]

    # set up plot
    sns.lineplot(ax = ax[i],
                 data = df_filtered,
                 x = 'date',
                 y = 'price',
                 hue = 'threshold',
                 style = 'dealer',
                 legend = 'full',
                 ci = False)

    # set up subplot title and legend
    ax[i].set_title(f'Protein type = {protein_type}')
    ax[i].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.85,
                    bottom = 0.05,
                    left = 0.05,
                    hspace = 0.15)

# show the plot
plt.show()


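In the plot above it may be hard to see the differences between dealers, so you can separate them into another grid of subplots, as in the code below: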

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)

# convert 'date' type to datetime and sort values by threshold, then by date
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type, one column for each dealer
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = len(mydf['dealer'].unique()),
                       figsize = (10, 10),
                       sharex = True,
                       sharey = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # loop over dealer
    for j, dealer in enumerate(mydf['dealer'].unique(), 0):

        # filter dataframe
        df_filtered = mydf[(mydf['protein_type'] == protein_type) & (mydf['dealer'] == dealer)]

        # set up plot
        sns.lineplot(ax = ax[i, j],
                     data = df_filtered,
                     x = 'date',
                     y = 'price',
                     hue = 'threshold',
                     legend = 'full',
                     ci = False)

        # set up subplot title and legend
        ax[i, j].set_title(f'Protein type = {protein_type} | Dealer = {dealer}')
        ax[i, j].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.9,
                    bottom = 0.05,
                    left = 0.05,
                    wspace = 0.3,
                    hspace = 0.2)

# show the plot
plt.show()


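Finally, if you want to compare price and expected_price, you can use the style dimension for that. This requires reshaping the dataframe differently: you have to stack the price and expected_price columns into a single column. You can do this with pd.melt. Check the code below as a reference: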

# import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# read dataframe
mydf = pd.read_csv('foo.csv')
mydf = mydf.drop(mydf.columns[0], axis = 1)
mydf['expected_price'] = mydf['price']*76/mydf['threshold']

# convert 'date' type to datetime
mydf['date'] = pd.to_datetime(mydf['date'], format = '%m/%d/%Y')
mydf['threshold'] = mydf['threshold'].astype('category')

# reshape dataframe
mydf = pd.melt(frame = mydf,
               id_vars = ['date', 'dealer', 'threshold', 'quantity', 'protein_type', 'destination'],
               value_vars = ['price', 'expected_price'],
               var_name = 'price type',
               value_name = 'price value')

# sort values by threshold, then by date
mydf.sort_values(['threshold', 'date'], inplace = True)

# set up subplots layout, one row for each protein_type, one column for each dealer
fig, ax = plt.subplots(nrows = len(mydf['protein_type'].unique()),
                       ncols = len(mydf['dealer'].unique()),
                       figsize = (10, 10),
                       sharex = True,
                       sharey = True)

# loop over protein_type
for i, protein_type in enumerate(mydf['protein_type'].unique(), 0):

    # loop over dealer
    for j, dealer in enumerate(mydf['dealer'].unique(), 0):

        # filter dataframe
        df_filtered = mydf[(mydf['protein_type'] == protein_type) & (mydf['dealer'] == dealer)]

        # set up plot
        sns.lineplot(ax = ax[i, j],
                     data = df_filtered,
                     x = 'date',
                     y = 'price value',
                     hue = 'threshold',
                     style = 'price type',
                     legend = 'full',
                     ci = False)

        # set up subplot title and legend
        ax[i, j].set_title(f'Protein type = {protein_type} | Dealer = {dealer}')
        ax[i, j].legend(bbox_to_anchor = (1.02, 1), loc = 'upper left')

# adjust general layout
plt.subplots_adjust(top = 0.95,
                    right = 0.9,
                    bottom = 0.05,
                    left = 0.05,
                    wspace = 0.3,
                    hspace = 0.2)

# show the plot
plt.show()
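
To see what the pd.melt reshaping in this last variant does to the data, independent of the plotting, here is a minimal sketch on two made-up rows. The column names follow the answer ('price type', 'price value'); the reduced set of id_vars is a simplification for illustration.

```python
import pandas as pd

# Two illustrative rows in wide form
mydf = pd.DataFrame({
    'date': pd.to_datetime(['2002-01-08', '2002-01-09']),
    'dealer': ['Alpha Food Corps', 'Alpha Food Corps'],
    'threshold': [85, 90],
    'price': [1.8, 2.1],
})
mydf['expected_price'] = mydf['price'] * 76 / mydf['threshold']

# Keep the identifying columns, and fold the two price columns into
# one label column ('price type') and one value column ('price value')
long_df = pd.melt(frame=mydf,
                  id_vars=['date', 'dealer', 'threshold'],
                  value_vars=['price', 'expected_price'],
                  var_name='price type',
                  value_name='price value')

print(long_df)
```

Every input row appears once per melted column, which is what lets style='price type' draw price and expected_price as separate line styles.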
