Correlation matrix heatmap with multiple datasets that have matching columns



Posted: 2017-04-22 12:58:28

Question:

Suppose we have three datasets:

import pandas as pd

X = pd.DataFrame({"t":[1,2,3,4,5],"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
Y = pd.DataFrame({"t":[1,2,3,4,5],"A":[45,24,65,65,65], "B":[45,87,65,52,12], "C":[98,52,32,32,12], "D":[0,23,1,365,53], "E":[24,12,65,3,65]})
Z = pd.DataFrame({"t":[1,2,3,4,5],"A":[14,96,25,2,25], "B":[47,7,5,58,34], "C":[85,45,65,53,53], "D":[3,35,12,56,236], "E":[68,10,45,46,85]})

where "t" is an index.

How can I output a correlation matrix heatmap similar to the one in the seaborn example, except with axes that look like this?


Answer 1:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

X = pd.DataFrame({"t":[1,2,3,4,5],"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
Y = pd.DataFrame({"t":[1,2,3,4,5],"A":[45,24,65,65,65], "B":[45,87,65,52,12], "C":[98,52,32,32,12], "D":[0,23,1,365,53], "E":[24,12,65,3,65]})
Z = pd.DataFrame({"t":[1,2,3,4,5],"A":[14,96,25,2,25], "B":[47,7,5,58,34], "C":[85,45,65,53,53], "D":[3,35,12,56,236], "E":[68,10,45,46,85]})


catted = pd.concat([d.set_index('t') for d in [X, Y, Z]], axis=1, keys=['X', 'Y', 'Z'])
catted = catted.rename_axis(['Source', 'Column'], axis=1)

corrmat = catted.corr()

f, ax = plt.subplots()

sns.heatmap(corrmat, vmax=.8, square=True)

sources = corrmat.columns.get_level_values(0)
for i, source in enumerate(sources):
    if i and source != sources[i - 1]:
        ax.axhline(len(sources) - i, c="w")
        ax.axvline(i, c="w")
f.tight_layout()
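The loop above draws a white separator wherever the "Source" level of the column MultiIndex changes. Below is a minimal sketch of the same boundary computation as a standalone function. It assumes the columns are grouped contiguously by source, and the name block_boundaries is hypothetical.

There is one hedged caveat. Recent seaborn releases draw row 0 at the top of the heatmap, so a horizontal boundary at position i sits at y = i. The answer's len(sources) - i matches the older bottom-up row order. With three equal groups of five columns, both conventions put lines at 5 and 10, so the difference only appears when the groups differ in size.

```python
import pandas as pd


def block_boundaries(columns):
    """Return the positions where the first level of a MultiIndex changes.

    Hypothetical helper: assumes entries from the same source are contiguous.
    """
    sources = columns.get_level_values(0)
    return [i for i in range(1, len(sources)) if sources[i] != sources[i - 1]]


# Same layout as the answer: three sources, five columns each.
cols = pd.MultiIndex.from_product([["X", "Y", "Z"], list("ABCDE")], names=["Source", "Column"])
print(block_boundaries(cols))  # [5, 10]

# Unequal group sizes, where the y-coordinate convention starts to matter.
uneven = pd.MultiIndex.from_tuples(
    [("X", "A"), ("X", "B"), ("Y", "A"), ("Z", "A"), ("Z", "B"), ("Z", "C")]
)
print(block_boundaries(uneven))  # [2, 3]

# With a current seaborn heatmap on `ax`, each boundary i would be drawn as:
#     ax.axhline(i, c="w"); ax.axvline(i, c="w")
```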


Response to comment: I changed the t values in X, Y, and Z so they no longer overlap:

X = pd.DataFrame({"t":[1,2,3,4,5],"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
Y = pd.DataFrame({"t":[6,7,8,9,10],"A":[45,24,65,65,65], "B":[45,87,65,52,12], "C":[98,52,32,32,12], "D":[0,23,1,365,53], "E":[24,12,65,3,65]})
Z = pd.DataFrame({"t":[11,12,13,14,15],"A":[14,96,25,2,25], "B":[47,7,5,58,34], "C":[85,45,65,53,53], "D":[3,35,12,56,236], "E":[68,10,45,46,85]})


catted = pd.concat([d.set_index('t') for d in [X, Y, Z]], axis=1, keys=['X', 'Y', 'Z'])
catted = catted.rename_axis(['Source', 'Column'], axis=1)

corrmat = catted.corr()

f, ax = plt.subplots()

sns.heatmap(corrmat, vmax=.8, square=True)

sources = corrmat.columns.get_level_values(0)
for i, source in enumerate(sources):
    if i and source != sources[i - 1]:
        ax.axhline(len(sources) - i, c="w")
        ax.axvline(i, c="w")
f.tight_layout()
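With disjoint t values, the cross-dataset blocks of this heatmap go blank. pd.concat(..., axis=1) aligns rows on the index, so each source's columns are NaN on the other sources' rows. DataFrame.corr then has no complete pairs across sources and returns NaN for them.

Below is a cut-down sketch of that effect. It uses the first three rows of column A from X and Y, with t values that do not overlap.

```python
import pandas as pd

# First three rows of column A from X and Y, with non-overlapping t values.
X = pd.DataFrame({"t": [1, 2, 3], "A": [34, 12, 78]})
Y = pd.DataFrame({"t": [6, 7, 8], "A": [45, 24, 65]})

# concat aligns on the t index: 6 rows, each column NaN on the other source's rows.
catted = pd.concat([d.set_index("t") for d in [X, Y]], axis=1, keys=["X", "Y"])
print(catted)

corrmat = catted.corr()
# Within a source there are 3 complete observations; across sources there are none.
print(corrmat.loc[("X", "A"), ("X", "A")])
print(corrmat.loc[("X", "A"), ("Y", "A")])  # nan
```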

Here it is again, but this time using reset_index, so the datasets are aligned by row position instead of by t:

X = pd.DataFrame({"t":[1,2,3,4,5],"A":[34,12,78,84,26], "B":[54,87,35,25,82], "C":[56,78,0,14,13], "D":[0,23,72,56,14], "E":[78,12,31,0,34]})
Y = pd.DataFrame({"t":[6,7,8,9,10],"A":[45,24,65,65,65], "B":[45,87,65,52,12], "C":[98,52,32,32,12], "D":[0,23,1,365,53], "E":[24,12,65,3,65]})
Z = pd.DataFrame({"t":[11,12,13,14,15],"A":[14,96,25,2,25], "B":[47,7,5,58,34], "C":[85,45,65,53,53], "D":[3,35,12,56,236], "E":[68,10,45,46,85]})


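# Move t into the index and then discard it, so t is not correlated and rows align by position.
catted = pd.concat([d.set_index('t').reset_index(drop=True) for d in [X, Y, Z]], axis=1, keys=['X', 'Y', 'Z'])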
catted = catted.rename_axis(['Source', 'Column'], axis=1)

corrmat = catted.corr()

f, ax = plt.subplots()

sns.heatmap(corrmat, vmax=.8, square=True)

sources = corrmat.columns.get_level_values(0)
for i, source in enumerate(sources):
    if i and source != sources[i - 1]:
        ax.axhline(len(sources) - i, c="w")
        ax.axvline(i, c="w")
f.tight_layout()
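When aligning by position this way, note that reset_index(drop=True) only replaces the row labels. If t is still an ordinary column, as in the X, Y and Z defined here, it is kept. It would then show up as extra t rows and columns in the correlation matrix.

Below is a small sketch contrasting the two approaches. It assumes t should be excluded from the correlations.

```python
import pandas as pd

X = pd.DataFrame({"t": [1, 2, 3], "A": [34, 12, 78]})
Y = pd.DataFrame({"t": [6, 7, 8], "A": [45, 24, 65]})

# reset_index(drop=True) only replaces the row labels; "t" is still a column.
naive = pd.concat([d.reset_index(drop=True) for d in [X, Y]], axis=1, keys=["X", "Y"])
print(list(naive.columns))  # [('X', 't'), ('X', 'A'), ('Y', 't'), ('Y', 'A')]

# Move "t" into the index first, then discard it, so rows line up by position.
aligned = pd.concat(
    [d.set_index("t").reset_index(drop=True) for d in [X, Y]], axis=1, keys=["X", "Y"]
)
print(list(aligned.columns))  # [('X', 'A'), ('Y', 'A')]
print(aligned.corr())  # every entry is defined: all rows are shared
```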

Comments:

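Do you know why, when I apply the correlation matrix to larger data, it only shows the diagonal squares? See image: i.imgur.com/hLorwN2.png

I suspect the t column is not aligned across X, Y, and Z.

After playing around with it, I found that the correlations in those blocks were so small that it looked as if there was nothing in those squares. I changed my scale and now it's perfect. Thanks for the help, @piRSquared.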
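The last comment's fix was to change the colour scale. Below is one way to do that, sketched as an assumption rather than what the commenter actually did. It takes the largest absolute off-diagonal correlation and uses it as a symmetric vmin/vmax around zero, so small correlations still get a visible share of the colormap. The diagonal, which is always 1, is simply clipped to the top colour. The random stand-in data and the "coolwarm" colormap are illustrative choices, not from the thread.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Illustrative random data standing in for the concatenated frame (assumption).
rng = np.random.default_rng(0)
cols = pd.MultiIndex.from_product([["X", "Y", "Z"], list("ABCDE")], names=["Source", "Column"])
catted = pd.DataFrame(rng.normal(size=(50, len(cols))), columns=cols)
corrmat = catted.corr()

# Largest absolute correlation off the diagonal (the diagonal is always 1).
off_diag = corrmat.where(~np.eye(len(corrmat), dtype=bool))
lim = float(np.nanmax(np.abs(off_diag.to_numpy())))

f, ax = plt.subplots()
# Symmetric limits around 0 with a diverging colormap centred on 0.
sns.heatmap(corrmat, vmin=-lim, vmax=lim, center=0, cmap="coolwarm", square=True, ax=ax)
f.tight_layout()
```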
