Seaborn violinplots: how do I obtain line paths for violin edges?
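Posted: 2022-01-16 12:58:47
Question:
I am using Seaborn to draw two violin plots, and I would like to obtain the X coordinates of the violin edges so that I can subtract one from the other to find the difference between the two KDEs/distributions. I suspect this involves the attributes of the matplotlib.collections.PolyCollection objects, but I had trouble finding my way through the documentation, so I apologize for not having much code to attach. I am including my current violin plot code in case it helps: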
import matplotlib.pyplot as plt  # needed for plt.show() at the end
import pandas as pd
import seaborn as sns
observed_results = [11.10283128625, 6.031906445000001, 4.625099850384, 4.371541683749999, 4.188776438315, 4.169933187839999, 4.147271982216, 3.137545605726, 2.727390606468, 2.706991071933, 2.483074510875, 2.470624399684, 2.45608460474, 2.413902898276, 2.390530982763, 2.347653087613, 2.3049660823, 2.173520711313, 2.114398085409, 2.072213409552, 1.96126510972, 1.768724290017, 1.722913211104, 1.715972042575, 1.71293343376, 1.686909025847, 1.546933962564, 1.520621928225, 1.50428319008, 1.4944074417, 1.409957657136, 1.40292975245, 1.3157577856, 1.3078804375, 1.2974806016, 1.288403682732, 1.236493409437, 1.225900768752, 1.222094652926, 1.202655483344, 1.109818003441, 1.108678687017, 1.103788138352, 1.066656041116, 0.9799193812, 0.9729697610879998, 0.9532061536159999, 0.908066599737, 0.8847958337999999, 0.8698585519490001, 0.859714675264, 0.8422146736200001, 0.8229930509580001, 0.79569571686, 0.79170962662, 0.7855221024799999, 0.775488805524, 0.7690100510069999, 0.765153773568, 0.677797640336, 0.62878587992, 0.6278724034559998, 0.619961861949, 0.555740507912, 0.5458990340579999, 0.495263271752, 0.4744517001510001, 0.453783299787, 0.426952628919, 0.4079525625, 0.4030645275, 0.401266962432, 0.3807181439999999, 0.3313936548, 0.2707379496719999, 0.256952683998, 0.248430471184, 0.242090703552, 0.235644786034, 0.214602373804, 0.199521746592, 0.1951300125, 0.173116351962, 0.172562334309, 0.156660445172, 0.145336979139, 0.1190036898, 0.114659525376, 0.094888354288, 0.06696725615, 0.0305541639, 0.023464003706, 0.021922131125]
observed_labels = ["Observed" for n in observed_results]
expected_results = [4.5217885770490405, 3.828641396489095, 3.4231762883809305, 3.1354942159291497, 2.91235066461494, 2.7300291078209855, 2.575878427993727, 2.4423470353692043, 2.324563999712821, 2.2192034840549946, 2.12389330425067, 2.03688192726104, 1.9568392195875037, 1.8827312474337816, 1.8137383759468302, 1.749199854809259, 1.6885752329928243, 1.6314168191528757, 1.5773495978825998, 1.5260563034950494, 1.4772661393256175, 1.4307461236907244, 1.3862943611198906, 1.3437347467010947, 1.3029127521808397, 1.2636920390275583, 1.2259517110447113, 1.1895840668738362, 1.1544927470625663, 1.120591195386885, 1.087801372563894, 1.0560526742493137, 1.02528101558256, 0.995428052432879, 0.9664405155596267, 0.9382696385929304, 0.9108706644048158, 0.8842024173226546, 0.858226930919394, 0.832909122935104, 0.8082165103447325, 0.7841189587656721, 0.760588461355478, 0.7375989431307791, 0.7151260872787205, 0.6931471805599453, 0.6716409753389818, 0.6505875661411494, 0.6299682789384137, 0.6097655716208943, 0.5899629443247145, 0.570544858467613, 0.5514966634969185, 0.5328045304847661, 0.5144553918165694, 0.496436886313891, 0.4787373092144902, 0.46134556650262093, 0.44425113314332093, 0.4274440148269396, 0.4109147128757291, 0.39465419200394874, 0.37865385065750773, 0.3629054936893685, 0.34740130715340317, 0.3321338350226148, 0.31709595765807425, 0.3022808718729337, 0.2876820724517809, 0.2732933349996814, 0.2591087000077249, 0.2451224580329851, 0.2313291359006492, 0.21772348384487053, 0.20430046351272993, 0.19105523676270922, 0.17798315519535654, 0.16507975035944858, 0.1523407245820189, 0.13976194237515874, 0.1273394223766015, 0.11506932978478723, 0.10294796925244237, 0.09097177820572676, 0.07913732055872386, 0.06744128079553265, 0.055880458394456614, 0.04445176257083381, 0.03315220731690051, 0.02197890671877523, 0.010929070532190317, -0.0]
expected_labels = ["Expected" for n in expected_results]
all_results = observed_results + expected_results
all_labels = observed_labels + expected_labels
df_longform = {"Z-score": all_results, "Condition": all_labels}
df_longform = pd.DataFrame(data=df_longform)
# Note: in seaborn >= 0.13, scale= and bw= are deprecated in favor of density_norm= and bw_method=
ax = sns.violinplot(x="Condition", y="Z-score", data=df_longform, inner=None,
                    scale='area', bw=0.3, width=0.8, saturation=1,
                    linewidth=1, cut=0)
print(ax.collections)
plt.show()
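Please let me know if anything about this question is unclear, or if I forgot to include something. I cannot figure this out for the life of me, so any suggestions are greatly appreciated.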
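Comments:
It may be easier to compute the KDE curves with scipy's gaussian_kde function and look for the intersection from there. See e.g. "Find non overlapping area between two kde plots in python". violinplot is primarily a visualization tool and is not meant for exact calculations. Also, you may want an approximation of the intersection rather than the difference.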
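As a rough illustration of the comment above, the sketch below evaluates two gaussian_kde curves on a shared grid, subtracts them, and estimates where they cross. The samples are synthetic stand-ins (an assumption, not the question's data), and the names sample_a, sample_b, grid, diff and crossings are hypothetical; substitute observed_results and expected_results to apply it to the question.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical stand-in samples; replace with observed_results / expected_results.
rng = np.random.default_rng(0)
sample_a = rng.exponential(scale=1.5, size=200)
sample_b = rng.exponential(scale=1.0, size=200)

# Same bandwidth factor as in the question's violinplot call (bw=0.3).
kde_a = gaussian_kde(sample_a, bw_method=0.3)
kde_b = gaussian_kde(sample_b, bw_method=0.3)

# Shared evaluation grid with a 20% margin, since a KDE extends past the data.
lo = min(sample_a.min(), sample_b.min())
hi = max(sample_a.max(), sample_b.max())
pad = 0.2 * (hi - lo)
grid = np.linspace(lo - pad, hi + pad, 500)

# Pointwise difference between the two density estimates.
diff = kde_a(grid) - kde_b(grid)

# Indices where the difference changes sign between neighboring grid points.
sign = np.sign(diff)
idx = np.where(sign[:-1] * sign[1:] < 0)[0]

# Linear interpolation between the two grid points around each sign change.
crossings = grid[idx] - diff[idx] * (grid[idx + 1] - grid[idx]) / (diff[idx + 1] - diff[idx])

print("approximate crossing points:", crossings)
```

The crossing locations are only as precise as the grid spacing and the linear interpolation allow, which fits the comment's point that an approximation is usually what is wanted.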
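Answer 1:
Here is a comparison between: (1) calculating and visualizing the two distributions via a kde plot and their intersection; (2) simulating a violin plot from the kde curves; (3) a boxenplot, which may be another valuable way to compare distributions.
import matplotlib.pyplot as plt
import numpy as np  # used below for linspace, minimum and trapz
from scipy.stats import gaussian_kde
import seaborn as sns
import pandas as pd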
observed_results = [11.10283128625, 6.031906445000001, 4.625099850384, 4.371541683749999, 4.188776438315, 4.169933187839999, 4.147271982216, 3.137545605726, 2.727390606468, 2.706991071933, 2.483074510875, 2.470624399684, 2.45608460474, 2.413902898276, 2.390530982763, 2.347653087613, 2.3049660823, 2.173520711313, 2.114398085409, 2.072213409552, 1.96126510972, 1.768724290017, 1.722913211104, 1.715972042575, 1.71293343376, 1.686909025847, 1.546933962564, 1.520621928225, 1.50428319008, 1.4944074417, 1.409957657136, 1.40292975245, 1.3157577856, 1.3078804375, 1.2974806016, 1.288403682732, 1.236493409437, 1.225900768752, 1.222094652926, 1.202655483344, 1.109818003441, 1.108678687017, 1.103788138352, 1.066656041116, 0.9799193812, 0.9729697610879998, 0.9532061536159999, 0.908066599737, 0.8847958337999999, 0.8698585519490001, 0.859714675264, 0.8422146736200001, 0.8229930509580001, 0.79569571686, 0.79170962662, 0.7855221024799999, 0.775488805524, 0.7690100510069999, 0.765153773568, 0.677797640336, 0.62878587992, 0.6278724034559998, 0.619961861949, 0.555740507912, 0.5458990340579999, 0.495263271752, 0.4744517001510001, 0.453783299787, 0.426952628919, 0.4079525625, 0.4030645275, 0.401266962432, 0.3807181439999999, 0.3313936548, 0.2707379496719999, 0.256952683998, 0.248430471184, 0.242090703552, 0.235644786034, 0.214602373804, 0.199521746592, 0.1951300125, 0.173116351962, 0.172562334309, 0.156660445172, 0.145336979139, 0.1190036898, 0.114659525376, 0.094888354288, 0.06696725615, 0.0305541639, 0.023464003706, 0.021922131125]
expected_results = [4.5217885770490405, 3.828641396489095, 3.4231762883809305, 3.1354942159291497, 2.91235066461494, 2.7300291078209855, 2.575878427993727, 2.4423470353692043, 2.324563999712821, 2.2192034840549946, 2.12389330425067, 2.03688192726104, 1.9568392195875037, 1.8827312474337816, 1.8137383759468302, 1.749199854809259, 1.6885752329928243, 1.6314168191528757, 1.5773495978825998, 1.5260563034950494, 1.4772661393256175, 1.4307461236907244, 1.3862943611198906, 1.3437347467010947, 1.3029127521808397, 1.2636920390275583, 1.2259517110447113, 1.1895840668738362, 1.1544927470625663, 1.120591195386885, 1.087801372563894, 1.0560526742493137, 1.02528101558256, 0.995428052432879, 0.9664405155596267, 0.9382696385929304, 0.9108706644048158, 0.8842024173226546, 0.858226930919394, 0.832909122935104, 0.8082165103447325, 0.7841189587656721, 0.760588461355478, 0.7375989431307791, 0.7151260872787205, 0.6931471805599453, 0.6716409753389818, 0.6505875661411494, 0.6299682789384137, 0.6097655716208943, 0.5899629443247145, 0.570544858467613, 0.5514966634969185, 0.5328045304847661, 0.5144553918165694, 0.496436886313891, 0.4787373092144902, 0.46134556650262093, 0.44425113314332093, 0.4274440148269396, 0.4109147128757291, 0.39465419200394874, 0.37865385065750773, 0.3629054936893685, 0.34740130715340317, 0.3321338350226148, 0.31709595765807425, 0.3022808718729337, 0.2876820724517809, 0.2732933349996814, 0.2591087000077249, 0.2451224580329851, 0.2313291359006492, 0.21772348384487053, 0.20430046351272993, 0.19105523676270922, 0.17798315519535654, 0.16507975035944858, 0.1523407245820189, 0.13976194237515874, 0.1273394223766015, 0.11506932978478723, 0.10294796925244237, 0.09097177820572676, 0.07913732055872386, 0.06744128079553265, 0.055880458394456614, 0.04445176257083381, 0.03315220731690051, 0.02197890671877523, 0.010929070532190317, -0.0]
x0 = observed_results
x1 = expected_results
kde0 = gaussian_kde(x0, bw_method=0.3)
kde1 = gaussian_kde(x1, bw_method=0.3)
xmin = min(min(x0), min(x1))
xmax = max(max(x0), max(x1))
dx = 0.2 * (xmax - xmin) # add a 20% margin, as the kde is wider than the data
xmin -= dx
xmax += dx
x = np.linspace(xmin, xmax, 500)
kde0_x = kde0(x)
kde1_x = kde1(x)
inters_x = np.minimum(kde0_x, kde1_x)
fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(15, 7))
ax1.plot(x, kde0_x, color='b', label='Observed')
ax1.fill_between(x, kde0_x, 0, color='b', alpha=0.2)
ax1.plot(x, kde1_x, color='orange', label='Expected')
ax1.fill_between(x, kde1_x, 0, color='orange', alpha=0.2)
ax1.plot(x, inters_x, color='r')
ax1.fill_between(x, inters_x, 0, facecolor='none', edgecolor='r', hatch='xx', label='intersection')
area_inters_x = np.trapz(inters_x, x)  # area under the overlap curve; np.trapz is named np.trapezoid in NumPy >= 2.0
handles, labels = ax1.get_legend_handles_labels()
labels[2] += f': {area_inters_x * 100:.1f} %'
ax1.legend(handles, labels)
ax1.set_ylim(ymin=0)
ax1.set_title("kde plot with intersection")
ax2.plot(kde0_x/2, x, color='b', label='Observed')
ax2.plot(-kde0_x/2, x, color='b')
ax2.fill_betweenx(x, -kde0_x/2, kde0_x/2, color='b', alpha=0.2)
ax2.plot(kde1_x/2, x, color='r', label='Expected')
ax2.plot(-kde1_x/2, x, color='r')
ax2.fill_betweenx(x, -kde1_x/2, kde1_x/2, color='r', alpha=0.2)
ax2.plot(inters_x/2, x, color='k')
ax2.plot(-inters_x/2, x, color='k')
ax2.fill_betweenx(x, -inters_x/2, inters_x/2, facecolor='none', edgecolor='k', hatch='oo', label='intersection')
handles, labels = ax2.get_legend_handles_labels()
labels[2] += f': {area_inters_x * 100:.1f} %'
ax2.legend(handles, labels)
ax2.set_title("simulated violinplot with intersection")
df_longform = pd.DataFrame(data={"Z-score": observed_results + expected_results,
                                 "Condition": ["Observed"] * len(observed_results) + ["Expected"] * len(expected_results)})
sns.boxenplot(x="Condition", y="Z-score", data=df_longform, ax=ax3)
ax3.set_title("sns.boxenplot")
plt.tight_layout()
plt.show()
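For completeness, the violin outlines can also be read back from the Axes, which is what the question originally asked about. This is a minimal sketch under several assumptions: the data are synthetic stand-ins, inner=None is used so that the violin bodies are the only PolyCollection objects on the Axes, and the violins are assumed to be drawn in category order at x positions 0 and 1. The names bodies and outlines are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.collections import PolyCollection

# Hypothetical stand-in data in the same long-form layout as the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Z-score": np.concatenate([rng.exponential(1.5, 100), rng.exponential(1.0, 100)]),
    "Condition": ["Observed"] * 100 + ["Expected"] * 100,
})

ax = sns.violinplot(x="Condition", y="Z-score", data=df, inner=None, cut=0)

# Each violin body is a filled polygon stored as a PolyCollection on the Axes.
bodies = [c for c in ax.collections if isinstance(c, PolyCollection)]

outlines = []
for center, body in enumerate(bodies):
    # Vertices of the polygon: column 0 is x (plot position), column 1 is y (data value).
    verts = body.get_paths()[0].vertices
    # Assumption: the violin is centered on its category position (0, 1, ...).
    half_width = np.abs(verts[:, 0] - center)
    outlines.append((verts[:, 1], half_width))

for (y, hw), name in zip(outlines, ["Observed", "Expected"]):
    print(name, "vertices:", len(y), "max half-width:", hw.max())
```

The half-widths are in plot units after seaborn's own scaling (width and density normalization), not density values, and each violin has its own y grid. Subtracting two outlines therefore needs interpolation onto a common grid and only gives a relative comparison, which is why computing the KDE directly is usually the simpler route.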
Comments:
Thank you! This is exactly what I was asking for =) I did not know that the violin edges are computed in exactly the same way as the gaussian_kde function.