How to add additional text to matplotlib annotations
【Title】: How to add additional text to matplotlib annotations 【Posted】: 2021-08-01 08:53:11 【Question】: I am using seaborn's titanic dataset as a proxy for my very large dataset, to build the charts and data based on it.
The following code runs without any errors:
import seaborn as sns
import pandas as pd
import numpy as np
sns.set_theme(style="darkgrid")
# Load the example Titanic dataset
df = sns.load_dataset("titanic")
# split fare into decile groups and order them
df['fare_grp'] = pd.qcut(df['fare'], q=10,labels=None, retbins=False, precision=0).astype(str)
df.groupby(['fare_grp'],dropna=False).size()
df['fare_grp_num'] = pd.qcut(df['fare'], q=10,labels=False, retbins=False, precision=0).astype(str)
df.groupby(['fare_grp_num'],dropna=False).size()
df['fare_ord_grp'] = df['fare_grp_num'] + ' ' +df['fare_grp']
df['fare_ord_grp']
# set variables
target = 'survived'
ydim = 'fare_ord_grp'
xdim = 'embark_town'
#del [result]
non_events = pd.DataFrame(df[df[target]==0].groupby([ydim,xdim],as_index=False, dropna=False)[target].count()).rename(columns={target: 'non_events'})
non_events[xdim]=non_events[xdim].replace(np.nan, 'Missing', regex=True)
non_events[ydim]=non_events[ydim].replace(np.nan, 'Missing', regex=True)
non_events_total = pd.DataFrame(df[df[target]==0].groupby([xdim],dropna=False,as_index=False)[target].count()).rename(columns={target: 'non_events_total_by_xdim'}).replace(np.nan, 'Missing', regex=True)
events = pd.DataFrame(df[df[target]==1].groupby([ydim,xdim],as_index=False, dropna=False)[target].count()).rename(columns={target: 'events'})
events[xdim]=events[xdim].replace(np.nan, 'Missing', regex=True)
events[ydim]=events[ydim].replace(np.nan, 'Missing', regex=True)
events_total = pd.DataFrame(df[df[target]==1].groupby([xdim],dropna=False,as_index=False)[target].count()).rename(columns={target: 'events_total_by_xdim'}).replace(np.nan, 'Missing', regex=True)
grand_total = pd.DataFrame(df.groupby([xdim],dropna=False,as_index=False)[target].count()).rename(columns={target: 'total_by_xdim'}).replace(np.nan, 'Missing', regex=True)
grand_total=grand_total.merge(non_events_total, how='left', on=xdim).merge(events_total, how='left', on=xdim)
result = pd.merge(non_events, events, how="outer",on=[ydim,xdim])
result['total'] = result['non_events'].fillna(0) + result['events'].fillna(0)
result[xdim] = result[xdim].replace(np.nan, 'Missing', regex=True)
result = pd.merge(result, grand_total, how="left",on=[xdim])
result['survival rate %'] = round(result['events']/result['total']*100,2)
result['% event dist by xdim'] = round(result['events']/result['events_total_by_xdim']*100,2)
result['% non-event dist by xdim'] = round(result['non_events']/result['non_events_total_by_xdim']*100,2)
result['% total dist by xdim'] = round(result['total']/result['total_by_xdim']*100,2)
display(result)
value_name1 = "% dist by " + str(xdim)
dfl = pd.melt(result, id_vars=[ydim, xdim],value_vars =['% total dist by xdim'], var_name = 'Type',value_name=value_name1).drop(columns='Type')
dfl2 = dfl.pivot(index=ydim, columns=xdim, values=value_name1)
print(dfl2)
title1 = "% dist by " + str(xdim)
ax=dfl2.T.plot(kind='bar', stacked=True, rot=1, figsize=(8, 8), title=title1)
ax.set_xticklabels(ax.get_xticklabels(), rotation=45)
ax.legend(bbox_to_anchor=(1.0, 1.0),title = 'Fare Range')
ax.set_ylabel('% Dist')
# annotate each bar segment with its height as a percentage
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()
    ax.text(x+width/2, y+height/2, '{:.0f}%'.format(height), horizontalalignment='center', verticalalignment='center')
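It produces the following stacked percentage bar chart, which shows the % total distribution by embark town.
I also want to show the survival rate along with the % distribution in each block. For example, for Queenstown, fare range 1 (7.6, 7.9], the % total dist is 56%. I want to display the 37.21% survival rate as (56%, 37.21%). I am not able to figure it out. Kindly offer any suggestions. Thanks.
Here is the output summary table for reference: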
| | fare_ord_grp | embark_town | non_events | events | total | total_by_xdim | non_events_total_by_xdim | events_total_by_xdim | survival rate % | % event dist by xdim | % non-event dist by xdim | % total dist by xdim |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 (-0.1,7.6] | Cherbourg | 22 | 7 | 29 | 168 | 75 | 93 | 24.14 | 7.53 | 29.33 | 17.26 |
1 | 0 (-0.1,7.6] | Queenstown | 4 | NaN | 4 | 77 | 47 | 30 | NaN | NaN | 8.51 | 5.19 |
2 | 0 (-0.1,7.6] | Southampton | 53 | 6 | 59 | 644 | 427 | 217 | 10.17 | 2.76 | 12.41 | 9.16 |
3 | 1 (7.6,7.9] | Queenstown | 27 | 16 | 43 | 77 | 47 | 30 | 37.21 | 53.33 | 57.45 | 55.84 |
4 | 1 (7.6,7.9] | Southampton | 34 | 10 | 44 | 644 | 427 | 217 | 22.73 | 4.61 | 7.96 | 6.83 |
5 | 2 (7.9,8] | Cherbourg | 4 | 1 | 5 | 168 | 75 | 93 | 20 | 1.08 | 5.33 | 2.98 |
6 | 2 (7.9,8] | Southampton | 83 | 13 | 96 | 644 | 427 | 217 | 13.54 | 5.99 | 19.44 | 14.91 |
7 | 3 (8.0,10.5] | Cherbourg | 2 | 1 | 3 | 168 | 75 | 93 | 33.33 | 1.08 | 2.67 | 1.79 |
8 | 3 (8.0,10.5] | Queenstown | 2 | NaN | 2 | 77 | 47 | 30 | NaN | NaN | 4.26 | 2.6 |
9 | 3 (8.0,10.5] | Southampton | 56 | 17 | 73 | 644 | 427 | 217 | 23.29 | 7.83 | 13.11 | 11.34 |
10 | 4 (10.5,14.5] | Cherbourg | 7 | 8 | 15 | 168 | 75 | 93 | 53.33 | 8.6 | 9.33 | 8.93 |
11 | 4 (10.5,14.5] | Queenstown | 1 | 2 | 3 | 77 | 47 | 30 | 66.67 | 6.67 | 2.13 | 3.9 |
12 | 4 (10.5,14.5] | Southampton | 40 | 26 | 66 | 644 | 427 | 217 | 39.39 | 11.98 | 9.37 | 10.25 |
13 | 5 (14.5,21.7] | Cherbourg | 9 | 10 | 19 | 168 | 75 | 93 | 52.63 | 10.75 | 12 | 11.31 |
14 | 5 (14.5,21.7] | Queenstown | 5 | 3 | 8 | 77 | 47 | 30 | 37.5 | 10 | 10.64 | 10.39 |
15 | 5 (14.5,21.7] | Southampton | 37 | 24 | 61 | 644 | 427 | 217 | 39.34 | 11.06 | 8.67 | 9.47 |
16 | 6 (21.7,27] | Cherbourg | 1 | 4 | 5 | 168 | 75 | 93 | 80 | 4.3 | 1.33 | 2.98 |
17 | 6 (21.7,27] | Queenstown | 2 | 3 | 5 | 77 | 47 | 30 | 60 | 10 | 4.26 | 6.49 |
18 | 6 (21.7,27] | Southampton | 40 | 39 | 79 | 644 | 427 | 217 | 49.37 | 17.97 | 9.37 | 12.27 |
19 | 7 (27.0,39.7] | Cherbourg | 14 | 10 | 24 | 168 | 75 | 93 | 41.67 | 10.75 | 18.67 | 14.29 |
20 | 7 (27.0,39.7] | Queenstown | 5 | NaN | 5 | 77 | 47 | 30 | NaN | NaN | 10.64 | 6.49 |
21 | 7 (27.0,39.7] | Southampton | 38 | 24 | 62 | 644 | 427 | 217 | 38.71 | 11.06 | 8.9 | 9.63 |
22 | 8 (39.7,78] | Cherbourg | 5 | 19 | 24 | 168 | 75 | 93 | 79.17 | 20.43 | 6.67 | 14.29 |
23 | 8 (39.7,78] | Southampton | 37 | 28 | 65 | 644 | 427 | 217 | 43.08 | 12.9 | 8.67 | 10.09 |
24 | 9 (78.0,512.3] | Cherbourg | 11 | 33 | 44 | 168 | 75 | 93 | 75 | 35.48 | 14.67 | 26.19 |
25 | 9 (78.0,512.3] | Queenstown | 1 | 1 | 2 | 77 | 47 | 30 | 50 | 3.33 | 2.13 | 2.6 |
26 | 9 (78.0,512.3] | Southampton | 9 | 30 | 39 | 644 | 427 | 217 | 76.92 | 13.82 | 2.11 | 6.06 |
27 | 2 (7.9,8] | Queenstown | NaN | 5 | 5 | 77 | 47 | 30 | 100 | 16.67 | NaN | 6.49 |
28 | 9 (78.0,512.3] | Missing | NaN | 2 | 2 | 2 | NaN | 2 | 100 | 100 | NaN | 100 |
【Answer 1】: dfl2.T is being plotted, but 'survival rate %' is in result. As a result, the positions of the values from dfl2.T do not correspond to those of 'survival rate %'.
Because the values in result['% total dist by xdim'] are not unique, we cannot use a dict of matched key-value pairs to look up the survival rate from the bar height.
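To see why a lookup keyed on the bar height breaks down, here is a minimal sketch using three Queenstown rows copied from the summary table, two of which share the value 6.49. The names `rows` and `lookup` are illustrative only and are not part of the original code.

```python
# Three Queenstown rows from the summary table:
# (fare group, % total dist by xdim, survival rate %)
rows = [
    ("2 (7.9,8]", 6.49, 100.0),
    ("6 (21.7,27]", 6.49, 60.0),
    ("4 (10.5,14.5]", 3.9, 66.67),
]

# Key the lookup on the % total dist value; a repeated key overwrites the earlier entry
lookup = {}
for fare_grp, pct_total, survival in rows:
    lookup[pct_total] = survival

print(len(rows), len(lookup))  # 3 2
print(lookup[6.49])            # 60.0 (the 100.0 entry was overwritten)
```

Both segments with height 6.49 would be labelled with the same survival rate, which is why the values are matched by position instead.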
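Create a corresponding pivoted DataFrame for 'survival rate %', and then flatten it. All of its values will be in the same order as the '% total dist by xdim' values from dfl2.T, so they can be indexed by position.
With respect to dfl2.T, the plot API draws the bars in column order, which means .flatten(order='F') must be used to flatten the array in the correct order for indexing.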
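The difference between the two flatten orders can be seen on a toy array. This is only an illustration: `toy` stands in for dfl2.T (rows as towns, columns as fare groups), and its values are made up.

```python
import numpy as np

# Toy stand-in for dfl2.T: 2 rows (towns) x 3 columns (fare groups)
toy = np.array([[1, 2, 3],
                [4, 5, 6]])

row_major = toy.flatten(order="C")  # walks along each row first
col_major = toy.flatten(order="F")  # walks down each column first

print(row_major)  # [1 2 3 4 5 6]
print(col_major)  # [1 4 2 5 3 6]
```

Since the stacked bar plot adds one column (one fare group) at a time across all towns, the order of ax.patches matches the column-major result.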
# create a corresponding pivoted dataframe for survival rate %
dfl3 = pd.melt(result, id_vars=[ydim, xdim],value_vars =['survival rate %'], var_name = 'Type',value_name=value_name1).drop(columns='Type')
dfl4 = dfl3.pivot(index=ydim, columns=xdim, values=value_name1)
# flatten dfl4.T in column order
dfl4_flattened = dfl4.T.to_numpy().flatten(order='F')
for i, p in enumerate(ax.patches):
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()
    # only print values when height is not 0
    if height != 0:
        # create the text string
        text = f'{height:.0f}%, {dfl4_flattened[i]:.0f}%'
        # annotate the bar segments
        ax.text(x+width/2, y+height/2, text, horizontalalignment='center', verticalalignment='center')
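A segment can have a non-zero height while its survival rate is NaN (for example Queenstown in fare group 0, where the summary table shows 5.19 and NaN), in which case the formatted text would contain `nan%`. The following is a sketch of one way to handle this. The helper `make_label` is hypothetical and not part of the original answer, and showing only the distribution value for such segments is an assumption about the desired output.

```python
import math

def make_label(height, survival_rate):
    """Build the annotation text for one bar segment (hypothetical helper)."""
    if math.isnan(survival_rate):
        # No survival rate available for this segment: show the height only
        return f"{height:.0f}%"
    return f"{height:.0f}%, {survival_rate:.0f}%"

print(make_label(55.84, 37.21))         # 56%, 37%
print(make_label(5.19, float("nan")))   # 5%
```

Inside the annotation loop, the text string could then be built with `make_label(height, dfl4_flattened[i])`.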
Notes
Here we can see dfl2.T and dfl4.T:
# dfl2.T
fare_ord_grp 0 (-0.1, 7.6] 1 (7.6, 7.9] 2 (7.9, 8.0] 3 (8.0, 10.5] 4 (10.5, 14.5] 5 (14.5, 21.7] 6 (21.7, 27.0] 7 (27.0, 39.7] 8 (39.7, 78.0] 9 (78.0, 512.3]
embark_town
Cherbourg 17.26 NaN 2.98 1.79 8.93 11.31 2.98 14.29 14.29 26.19
Missing NaN NaN NaN NaN NaN NaN NaN NaN NaN 100.00
Queenstown 5.19 55.84 6.49 2.60 3.90 10.39 6.49 6.49 NaN 2.60
Southampton 9.16 6.83 14.91 11.34 10.25 9.47 12.27 9.63 10.09 6.06
# dfl4.T
fare_ord_grp 0 (-0.1, 7.6] 1 (7.6, 7.9] 2 (7.9, 8.0] 3 (8.0, 10.5] 4 (10.5, 14.5] 5 (14.5, 21.7] 6 (21.7, 27.0] 7 (27.0, 39.7] 8 (39.7, 78.0] 9 (78.0, 512.3]
embark_town
Cherbourg 24.14 NaN 20.00 33.33 53.33 52.63 80.00 41.67 79.17 75.00
Missing NaN NaN NaN NaN NaN NaN NaN NaN NaN 100.00
Queenstown NaN 37.21 100.00 NaN 66.67 37.50 60.00 NaN NaN 50.00
Southampton 10.17 22.73 13.54 23.29 39.39 39.34 49.37 38.71 43.08 76.92