How do I make a Sankey diagram with Plotly with one layer that goes only one level?

Posted 2023-02-18

【Title】: How do I make a Sankey diagram with Plotly with one layer that goes only one level? 【Posted】: 2022-01-16 23:47:20 【Question】:

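I want to make a Sankey diagram that splits into several levels (obviously), but one of the branches should stop after the first level, because the further steps do not apply to it. Something like this: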

import pandas as pd

pd.DataFrame(
    {
        'kind': ['not an animal', 'animal', 'animal', 'animal', 'animal'],
        'animal': ['?', 'cat', 'cat', 'dog', 'cat'],
        'sex': ['?', 'female', 'female', 'male', 'male'],
        'status': ['?', 'domesticated', 'domesticated', 'wild', 'domesticated'],
        'count': [8, 10, 11, 14, 6],
    }
)


    kind            animal  sex     status          count
0   not an animal   ?       ?       ?               8
1   animal          cat     female  domesticated    10
2   animal          cat     female  domesticated    11
3   animal          dog     male    wild            14
4   animal          cat     male    domesticated    6

The "not an animal" rows should not be split any further, because the later levels do not apply to them. It should look like this:

【Question comments】:

【Answer 1】: Reusing the structure from my answer to "plotly sankey graph data formatting", restructure the data frame from the question into source/target pairs:

	source	target	count
0	animal	cat	27
1	animal	dog	14
2	cat	female	21
3	cat	male	6
4	dog	male	14
5	female	domesticated	21
6	male	domesticated	6
7	male	wild	14
8	not an animal	?	8

From there it is just a matter of building the node and link arrays.
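As a rough illustration of that step, here is a minimal sketch in plain Python (no pandas or plotly) that turns the source/target table above into a label list and index-based links. Plotly's Sankey trace refers to nodes by their position in the label list, which is why the lookup is needed. The names `rows`, `labels`, `index_of` and `link` are made up for this sketch and are not part of the original answer.

```python
# Source/target/count rows, copied from the restructured table above.
rows = [
    ("animal", "cat", 27),
    ("animal", "dog", 14),
    ("cat", "female", 21),
    ("cat", "male", 6),
    ("dog", "male", 14),
    ("female", "domesticated", 21),
    ("male", "domesticated", 6),
    ("male", "wild", 14),
    ("not an animal", "?", 8),
]

# Every distinct label becomes one node; sorting keeps the order reproducible.
labels = sorted({name for source, target, _ in rows for name in (source, target)})
index_of = {label: i for i, label in enumerate(labels)}

# Links point at nodes by their position in `labels`, not by name.
link = {
    "source": [index_of[source] for source, _, _ in rows],
    "target": [index_of[target] for _, target, _ in rows],
    "value": [value for _, _, value in rows],
}

print(labels)
print(link)
```

The "not an animal" node has exactly one outgoing link (to "?") and "?" has none, so that branch ends after a single level.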

Full code

import pandas as pd
import numpy as np
import plotly.graph_objects as go
import io

df2 = pd.read_csv(
    io.StringIO(
        """    kind            animal  sex     status          count
0   not an animal   ?       ?       ?               8
1   animal          cat     female  domesticated    10
2   animal          cat     female  domesticated    11
3   animal          dog     male    wild            14
4   animal          cat     male    domesticated    6"""
    ),
    # split on runs of two or more whitespace characters, so "not an animal" stays one field
    sep=r"\s\s+",
    engine="python",
)

df = (
    pd.concat(
        [
            # each adjacent pair of columns becomes a source/target pair
            df2.loc[:, [c1, c2] + ["count"]].rename(
                columns={c1: "source", c2: "target"}
            )
            for c1, c2 in zip(df2.columns[:-1], df2.columns[1:-1])
        ]
    )
    .loc[lambda d: ~d["source"].eq("?")]
    .groupby(["source", "target"], as_index=False)
    .sum()
)

nodes = np.unique(df[["source", "target"]], axis=None)
nodes = pd.Series(index=nodes, data=range(len(nodes)))

go.Figure(
    go.Sankey(
        node={"label": nodes.index},
        link={
            # links refer to nodes by integer position, so map labels to indices
            "source": nodes.loc[df["source"]],
            "target": nodes.loc[df["target"]],
            "value": df["count"],
        },
    )
)

Building the data frame in stages

col_pairs = [[c1, c2] for c1, c2 in zip(df2.columns[:-1], df2.columns[1:-1])]
# reconstruct as source / target pairs
df = pd.concat(
    [
        df2.loc[:, cols + ["count"]].rename(
            columns={cols[0]: "source", cols[1]: "target"}
        )
        for cols in col_pairs
    ]
)

# filter out where source is unknown
df = df.loc[~df["source"].eq("?")]
# aggregate to limit links in sankey
df = df.groupby(["source", "target"], as_index=False).sum()
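To make the pairing, filtering and aggregation steps easier to follow one at a time, here is a sketch of the same logic in plain Python, used only as a stand-in for the pandas version above. The `records` list repeats the question's data; `level_pairs` and `totals` are illustrative names, not part of the original answer.

```python
from collections import Counter

columns = ["kind", "animal", "sex", "status"]
records = [
    ("not an animal", "?", "?", "?", 8),
    ("animal", "cat", "female", "domesticated", 10),
    ("animal", "cat", "female", "domesticated", 11),
    ("animal", "dog", "male", "wild", 14),
    ("animal", "cat", "male", "domesticated", 6),
]

# Zip the list with itself shifted by one to get adjacent (source, target) columns.
level_pairs = list(zip(columns, columns[1:]))

totals = Counter()
for *levels, count in records:
    row = dict(zip(columns, levels))
    for c1, c2 in level_pairs:
        source, target = row[c1], row[c2]
        if source == "?":
            # nothing applies beyond this level, so emit no further link
            continue
        # summing per (source, target) plays the role of groupby(...).sum()
        totals[(source, target)] += count

for (source, target), value in sorted(totals.items()):
    print(source, "->", target, value)
```

Only links whose source is "?" are dropped, so "not an animal" still gets its single link to "?" and then stops.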

【Comments】:

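I have been trying to implement this for my specific case for hours, but it does not work, so I cannot check whether the solution is valid. Could you simplify the lambda/zip loop part so that I can reproduce it step by step?

OK, the concept: work with pairs of columns, where the first is the source and the second is the target. To get those pairs, use the standard Python technique of zipping a list with itself, where the second iterator starts one item further along. The lambda function is only used to filter out rows where source == "?". The group by and sum() are not strictly needed; without them you just get more links in the Sankey figure.

I have updated the answer with an appendix that builds the data frame in stages. Was there an issue running the MWE in your environment?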