Pandas: Combining DataFrames with nested arrays or merging the JSON output

Posted: 2019-01-11 13:11:13

[Question]:

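I'm working with a standard DataFrame and building several subset DataFrames of summarized data with nested arrays. I then need to combine those subset DataFrames in a way that gives me the expected JSON output. (Most of my code follows MaxU's answer to "Convert Pandas Dataframe to nested JSON".)

Here are the first few rows of my standard DataFrame, df (I can provide all 58 rows of this example if needed):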

    ID         PRI_AFF   PRI_DEP      LOA    STATE
0   5571             M              Basic        A
1   5030             T  14700000     Blue        A
2   5030             T  14700000     Blue        A
3   5030             T  14700000     Blue        A
4   4014             T  14700000     Blue        A
5   2230             T  14700000      UFM        A
6   2230             T  14700000      UFM        A
7   2150             F  95011000   Bronze        A
8   2150             F  95011000   Bronze        A
9   2150             F  95011000   Bronze        A
10  2150             F  95011000   Bronze        A

From there I run the following Python:

 import pandas as pd

 # Count distinct IDs per PRI_DEP, with one column per category value.
 PAFF_df = df.groupby(['PRI_DEP','PRI_AFF'])['ID'].nunique().unstack().reset_index().fillna(0)
 LOA_df = df.groupby(['PRI_DEP','LOA'])['ID'].nunique().unstack().reset_index().fillna(0)
 ST_df = df.groupby(['PRI_DEP','STATE'])['ID'].nunique().unstack().reset_index().fillna(0)

 # Nest each PRI_DEP's counts as a list of records under a single column.
 Nested_PAFF_df = (PAFF_df.groupby(['PRI_DEP'], as_index=True)
       .apply(lambda x: x[['A','E','F','L','M','T']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'Primary_Affiliation'}))

 Nested_LOA_df = (LOA_df.groupby(['PRI_DEP'], as_index=True)
       .apply(lambda x: x[['Basic','Blue','Bronze','Invalid','UFM']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'LOA'}))

 Nested_ST_df = (ST_df.groupby(['PRI_DEP'], as_index=True)
       .apply(lambda x: x[['A','E']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'STATE'}))

Calling .to_json(orient='records') on each of these gives me properly nested JSON.
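The count-then-nest step can be tried on a small sample. This is only a minimal sketch: the rows below are a hypothetical sample shaped like the question's data, and the nested column is built with a list comprehension instead of groupby/apply (a swapped-in technique, not the code from the question), which works here because each PRI_DEP has exactly one row after the unstack.

```python
import json
import pandas as pd

# Hypothetical sample rows, shaped like the question's df (not the real 58 rows).
df = pd.DataFrame({
    "ID": [5571, 5030, 5030, 4014, 2230, 2150],
    "PRI_DEP": [" ", "14700000", "14700000", "14700000", "14700000", "95011000"],
    "STATE": ["A", "A", "A", "A", "E", "A"],
})

# Count distinct IDs per (PRI_DEP, STATE) and pivot STATE into columns.
st_df = (df.groupby(["PRI_DEP", "STATE"])["ID"]
           .nunique()
           .unstack(fill_value=0)
           .reset_index())

# Every column except the key holds a count for one STATE value.
value_cols = [c for c in st_df.columns if c != "PRI_DEP"]

# Wrap each row's counts in a one-element list so it serializes as a nested array.
nested_st = pd.DataFrame({
    "PRI_DEP": st_df["PRI_DEP"],
    "STATE": [[row] for row in st_df[value_cols].to_dict("records")],
})

st_json = nested_st.to_json(orient="records")
parsed = json.loads(st_json)
print(st_json)
```

Passing fill_value=0 to unstack keeps the counts as integers, whereas the question's fillna(0) after unstack yields floats such as 2.0.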

Primary affiliation JSON:

[{"PRI_DEP":" ","Primary_Affiliation":[{"A":0.0,"E":0.0,"F":0.0,"M":2.0,"L":0.0,"T":0.0}]},{"PRI_DEP":"14700000","Primary_Affiliation":[{"A":0.0,"E":3.0,"F":0.0,"M":1.0,"L":1.0,"T":19.0}]},{"PRI_DEP":"95011000","Primary_Affiliation":[{"A":0.0,"E":0.0,"F":1.0,"M":0.0,"L":0.0,"T":0.0}]},{"PRI_DEP":"Null","Primary_Affiliation":[{"A":0.0,"E":1.0,"F":0.0,"M":0.0,"L":0.0,"T":0.0}]},{"PRI_DEP":"ST010000","Primary_Affiliation":[{"A":1.0,"E":0.0,"F":0.0,"M":0.0,"L":0.0,"T":1.0}]}]

LOA JSON:

[{"PRI_DEP":" ","LOA":[{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":1.0}]},{"PRI_DEP":"14700000","LOA":[{"Blue":14.0,"UFM":5.0,"Invalid":1.0,"Bronze":4.0,"Basic":0.0}]},{"PRI_DEP":"95011000","LOA":[{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":0.0}]},{"PRI_DEP":"Null","LOA":[{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":0.0}]},{"PRI_DEP":"ST010000","LOA":[{"Blue":0.0,"UFM":0.0,"Invalid":1.0,"Bronze":0.0,"Basic":1.0}]}]

State JSON:

[{"PRI_DEP":" ","STATE":[{"A":2.0,"E":0.0}]},{"PRI_DEP":"14700000","STATE":[{"A":23.0,"E":1.0}]},{"PRI_DEP":"95011000","STATE":[{"A":1.0,"E":0.0}]},{"PRI_DEP":"Null","STATE":[{"A":1.0,"E":0.0}]},{"PRI_DEP":"ST010000","STATE":[{"A":2.0,"E":0.0}]}]

Now I would like to represent all of these in a single JSON, keyed by PRI_DEP.

So the desired JSON would look like this (updated for readability):

[{"PRI_DEP":" ",
    "Primary_Affiliation":
        [{"A":0.0,"E":0.0,"F":0.0,"M":2.0,"L":0.0,"T":0.0}],
    "LOA":
        [{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":1.0}],
    "STATE":
        [{"A":2.0,"E":0.0}]},
 {"PRI_DEP":"14700000",
    "Primary_Affiliation":
        [{"A":0.0,"E":3.0,"F":0.0,"M":1.0,"L":1.0,"T":19.0}],
    "LOA":
        [{"Blue":14.0,"UFM":5.0,"Invalid":1.0,"Bronze":4.0,"Basic":0.0}],
    "STATE":
        [{"A":23.0,"E":1.0}]},
 {"PRI_DEP":"95011000",
    "Primary_Affiliation":
        [{"A":0.0,"E":0.0,"F":1.0,"M":0.0,"L":0.0,"T":0.0}],
    "LOA":
        [{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":0.0}],
    "STATE":
        [{"A":1.0,"E":0.0}]},
 {"PRI_DEP":"Null",
    "Primary_Affiliation":
        [{"A":0.0,"E":1.0,"F":0.0,"M":0.0,"L":0.0,"T":0.0}],
    "LOA":
        [{"Blue":0.0,"UFM":0.0,"Invalid":0.0,"Bronze":1.0,"Basic":0.0}],
    "STATE":
        [{"A":1.0,"E":0.0}]},
 {"PRI_DEP":"ST010000",
    "Primary_Affiliation":
        [{"A":1.0,"E":0.0,"F":0.0,"M":0.0,"L":0.0,"T":1.0}],
    "LOA":
        [{"Blue":0.0,"UFM":0.0,"Invalid":1.0,"Bronze":0.0,"Basic":1.0}],
    "STATE":
        [{"A":2.0,"E":0.0}]}]

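[Comments]:

It looks like your desired JSON is truncated. Can you update it?

I deliberately included only the first record, but I'll update it with the remaining ones. There are only a few.

[Answer 1]:

I kept trying different ways of combining the DataFrames, and I think I found the answer.

After the Python code in my original post (which sets up the nested groups), I did the following: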

Group_frames = [Nested_PAFF_df.set_index('PRI_DEP'), Nested_LOA_df.set_index('PRI_DEP'), Nested_ST_df.set_index('PRI_DEP')]
result = pd.concat(Group_frames, axis=1).reset_index()
print(result.to_json(orient='records'))
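For anyone who wants to try the concat step in isolation, here is a minimal sketch. The two frames and their values are hypothetical stand-ins for Nested_PAFF_df and Nested_ST_df, and it assumes each PRI_DEP appears exactly once per frame, which the groupby in the question guarantees.

```python
import json
import pandas as pd

# Hypothetical nested frames, shaped like Nested_PAFF_df and Nested_ST_df.
nested_paff = pd.DataFrame({
    "PRI_DEP": ["14700000", "95011000"],
    "Primary_Affiliation": [[{"F": 0, "T": 3}], [{"F": 1, "T": 0}]],
})
nested_st = pd.DataFrame({
    "PRI_DEP": ["14700000", "95011000"],
    "STATE": [[{"A": 3}], [{"A": 1}]],
})

# Index every frame by the shared key so concat aligns rows on PRI_DEP.
frames = [f.set_index("PRI_DEP") for f in (nested_paff, nested_st)]

# axis=1 places the nested columns side by side, one row per PRI_DEP.
result = pd.concat(frames, axis=1).reset_index()

records = json.loads(result.to_json(orient="records"))
print(json.dumps(records, indent=2))
```

Because pd.concat uses an outer join by default, a PRI_DEP missing from one of the frames would still get a row, with null in the column it lacks.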
