Merging a list of Multi-index dataframes
[Posted]: 2021-03-08 17:19:10
[Question]: I am trying to merge several result dataframes returned by a function that I call with tqdm's process_map. Each df has 1 column, 1 index and 3 sub-indexes.
from tqdm.contrib.concurrent import process_map
cost_values = process_map(run_simulation_a0_b0_search, param_list, max_workers=4)
Here is a sample of the dfs:
                            0.01
0.01  Collisions            0.0073125
      Average distance      3.05586
      Minimum distance      0.86763

                            10.0
0.01  Collisions            0
      Average distance      0.423096
      Minimum distance      0.332057

                            0.01
10.0  Collisions            0.00090625
      Average distance      0.445388
      Minimum distance      0.28061

                            10.0
10.0  Collisions            0
      Average distance      0.418373
      Minimum distance      0.29708
I tried concatenating them, but that did not work, so I am now trying to merge them.
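For readers who want to reproduce the situation, here is a minimal sketch that rebuilds the four sample frames and runs a plain `pd.concat` on them. The helper `make_frame` and the constant `METRICS` are hypothetical names introduced for illustration, and the sketch assumes the first number of each sample is the column label and the second is the outer index value. Under that assumption, the result is a 12-row frame in which every row has one missing value, which may be what "did not work" refers to.

```python
import pandas as pd

# Metric names taken from the sample frames above.
METRICS = ["Collisions", "Average distance", "Minimum distance"]


def make_frame(index_value, column_value, values):
    """Hypothetical helper: one column, one outer index value, three sub-index rows."""
    index = pd.MultiIndex.from_product([[index_value], METRICS])
    return pd.DataFrame({column_value: values}, index=index)


frames = [
    make_frame(0.01, 0.01, [0.0073125, 3.05586, 0.86763]),
    make_frame(0.01, 10.0, [0.0, 0.423096, 0.332057]),
    make_frame(10.0, 0.01, [0.00090625, 0.445388, 0.28061]),
    make_frame(10.0, 10.0, [0.0, 0.418373, 0.29708]),
]

# Row-wise concat takes the union of the column labels, so each frame
# only fills its own column and leaves the other one as NaN.
naive = pd.concat(frames)
print(naive.shape)                    # (12, 2)
print(int(naive.isna().sum().sum()))  # 12
```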
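[Answer 1]: For one thing, I suggest you reconsider the format of the output dataframes. The nested format makes processing messy and slow. In my experience with pandas, I always use flat datasets, 1D or 2D.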
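As a rough illustration of the flat format suggested here, the sketch below has each simulation run return one flat record, so the combined table is built in a single step. The function `run_simulation_flat` is a hypothetical stand-in (the real `run_simulation_a0_b0_search` is not shown in the question), the parameter names `a` and `b` are assumed, and the metric values are placeholders rather than real results.

```python
import pandas as pd


def run_simulation_flat(params):
    """Hypothetical stand-in for the simulation: returns one flat record per run."""
    a, b = params
    # Placeholder zeros; a real simulation would compute these metrics.
    return {
        "a": a,
        "b": b,
        "Collisions": 0.0,
        "Average distance": 0.0,
        "Minimum distance": 0.0,
    }


param_list = [(0.01, 0.01), (0.01, 10.0), (10.0, 0.01), (10.0, 10.0)]

# A plain loop is used here; process_map(run_simulation_flat, param_list, ...)
# should return the same kind of list of records.
records = [run_simulation_flat(p) for p in param_list]

# One row per simulation run, one column per parameter or metric.
flat = pd.DataFrame.from_records(records)
print(flat)
```

With this layout no merging step is needed afterwards, since every run already contributes exactly one row.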
Anyway, here is code with a minimal example that processes the data:
import pandas as pd
from tabulate import tabulate


def replicate_nested_df(df, a, b, columns):
    # add nested index: parameter a as outer level, feature names as inner level
    df[''] = a
    df = df.set_index([''] + [columns])
    # name the single numeric column after parameter b
    df = df.rename(columns={0: b})
    return df


def flatten_nested_df(df):
    # flatten and save simulation parameters a and b from the nested structure
    b = df.columns.values.tolist()[0]
    df = df.reset_index()
    a = df.iloc[0, :]['']
    # rename and drop columns
    df = df.rename(columns={"level_1": "feature"})
    df = df.rename(columns={b: "values"})
    df = df[["feature", "values"]]
    # transpose data: one row, one column per feature
    df = df.set_index(["feature"])
    df = df.transpose().reset_index(drop=True)
    df = df.rename_axis('', axis=1)
    # add simulation parameters
    df["a"] = a
    df["b"] = b
    return df


# create mockup dataframes
columns = ["Collisions", "Average distance", "Minimum distance"]
df1 = pd.DataFrame([[0.0073125, 3.05586, 0.86763]], columns=columns).transpose()
df1 = replicate_nested_df(df1, a=0.01, b=0.01, columns=columns)
df2 = pd.DataFrame([[0.003, 3.2, 0.8]], columns=columns).transpose()
df2 = replicate_nested_df(df2, a=0.01, b=10, columns=columns)

# process each dataframe
df_processed = []
for df_i in [df1, df2]:
    df_processed.append(flatten_nested_df(df_i))

# create unique frame
df_concat = pd.concat(df_processed).reset_index(drop=True)

print("Mockup Input:")
print("df1:\n", df1)
print("df2:\n", df2)
print("Processed and merged dataset:")
print(tabulate(df_concat, headers=df_concat.columns, tablefmt='psql'))
Output:
+----+--------------+--------------------+--------------------+------+-------+
| | Collisions | Average distance | Minimum distance | a | b |
|----+--------------+--------------------+--------------------+------+-------|
| 0 | 0.0073125 | 3.05586 | 0.86763 | 0.01 | 0.01 |
| 1 | 0.003 | 3.2 | 0.8 | 0.01 | 10 |
+----+--------------+--------------------+--------------------+------+-------+
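A shorter route to the same kind of flat table can be sketched with `Series.unstack`, assuming each frame has exactly one column and a two-level index (outer value, metric name) as in the question's sample. The names `frame_to_row`, `make_frame` and `METRICS`, and the choice to call the outer index value `a` and the column label `b`, are illustrative assumptions and not part of the original post.

```python
import pandas as pd

METRICS = ["Collisions", "Average distance", "Minimum distance"]


def make_frame(a, b, values):
    """Hypothetical helper that rebuilds one frame shaped like the question's sample."""
    index = pd.MultiIndex.from_product([[a], METRICS])
    return pd.DataFrame({b: values}, index=index)


def frame_to_row(df):
    """Turn one single-column, two-level-index frame into a one-row flat frame."""
    b = df.columns[0]              # the single column label
    series = df.iloc[:, 0]         # values indexed by (outer value, metric)
    row = series.unstack(level=1)  # metric names become columns
    row = row.rename_axis("a").reset_index()
    row["b"] = b
    return row


frames = [
    make_frame(0.01, 0.01, [0.0073125, 3.05586, 0.86763]),
    make_frame(0.01, 10.0, [0.0, 0.423096, 0.332057]),
]

# One row per frame, stacked into a single flat table.
flat = pd.concat([frame_to_row(f) for f in frames], ignore_index=True)
print(flat)
```

Note that `unstack` sorts the new column labels, so the metric columns may come out in alphabetical order rather than in the order of the original frames.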