How to get the closest column value by the bin MultiIndex?

Posted 2023-03-11


[Title]: How to get the closest column value by the bin MultiIndex? [Posted]: 2022-01-23 19:33:20 [Question]:

I have a DataFrame with a MultiIndex (df_value_bin) that holds binned values, built as follows:

import pandas as pd
import numpy as np

np.random.seed(100)
df = pd.DataFrame(np.random.randn(100, 3), columns=['a', 'b', 'value'])

a_bins = np.arange(-3, 4, 1)
b_bins = np.arange(-2, 4, 2)

df['a_bins'] = pd.cut(df['a'], bins=a_bins)
df['b_bins'] = pd.cut(df['b'], bins=b_bins)
df_value_bin = df.groupby(['a_bins','b_bins']).agg({'value':'mean'})

Here is a quick look at df_value_bin:

                     value
a_bins   b_bins           
(-3, -2] (-2, 0] -0.417606
         (0, 2]  -0.267035
(-2, -1] (-2, 0] -0.296727
         (0, 2]  -0.112280
(-1, 0]  (-2, 0]  0.459780
         (0, 2]   0.131588
(0, 1]   (-2, 0]  0.110268
         (0, 2]   0.287755
(1, 2]   (-2, 0]  0.254337
         (0, 2]  -0.627460
(2, 3]   (-2, 0] -0.075165
         (0, 2]  -0.589709

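Then, given some a and b, I want to get the closest value in df_value_bin, that is, the value of the bin pair that contains them.

Suppose a=1.5 and b=-1; then we should get value=0.254337.

Attempt 1

I can generate boolean masks for a_bins and b_bins: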

a_test = 1.5
b_test = -1

boolean_a = df_value_bin.index.get_level_values('a_bins').categories.contains(a_test)
boolean_b = df_value_bin.index.get_level_values('b_bins').categories.contains(b_test)

print(boolean_a, boolean_b) # Output: [False False False False  True False] [ True False]

However, I don't know how to use the masks to select the row...
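One possible way to finish this attempt, offered as a sketch rather than a definitive answer: the two masks have one entry per category (6 and 2), not one per row, so they need to be expanded to row length first. The version below does that through the integer codes of each index level. The names a_level, b_level, row_mask and selected are introduced here for illustration, and observed=False is passed explicitly on the assumption that every bin combination should be kept as a row.

```python
import numpy as np
import pandas as pd

# Rebuild the binned frame from the question.
np.random.seed(100)
df = pd.DataFrame(np.random.randn(100, 3), columns=['a', 'b', 'value'])
a_bins = np.arange(-3, 4, 1)
b_bins = np.arange(-2, 4, 2)
df['a_bins'] = pd.cut(df['a'], bins=a_bins)
df['b_bins'] = pd.cut(df['b'], bins=b_bins)
df_value_bin = df.groupby(['a_bins', 'b_bins'], observed=False).agg({'value': 'mean'})

a_test, b_test = 1.5, -1

# One categorical index per level, each as long as df_value_bin.
a_level = df_value_bin.index.get_level_values('a_bins')
b_level = df_value_bin.index.get_level_values('b_bins')

# Per-category masks, exactly as in the attempt above.
boolean_a = a_level.categories.contains(a_test)
boolean_b = b_level.categories.contains(b_test)

# .codes holds, for every row, the position of its category,
# so indexing the category mask with it yields a row-length mask.
row_mask = boolean_a[a_level.codes] & boolean_b[b_level.codes]

selected = df_value_bin[row_mask]
print(selected)
```

If a_test or b_test falls outside every bin, row_mask is all False and selected is empty, so the caller would need to handle that case.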

Attempt 2

I can get the indices directly:

index_a = np.digitize(a_test, a_bins, right=True)
index_b = np.digitize(b_test, b_bins, right=True)

print(index_a, index_b) # Output: 5 1

Again, I don't know how to use the indices to select the row directly.
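A minimal sketch of how the np.digitize() indices could be turned into a row position. It rests on two assumptions that are not stated in the question: df_value_bin contains every (a_bins, b_bins) combination (forced here with observed=False), and its rows are sorted with a_bins as the outer level. The names pos_a, pos_b, n_a, n_b and row_number are hypothetical helpers.

```python
import numpy as np
import pandas as pd

# Rebuild the binned frame from the question.
np.random.seed(100)
df = pd.DataFrame(np.random.randn(100, 3), columns=['a', 'b', 'value'])
a_bins = np.arange(-3, 4, 1)
b_bins = np.arange(-2, 4, 2)
df['a_bins'] = pd.cut(df['a'], bins=a_bins)
df['b_bins'] = pd.cut(df['b'], bins=b_bins)
df_value_bin = df.groupby(['a_bins', 'b_bins'], observed=False).agg({'value': 'mean'})

a_test, b_test = 1.5, -1

# With right=True, np.digitize returns i such that bins[i-1] < x <= bins[i],
# so the zero-based position of the bin is i - 1.
pos_a = np.digitize(a_test, a_bins, right=True) - 1
pos_b = np.digitize(b_test, b_bins, right=True) - 1

n_a, n_b = len(a_bins) - 1, len(b_bins) - 1

# Guard for the assumption that the frame is the full bin product.
if len(df_value_bin) != n_a * n_b:
    raise ValueError("df_value_bin does not contain every bin combination")

# Flatten the (a, b) position into a single row number (a is the outer level).
row_number = np.ravel_multi_index((pos_a, pos_b), (n_a, n_b))
row = df_value_bin.iloc[row_number]
print(row)
```

Inputs outside the bin edges are not handled: np.digitize() then returns 0 or len(bins), which makes the computed position invalid.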

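Notes

The second method seems like it should be faster, because it uses np.digitize(). If you have any ideas for completing it, or a better method, feel free to answer!

[Question comments]:

[Answer 1]:

In this case, you can index with the numbers directly: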

df_value_bin.loc[(1.5, -1)]

Output (ignore the value, which is randomly generated; look at the Name):

value    0.047439
Name: ((1, 2], (-2, 0]), dtype: float64
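The scalar lookup most likely works because the index levels are made of intervals, and pandas resolves a number to the interval that contains it. The standalone sketch below illustrates that behavior on a single-level pd.IntervalIndex; the names edges, intervals, position and s are made up for the example and are not part of the answer.

```python
import pandas as pd

# Same edges as a_bins in the question; intervals are closed on the right by default.
edges = [-3, -2, -1, 0, 1, 2, 3]
intervals = pd.IntervalIndex.from_breaks(edges)

# A scalar is matched to the interval that contains it.
position = intervals.get_loc(1.5)  # position of the interval (1, 2]

# The same matching applies to label-based selection with .loc.
s = pd.Series(range(len(intervals)), index=intervals)
looked_up = s.loc[1.5]

print(position, looked_up)
```

A number that lies in no interval raises KeyError, so out-of-range inputs need a check or a try/except around the lookup.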

[Comments]:

Thanks for the excellent method! If anyone needs array input, here is how: test = np.array([-1.5, 0, 1]); df_value_bin.loc[zip(test, test)]. If you prefer to use the level labels, df_value_bin.xs((1.5, -1), level=['a_bins', 'b_bins']) does the job.
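For array input, an alternative that stays in NumPy is sketched below. It is not the commenter's method: it bins the inputs with pd.cut() and indexes a 2-D grid of the means. It assumes df_value_bin holds the full, sorted product of bins (forced with observed=False), and the names a_codes, b_codes, grid and result are hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the binned frame from the question.
np.random.seed(100)
df = pd.DataFrame(np.random.randn(100, 3), columns=['a', 'b', 'value'])
a_bins = np.arange(-3, 4, 1)
b_bins = np.arange(-2, 4, 2)
df['a_bins'] = pd.cut(df['a'], bins=a_bins)
df['b_bins'] = pd.cut(df['b'], bins=b_bins)
df_value_bin = df.groupby(['a_bins', 'b_bins'], observed=False).agg({'value': 'mean'})

test = np.array([-1.5, 0, 1])

# Zero-based bin position of every input; -1 marks a value outside all bins.
a_codes = pd.cut(test, bins=a_bins).codes
b_codes = pd.cut(test, bins=b_bins).codes

# Reshape the means into an (a bins) x (b bins) grid; valid only for a full, sorted product.
n_a, n_b = len(a_bins) - 1, len(b_bins) - 1
grid = df_value_bin['value'].to_numpy().reshape(n_a, n_b)

# Pick one mean per (a, b) pair with NumPy integer indexing.
result = grid[a_codes, b_codes]
print(result)
```

Because a code of -1 would silently select the last row or column of the grid, out-of-range inputs should be masked out before indexing.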
