Binning data and inclusive result
Posted: 2012-05-15 09:43:05
Question: Suppose I have already binned some data in a structure like this:
data = {(1, 1): [...],  # list of floats
        (1, 2): [...],
        (1, 3): [...],
        (2, 1): [...],
        ...}
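Here I have only two binning axes, but suppose I have N of them. Now suppose, for example, that I have N=3 axes and I want the data whose second bin is 1, so I want a function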
(None, 1, None) -> [(1, 1, 1), (1, 1, 2), (1, 1, 3), ...
(2, 1, 1), (2, 1, 2), (2, 1, 3), ...]
so that I can use itertools.chain on the result.
The range of each axis is known from:
axes_ranges = [(1, 10), (1, 8), (1, 3)]
Other examples:
(None, 1, 2) -> [(1, 1, 2), (2, 1, 2), (3, 1, 2), ...]
(None, None, None) -> all the combinations
(1,2,3) -> [(1,2,3)]
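A minimal sketch of the function described above, assuming the (low, high) pairs in axes_ranges are inclusive (the examples list (1, 1, 3) for a third axis whose range is (1, 3)). The names expand_pattern and select are hypothetical, and the data dict is a made-up toy example. itertools.product does the expansion, and itertools.chain.from_iterable then concatenates the float lists of the matching bins.

```python
from itertools import chain, product

# (low, high) bounds per axis, treated as inclusive (assumption drawn from the examples)
axes_ranges = [(1, 10), (1, 8), (1, 3)]


def expand_pattern(pattern, ranges):
    """Yield every bin key matching pattern; None means "any bin on this axis"."""
    choices = [
        range(lo, hi + 1) if fixed is None else [fixed]
        for fixed, (lo, hi) in zip(pattern, ranges)
    ]
    return product(*choices)


def select(data, pattern, ranges):
    """Concatenate the float lists of all populated bins that match pattern."""
    keys = expand_pattern(pattern, ranges)
    return list(chain.from_iterable(data.get(key, []) for key in keys))


# Toy data: only a few bins are populated
data = {(1, 1, 1): [0.5, 0.7], (2, 1, 3): [1.25], (2, 2, 3): [9.0]}

print(list(expand_pattern((None, 1, 2), axes_ranges))[:3])
# [(1, 1, 2), (2, 1, 2), (3, 1, 2)]
print(select(data, (None, 1, None), axes_ranges))
# [0.5, 0.7, 1.25]
```

Using data.get(key, []) lets the expansion cover the full grid even when some bins were never filled.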
Answer 1: It looks a lot like you are reinventing the wheel. What you probably want is numpy.ndarray:
>>> import numpy as np
>>> x = np.arange(0,27)
>>> x
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26])
>>> x.reshape(3,3,3)
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
>>> x = x.reshape(3,3,3)
>>> x[0]
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> x[:,1,:]
array([[ 3,  4,  5],
       [12, 13, 14],
       [21, 22, 23]])
>>> x[:,1,1]
array([ 4, 13, 22])
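This works for any number N of dimensions. In the example the index is three-dimensional, so you can think of it as a cube with x[a,b,c] = x[layer,row,column]. Using ":" as an index simply means "all".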
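A pattern such as (None, 1, 1) can be turned into this kind of index, because ":" inside [] is shorthand for slice(None). The sketch below uses a hypothetical helper name, pattern_to_index, and assumes the fixed values are already 0-based NumPy indices (subtract 1 first if your bins are numbered from 1).

```python
import numpy as np


def pattern_to_index(pattern):
    """Map a bin pattern such as (None, 1, 1) to a NumPy index tuple.

    None becomes slice(None), which is what ':' means inside [];
    other values are used unchanged, so they must already be 0-based.
    """
    return tuple(slice(None) if p is None else p for p in pattern)


x = np.arange(27).reshape(3, 3, 3)

# x[pattern_to_index((None, 1, 1))] is the same as x[:, 1, 1]
print(x[pattern_to_index((None, 1, 1))].tolist())  # [4, 13, 22]
```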
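Comments:
This is nice; now I have two problems: 1. How do I translate (None, 1, 1) into x[:, 1, 1]? What kind of notation is ":"? 2. My data are not ints (or floats): for each bin I have a collection of floats (a list).
Are the lists of floats all the same length?
Answer 2:
Hmm, how about: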
import itertools

def combinations_with_fixpoint(iterables, *args):
    # None keeps the whole axis; any other value (including 0) fixes that axis to one bin
    return itertools.product(*(y if x is None else [x] for x, y in zip(args, iterables)))

axes_ranges = [(1, 7), (1, 8), (77, 79)]
combs = combinations_with_fixpoint(
    itertools.starmap(range, axes_ranges),  # range(1, 7), range(1, 8), range(77, 79)
    None, 5, None
)
for p in combs:
    print(p)
# (1, 5, 77)
# (1, 5, 78)
# (2, 5, 77)
# (2, 5, 78)
# (3, 5, 77)
# (3, 5, 78)
# (4, 5, 77)
# (4, 5, 78)
# (5, 5, 77)
# (5, 5, 78)
# (6, 5, 77)
# (6, 5, 78)
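Or perhaps just pass a list to allow multiple "fixed points":
def combinations_with_fixpoint(iterables, *args):
    # each argument is now either None (whole axis) or a list of allowed bins
    return itertools.product(*(y if x is None else x for x, y in zip(args, iterables)))

combs = combinations_with_fixpoint(
    itertools.starmap(range, axes_ranges),
    None, [5, 6], None
)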
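With the list version, a single fixed bin has to be written as [5]. A possible variant (the name combinations_with_fixpoints is hypothetical) accepts None, a scalar, or a list per axis. Note that itertools.starmap returns a one-shot iterator, so each call needs a fresh one.

```python
import itertools


def combinations_with_fixpoints(iterables, *args):
    """Hypothetical variant accepting None, a single bin, or a list of bins per axis."""
    choices = []
    for fixed, axis in zip(args, iterables):
        if fixed is None:
            choices.append(axis)          # free axis: every bin
        elif isinstance(fixed, (list, tuple)):
            choices.append(fixed)         # several allowed bins
        else:
            choices.append([fixed])       # exactly one bin
    return itertools.product(*choices)


axes_ranges = [(1, 7), (1, 8), (77, 79)]

multi = list(combinations_with_fixpoints(itertools.starmap(range, axes_ranges), None, [5, 6], None))
single = list(combinations_with_fixpoints(itertools.starmap(range, axes_ranges), None, 5, None))
print(len(multi), multi[:4])
# 24 [(1, 5, 77), (1, 5, 78), (1, 6, 77), (1, 6, 78)]
print(len(single))  # 12
```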
Answer 3:
binning = [[0, 0.1, 0.2], [0, 10, 20], [-1, -2, -3]]
range_binning = [(1, len(x) + 1) for x in binning]

def expand_bin(thebin):
    def expand_bin_index(thebin, freeindex, rangebin):
        """
        thebin = [1, None, 3]
        freeindex = 1
        rangebin = [4, 5]
        -> [[1, 4, 3], [1, 5, 3]]
        """
        result = []
        for r in rangebin:
            newbin = thebin[:]
            newbin[freeindex] = r
            result.append(newbin)
        return result

    tmp = [thebin]
    indexes_free = [i for i, aa in enumerate(thebin) if aa is None]
    for index_free in indexes_free:
        range_index = range(*(range_binning[index_free]))
        new_tmp = []
        for t in tmp:
            for expanded in expand_bin_index(t, index_free, range_index):
                new_tmp.append(expanded)
        tmp = new_tmp
    return tmp

inputs = ([None, 1, 2], [None, None, 3], [None, 1, None], [3, 2, 1], [None, None, None])
for i in inputs:
    print("%s -> %s" % (i, expand_bin(i)))
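The nested loops above expand one free axis at a time; itertools.product does the same expansion in a single call. The sketch below (expand_bin_product is a hypothetical name) should return the same lists in the same order as expand_bin, since both vary the first free axis slowest.

```python
from itertools import product

binning = [[0, 0.1, 0.2], [0, 10, 20], [-1, -2, -3]]
range_binning = [(1, len(x) + 1) for x in binning]  # bins numbered 1..3 on each axis


def expand_bin_product(thebin):
    """None positions take every bin of that axis; other positions stay fixed."""
    choices = [range(*r) if b is None else [b] for b, r in zip(thebin, range_binning)]
    return [list(combo) for combo in product(*choices)]


print(expand_bin_product([None, 1, 2]))  # [[1, 1, 2], [2, 1, 2], [3, 1, 2]]
```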