How to fill data gaps only when the extremities have the same value, limited to a maximum number of occurrences?

【Posted】: 2021-07-31 01:27:43

【Question】:

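I have searched through many answers here that might solve this problem, but couldn't find one. The desired result is to fill a gap only when both of its extremities are equal, filling at most 4 values per gap: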

My dataset:

0     NaN
1     NaN
2     NaN
3     5.0
4     5.0
5     NaN
6     NaN
7     5.0
8     6.0
9     NaN
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
15    5.0
16    5.0
17    NaN
18    NaN
19    6.0
20    6.0
21    NaN
22    NaN
23    NaN
24    NaN
25    5.0
26    NaN
27    NaN
28    NaN
29    NaN
30    NaN
31    NaN
32    NaN
33    5.0
34    NaN
35    NaN
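To make the examples reproducible, the dataset above can be rebuilt as a pandas DataFrame. This is a sketch: the values are copied from the listing, and the column name col is an assumption taken from the output in Answer 2 (the question does not name its column).

```python
import numpy as np
import pandas as pd

nan = np.nan
# Values copied from the listing above (index 0..35).
# The column name 'col' is assumed, matching the output shown in Answer 2.
values = [nan, nan, nan, 5, 5, nan, nan, 5, 6,                        # 0..8
          nan, nan, nan, nan, nan, nan, 5, 5, nan, nan, 6, 6,         # 9..20
          nan, nan, nan, nan, 5, nan, nan, nan, nan, nan, nan, nan,   # 21..32
          5, nan, nan]                                                # 33..35
df = pd.DataFrame({'col': values}, dtype=float)
print(df['col'])
```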

Desired result (a gap is filled only when both extremities are equal, with at most 4 values filled per gap):

0     NaN   # Not filled since the gap ends with 5 but this is the dataset beginning (don't know how it starts)
1     NaN   # Not filled since the gap ends with 5 but this is the dataset beginning (don't know how it starts)
2     NaN   # Not filled since the gap ends with 5 but this is the dataset beginning (don't know how it starts)
3     5.0  # Original dataset
4     5.0  # Original dataset
5     5.0    # Filled since the gap starts with 5 and ends with 5 (and is smaller than 4 values)
6     5.0    # Filled since the gap starts with 5 and ends with 5 (and is smaller than 4 values)
7     5.0  # Original dataset
8     6.0  # Original dataset
9     NaN    # Not filled since the gap starts with 6 and ends with 5
10    NaN         .
11    NaN         .
12    NaN         .
13    NaN         .
14    NaN    # Not filled since the gap starts with 6 and ends with 5
15    5.0  # Original dataset
16    5.0  # Original dataset
17    NaN    # Not filled since the gap starts with 5 and ends with 6
18    NaN    # Not filled since the gap starts with 5 and ends with 6
19    6.0  # Original dataset
20    6.0  # Original dataset
21    NaN    # Not filled since the gap starts with 6 and ends with 5
22    NaN         .
23    NaN         .
24    NaN    # Not filled since the gap starts with 6 and ends with 5
25    5.0  # Original dataset
26    5.0    # Filled since the gap starts with 5 and ends with 5
27    5.0    # Filled since the gap starts with 5 and ends with 5
28    5.0    # Filled since the gap starts with 5 and ends with 5
29    5.0    # Filled since the gap starts with 5 and ends with 5
30    NaN    # Not filled since at most 4 values are filled per gap
31    NaN    # Not filled since at most 4 values are filled per gap
32    NaN    # Not filled since at most 4 values are filled per gap
33    5.0  # Original dataset
34    NaN    # Not filled since the gap starts with 5 but this is the dataset end (don't know how it ends)
35    NaN    # Not filled since the gap starts with 5 but this is the dataset end (don't know how it ends)
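The rule in the expected output can also be written as a plain loop over runs of missing values, which is handy as a cross-check for the vectorized answers. A minimal sketch: fill_equal_ends and its limit parameter are hypothetical names, and the input is assumed to be a plain list of floats with NaN marking the gaps.

```python
import math

def fill_equal_ends(values, limit=4):
    """Hypothetical helper: fill a run of NaN with the value that bounds it,
    but only when the known values on both sides are equal, and fill at
    most `limit` positions of the run."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not math.isnan(out[i]):
            i += 1
            continue
        # Find the end of the NaN run starting at i: it covers positions i..j-1
        j = i
        while j < n and math.isnan(out[j]):
            j += 1
        # A run touching the start or end of the data has an unknown extremity
        if i > 0 and j < n and out[i - 1] == out[j]:
            for k in range(i, min(j, i + limit)):
                out[k] = out[i - 1]
        i = j
    return out

nan = float('nan')
print(fill_equal_ends([nan, 5.0, nan, nan, 5.0, 6.0, nan, 5.0]))
# [nan, 5.0, 5.0, 5.0, 5.0, 6.0, nan, 5.0]
```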


【Answer 1】:

It should be something like this:

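import pandas as pd

def extremities(arr):
    # Positions of missing values (None or NaN) and of known values
    nones = [i for i, x in enumerate(arr) if pd.isna(x)]
    not_nones = [i for i, x in enumerate(arr) if not pd.isna(x)]
    for i in nones:
        try:
            # Nearest known positions before and after i
            start = [x for x in not_nones if x < i][-1]
            finish = [x for x in not_nones if x > i][0]
        except IndexError:
            # The gap touches the start or end of the data: leave it unfilled
            continue
        # Fill only if both ends match and i is at most 4 steps past start
        if arr[start] == arr[finish] and i - start < 5:
            arr[i] = arr[start]
    return arr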

Edit:

Sorry, I forgot that the fill is limited to 4 values. I have edited the code.
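If the data comes from a pandas column, its gaps are float NaN rather than None, and an equality test against None does not detect them. A small check, assuming pandas and NumPy are installed:

```python
import numpy as np
import pandas as pd

values = [5.0, np.nan, None]

# Equality with None is only true for the None object itself, not for NaN
print([x == None for x in values])  # [False, False, True]

# pd.isna reports both NaN and None as missing (and 5.0 as present)
missing = [pd.isna(x) for x in values]
print(missing)
```

This is why a function that looks for gaps in such data needs a NaN-aware test such as pd.isna rather than x == None.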


【Answer 2】:

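We can use a boolean mask and cumsum to identify the blocks of NaN values that start and end with the same value, then group the column by these blocks and forward-fill with a limit of 4: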

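s = df['col']
m = s.notna()  # True where a value is known
# Blank out known values whose next known value differs, so they cannot seed a fill,
# then group each known value with the NaNs after it, forward-fill at most 4 rows
# and restore the blanked values with fillna(s)
s.mask(s[m] != s[m].shift(-1)).groupby(m.cumsum()).ffill(limit=4).fillna(s)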

0     NaN
1     NaN
2     NaN
3     5.0
4     5.0
5     5.0
6     5.0
7     5.0
8     6.0
9     NaN
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
15    5.0
16    5.0
17    NaN
18    NaN
19    6.0
20    6.0
21    NaN
22    NaN
23    NaN
24    NaN
25    5.0
26    5.0
27    5.0
28    5.0
29    5.0
30    NaN
31    NaN
32    NaN
33    5.0
34    NaN
35    NaN
Name: col, dtype: float64
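The 4 in the one-liner is only the ffill limit, so the same steps can be wrapped in a function that takes the limit as a parameter. A sketch: fill_between_equal is a hypothetical name, and the series is assumed to have a unique index, since mask and groupby align on index labels.

```python
import numpy as np
import pandas as pd

def fill_between_equal(s: pd.Series, limit: int = 4) -> pd.Series:
    """Hypothetical wrapper around Answer 2's one-liner: fill NaN runs whose
    bounding known values are equal, filling at most `limit` rows per run."""
    m = s.notna()
    # Keep a known value as a fill seed only if the next known value is equal
    seeds = s.mask(s[m] != s[m].shift(-1))
    # Group each known value with the NaNs after it, forward-fill inside the
    # group, then restore the known values that were blanked out above
    return seeds.groupby(m.cumsum()).ffill(limit=limit).fillna(s)

nan = np.nan
s = pd.Series([nan, nan, nan, 5, 5, nan, nan, 5, 6,
               nan, nan, nan, nan, nan, nan, 5, 5, nan, nan, 6, 6,
               nan, nan, nan, nan, 5, nan, nan, nan, nan, nan, nan, nan,
               5, nan, nan], dtype=float, name='col')

print(fill_between_equal(s).loc[25:33])           # limit=4, as in the question
print(fill_between_equal(s, limit=2).loc[25:33])  # at most 2 rows per gap
```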

【Comments】:

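This is beautiful!!! Simple, fast and effective! The idea of "s.mask(s[m] != s[m].shift(-1))" really turns this problem into a simple solution. How did you come up with it?? :)

@User365Go Glad I could help. As for the idea, it comes from a lot of experience and problem solving :P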
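The key step praised in the comment, s.mask(s[m] != s[m].shift(-1)), is easier to follow by printing the intermediate objects on a short series. The series below is hypothetical example data with one 5...5 gap, which should be filled, and one 5...6 gap, which should not.

```python
import numpy as np
import pandas as pd

# Hypothetical short series: a 5...5 gap (fillable) and a 5...6 gap (not fillable)
s = pd.Series([5.0, np.nan, np.nan, 5.0, np.nan, 6.0], name='col')
m = s.notna()

# Step 1: a known value seeds a fill only if the next known value is equal;
# the others (and the last known value) are blanked to NaN
seeds = s.mask(s[m] != s[m].shift(-1))

# Step 2: m.cumsum() gives each known value and the NaNs after it one group label
groups = m.cumsum()

# Step 3: forward-fill inside each group (at most 4 rows), then restore the
# known values that step 1 blanked out
result = seeds.groupby(groups).ffill(limit=4).fillna(s)

print(pd.DataFrame({'s': s, 'seeds': seeds, 'group': groups, 'result': result}))
```

Only the 5 at index 0 survives as a seed, because its next known value (index 3) is also 5; the 5 at index 3 is blanked because the next known value is 6, so the NaN at index 4 stays unfilled.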
