Count number of preceding repeating items in an array

【Posted】: 2021-03-10 17:10:48

【Question】:

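I have the following query. It walks over each item in the array and looks back to count how many consecutive f's precede it, including the item itself.

It works, but it is slow on a large number of rows. Is there a neater way to handle sequences in arrays?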

SELECT
['p','p','f','f','f','f','p','f', 'f', 'f'] AS sequence,
arrayMap( (x,y) -> (x, 
   if (x='f', (arrayFirstIndex( k -> k=0,
       arrayCumSumNonNegative((n, index) -> n = 'f' ? 1 : -index,
       arrayReverse(arraySlice(sequence,1,y)) as arr,
       arrayEnumerate(arr)))
   )-1, 0)), sequence, arrayEnumerate(sequence))

result:

[('p',0),('p',0),('f',1),('f',2),('f',3),('f',4),('p',0),('f',1),('f',2),('f',3)]

Thanks in advance.
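For reference, the result the question asks for can be described as a single left-to-right pass with a counter that resets on every non-matching item. The sketch below is plain Python, not ClickHouse, and the function name `count_preceding_run` is made up for illustration; it only pins down the expected output so that any rewritten query can be checked against it.

```python
def count_preceding_run(sequence, ch="f"):
    """Pair each item with the length of the run of `ch` that ends at it.

    Items other than `ch` get 0 and reset the running counter.
    """
    result = []
    run = 0
    for item in sequence:
        # Extend the current run on a match, otherwise start over.
        run = run + 1 if item == ch else 0
        result.append((item, run))
    return result


# Same input as the query in the question.
print(count_preceding_run(['p', 'p', 'f', 'f', 'f', 'f', 'p', 'f', 'f', 'f']))
```

The pass touches each item once, whereas the query in the question re-slices and reverses the prefix for every position, which is the likely source of the slowdown on long arrays.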

【Comments】:

arrayDifference + arraySplit should solve it. For example ***.com/a/61617086/11644308, or simply arraySplit(i -> i = 'p', sequence)

Cheers again, Denny

【Answer 1】:

Try this query:

WITH 'f' AS ch
SELECT 
  arraySplit((x, i) -> x = ch and sequence[i - 1] != ch or x != ch and sequence[i - 1] = ch, sequence, arrayEnumerate(sequence)) parts,
  arrayMap(part -> arrayMap((x, index) -> (x, x = ch ? index : 0), part, arrayEnumerate(part)), parts) parts_and_number,
  arrayFlatten(parts_and_number) result
FROM (
  SELECT arrayJoin([
    ['p','p','f','f','f','f','p','f', 'f', 'f'],
    ['p','w','f','f','f','f','p','f', 'f', 'f'],
    ['f','f','f','f','p','f', 'f', 'f'],
    ['p','w'],
    ['f', 'f'],  
    ['f']
  ]) as sequence)

/*
Row 1:
──────
parts:            [['p','p'],['f','f','f','f'],['p'],['f','f','f']]
parts_and_number: [[('p',0),('p',0)],[('f',1),('f',2),('f',3),('f',4)],[('p',0)],[('f',1),('f',2),('f',3)]]
result:           [('p',0),('p',0),('f',1),('f',2),('f',3),('f',4),('p',0),('f',1),('f',2),('f',3)]

Row 2:
──────
parts:            [['p','w'],['f','f','f','f'],['p'],['f','f','f']]
parts_and_number: [[('p',0),('w',0)],[('f',1),('f',2),('f',3),('f',4)],[('p',0)],[('f',1),('f',2),('f',3)]]
result:           [('p',0),('w',0),('f',1),('f',2),('f',3),('f',4),('p',0),('f',1),('f',2),('f',3)]

Row 3:
──────
parts:            [['f','f','f','f'],['p'],['f','f','f']]
parts_and_number: [[('f',1),('f',2),('f',3),('f',4)],[('p',0)],[('f',1),('f',2),('f',3)]]
result:           [('f',1),('f',2),('f',3),('f',4),('p',0),('f',1),('f',2),('f',3)]

Row 4:
──────
parts:            [['p','w']]
parts_and_number: [[('p',0),('w',0)]]
result:           [('p',0),('w',0)]

Row 5:
──────
parts:            [['f','f']]
parts_and_number: [[('f',1),('f',2)]]
result:           [('f',1),('f',2)]

Row 6:
──────
parts:            [['f']]
parts_and_number: [[('f',1)]]
result:           [('f',1)]
*/
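The three columns of the answer (split into parts, number each part, flatten) can be mirrored outside the database to check the logic. This is a minimal Python sketch, not ClickHouse; the function name `split_number_flatten` is hypothetical, and `itertools.groupby` stands in for `arraySplit` on the assumption that splitting wherever "item equals ch" flips is the same as grouping consecutive items by that condition.

```python
from itertools import groupby


def split_number_flatten(sequence, ch="f"):
    # Step 1 (parts): start a new part whenever "item == ch" changes value.
    parts = [list(group) for _, group in groupby(sequence, key=lambda x: x == ch)]

    # Step 2 (parts_and_number): number items 1..n inside each part,
    # but report 0 for items that are not `ch`.
    parts_and_number = [
        [(x, index if x == ch else 0) for index, x in enumerate(part, start=1)]
        for part in parts
    ]

    # Step 3 (result): flatten the nested parts back into one list.
    result = [pair for part in parts_and_number for pair in part]
    return parts, parts_and_number, result


parts, parts_and_number, result = split_number_flatten(
    ['p', 'w', 'f', 'f', 'f', 'f', 'p', 'f', 'f', 'f']
)
print(parts)
print(result)
```

Run on the second test sequence, this should reproduce the `parts` and `result` values shown for Row 2 above.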

【Comments】:

@redsquare np, glad to help
