考虑所有连续子数组的数组每个元素的频率

Posted 2023-02-14

技术标签:

【中文标题】考虑所有连续子数组的数组每个元素的频率【英文标题】：Frequency of each element of an array considering all contiguos subarrays 【发布时间】：2015-11-02 23:42:30 【问题描述】：

考虑一个数组 A = [5,1,7,2,3]

所有连续子数组 = [5], [1], [7], [2], [3], [5,1], [1,7], [7,2], [2,3 ], [5,1,7], [1,7,2], [7,2,3], [5,1,7,2], [1,7,2,3], [5,1,7 ,2,3]

将上述集合中的所有数组替换为其中的最大元素：

set 看起来像这样： [5], [1], [7], [2], [3], [5], [7], [7], [3], [7], [7], [7], [7], [7], [7]

频率信息：[5] -> 2, [1] -> 1, [7] -> 9, [2] -> 1, [3] -> 2

我的目标是找到上述频率信息。

我的方法：

首先列出一对 (x,y)。 x 是 A 中的元素，它的索引是 y。

列表：[ (5,1), (1,2), (7,3), (2,4), (3,5) ]

按照第一个元素的降序对列表进行排序。现在，

列表：[ (7,3), (5,1), (3,5), (2,4), (1,2) ]

算法：

def f( array, first_index, last_index):
       ->select an element from LIST starting from left which
         is not marked as visited and (first_index <= element.second <=   
         last_index)
       ->calculate frequency info of element in tuple as  (element.secondvalue-first_index+1) + (element.secondvalue-first_index+1)*(last_index - element.second_value)
       ->Mark above element as visited
       ->Recursively solve f( array, first_index,element.secondvalue-1 ),f( array,element.secondvalue+1,last_index)

我们可以轻松设置合适的基本情况。

时间复杂度：O(n*n)

我已经尝试了很多上述算法，但无法提高时间复杂度。我该怎么做？任何提示，方法将不胜感激。

【问题讨论】：

好像有一个函数可以描述这种关系相关：this question。我在那里提供的答案将解决您在O(n) 中的问题，而不是O(n log n) 作为此问题的其他答案。如果您愿意，我可以在此处以新答案的形式重新发布您问题的相关部分。真的是 O(n) ！当然我想知道。请张贴它.. 【参考方案1】：

根据 Paul 的回答，我使用分段树的简化版本实现了 O(n log n) 版本。您会注意到这个答案与 Paul 的答案相同，但使用了 leftctsmaller 和 rightctsmaller 的优化版本。

实际上，它的作用是获取一个数组，比如说：

A = [5,1,7,2,3,7,3,1]

并构造一个数组支持的树，如下所示：

在树中，第一个数字是值，第二个是它出现在数组中的索引。每个节点是其两个子节点中的最大值。这棵树由一个数组（很像堆树）支持，其中索引i 的子节点位于索引i*2+1 和i*2+2 中。

然后，对于每个元素，找到最近的更大元素（之前和之后）变得容易。

例如，要找到离左边最近的较大元素，我们在树中向上搜索值较大且索引小于参数的第一个父元素。答案必须是这个父节点的子节点，然后我们在树中向下寻找满足相同条件的最右边的节点。

我已经在 Python 中实现了它（以及天真的版本，以检查答案），它似乎运行良好。

import sys, random
from collections import defaultdict
from math import log, ceil

def make_tree(A):
    n = 2**(int(ceil(log(len(A), 2))))
    T = [(None, None)]*(2*n-1)

    for i, x in enumerate(A):
        T[n-1+i] = (x, i)

    for i in reversed(xrange(n-1)):
        T[i] = max(T[i*2+1], T[i*2+2])

    return T

def print_tree(T):
    print 'digraph '
    for i, x in enumerate(T):
        print '    ' + str(i) + '[label="' + str(x) + '"]'
        if i*2+2 < len(T):
            print '    ' + str(i)+ '->'+ str(i*2+1)
            print '    ' + str(i)+ '->'+ str(i*2+2)

    print ''

def find_generic(T, i, fallback, check, first, second):
    j = len(T)/2+i
    original = T[j]
    j = (j-1)/2

    #go up in the tree searching for a value that satisfies check
    while j > 0 and not check(T[second(j)], original):
        j = (j-1)/2

    #go down in the tree searching for the left/rightmost node that satisfies check
    while j*2+1<len(T):
        if check(T[first(j)], original):
            j = first(j)
        elif check(T[second(j)], original):
            j = second(j)
        else:
            return fallback

    return j-len(T)/2


def find_left(T, i, fallback):
    return find_generic(T, i, fallback, 
        lambda a, b: a[0]>b[0] and a[1]<b[1],  #value greater, index before
        lambda j: j*2+2,                       #rightmost first
        lambda j: j*2+1                        #leftmost second
    ) 


def find_right(T, i, fallback):
    return find_generic(T, i, fallback,
        lambda a, b: a[0]>=b[0] and a[1]>b[1], #value greater or equal, index after
        lambda j: j*2+1,                       #leftmost first
        lambda j: j*2+2                        #rightmost second
    )       

def optimized_version(A):
    T = make_tree(A)

    answer = defaultdict(lambda: 0)
    for i, x in enumerate(A):
        left = find_left(T, i, -1)
        right = find_right(T, i, len(A))
        answer[x] += (i-left) * (right-i)

    return dict(answer)

def naive_version(A):
    answer = defaultdict(lambda: 0)
    for i, x in enumerate(A):
        left = next((j for j in range(i-1, -1, -1) if A[j]>A[i]), -1)
        right = next((j for j in range(i+1, len(A)) if A[j]>=A[i]), len(A))
        answer[x] += (i-left) * (right-i)
    return dict(answer)


A = [random.choice(xrange(32)) for i in xrange(8)]    
MA1 = naive_version(A)
MA2 = optimized_version(A)

sys.stderr.write('Array:     ' + str(A) + '\n')
sys.stderr.write('Naive:     ' + str(MA1) + '\n')
sys.stderr.write('Optimized: ' + str(MA2)  + '\n')
sys.stderr.write('OK:        ' + str(MA1==MA2)  + '\n')
#print_tree(make_tree(A))

【讨论】：

【参考方案2】：

我怀疑您的代码在 O(n^2) 中运行。无论如何，以更有效的方式解决此问题的一种方法是将每个数字映射到左侧/右侧小于给定项目的项目数。例如：

input = [2 , 3 , 1 , 5 , 4 , 8 , 0]
for number n = 5
leftctsmaller(n) = 3
rightctsmaller(n) = 1

此地图需要O(n^2) 才能生成。其余的很简单。给定左边和右边的空间，我们可以很容易地确定只包含比n 更小的数字的子数组的数量，除了n 本身。

【讨论】：

我们能否进一步提高它的时间复杂度 @DeepakYadav 应该是可能的。虽然我必须考虑一种优化搜索的方法。我明天更新我很确定您可以使用分段树来做到这一点。 @JuanLopes 谢谢。这是一个非常简单的解决方案，但我没有想到。 @Paul 实现了您使用分段树优化操作的想法【参考方案3】：

从下到上遍历您的值到索引图 - 维护一个增强的区间树。每次添加索引时，调整适当的区间，从相关段计算总和：

A = [5,1,7,2,3] => 1:1, 2:3, 3:4, 5:0, 7:2

indexes     interval     total sub-arrays with maximum exactly
1           (1,1)        1 =>           1
1,3         (3,3)        2 =>           1 
1,3,4       (3,4)        3 =>           2
1,3,4,0     (0,1)        5 =>           2
1,3,4,0,2   (0,4)        7 => 3 + 2*3 = 9

augmented trees 中的插入和删除具有O(log n) 的时间复杂度。最坏情况下的总时间复杂度为O(n log n)。

【讨论】：

【参考方案4】：

我很难用语言解释我的解决方案。我只会添加代码。它会自己解释：

#include <iostream>
#include <fstream>
using namespace std;

#define max 10000

int main(int argc, const char * argv[]) 

    ifstream input("/Users/appleuser/Documents/Developer/xcode projects/SubArrayCount/SubArrayCount/input.in");

    int n, arr[max], before[max]=0, after[max]=0, result[max];
    input >> n;
    for (int i=0; i<n; i++)
        input >> arr[i];

    for (int i=0;i<n;i++)
        for (int j=i-1;j>=0&&arr[j]<arr[i];j-=before[j]+1)
            before[i]+=before[j]+1;

    for (int i=n-1;i>=0;i--)
        for (int j=i+1;j<n&&arr[j]<arr[i];j+=after[j]+1)
            after[i]+=after[j]+1;

    for (int i=0;i<n;i++)
        result[i]= (before[i]+1)*(after[i]+1);

    for (int i=0; i<n; i++)
        cout << result [i] << " ";
    cout << endl;

    return 0;

(before[i]+1)*(after[i]+1)的解释：

对于每个值，我们需要数字位于该值之前且小于该值，数字位于该值之后且小于该值。

  | 0  1  2  3  4  5 .... count of numbers less than the value and appears before.
---------------------
0 | 1  2  3  4  5  6
1 | 2  4  6  8  10 12
2 | 3  6  9  12 15 18
3 | 4  8  12 16 20 24
4 | 5  10 15 20 25 30
5 | 6  12 18 24 30 36
. | 
. |
. |
count of numbers less than the value and appears after.

示例：对于一个比它少 3 个值并出现在之前的数字，并且比它少 4 个值并出现在之后的数字。答案是 V(3,4) = 20 = (3+1) * (4+1)

请告诉我结果。

【讨论】：

以上是关于考虑所有连续子数组的数组每个元素的频率的主要内容，如果未能解决你的问题，请参考以下文章