分而治之的策略来确定列表中是不是有超过 1/3 的相同元素

Posted

技术标签:

【中文标题】分而治之的策略来确定列表中是不是有超过 1/3 的相同元素【英文标题】:Divide and Conquer strategy to determine if more than 1/3 same element in list分而治之的策略来确定列表中是否有超过 1/3 的相同元素 【发布时间】:2020-01-07 06:22:18 【问题描述】:

我正在使用一种分而治之的算法来确定列表中是否有超过 1/3 的元素相同。 例如:[1,2,3,4] 不,所有元素都是唯一的。 [1,1,2,4,5] 是的,有两个是一样的。

如果没有排序,是否有分而治之的策略? 我被困在如何划分...

def is_valid(ids): 
    n = len(ids) 
    is_valid_recur(ids, n n-1)

def is_valid_recur(ids, l, r):
    m = (l + h) // 2
    return ....  is_valid_recur(ids, l, m) ...is_valid_recur(ids, m+1, r):

非常感谢!

【问题讨论】:

那么幼稚的方法是什么:计数直到任何计数超过len(ids)//2 这个算法怎么样:users.eecs.northwestern.edu/~dda902/336/hw4-sol.pdf 用于寻找多数元素? 这是作业吗? 如果两个项目各占列表的 1/3 以上怎么办:[1,1,1,1,1,2,2,2,2,3]? @user448810 只要一个元素超过1/3就应该返回true 【参考方案1】:

您可以使用二叉搜索树 (BST)。 1.创建BST维护每个节点的密钥计数 2. 使用分治法遍历树以找到最大键数 3. 测试最大计数是否 > n/3 使用 BST 中的数据,分而治之很简单,因为我们只是 必须确定左、当前或右分支是否具有最高的重复次数。

# A utility function to create a new BST node  
class newNode:  
    # Constructor to create a new node  
    def __init__(self, data):  
        self.key = data 
        self.count = 1
        self.left = None
        self.right = None

# A utility function to insert a new node  
# with given key in BST  
def insert(node, key): 
    # If the tree is empty, return a new node  
    if node == None: 
        k = newNode(key) 
        return k 

    # If key already exists in BST, increment 
    # count and return  
    if key == node.key: 
        (node.count) += 1
        return node 

    # Otherwise, recur down the tree  
    if key < node.key:  
        node.left = insert(node.left, key)  
    else: 
        node.right = insert(node.right, key) 

    # return the (unchanged) node pointer  
    return node 

# Finds the node with the highest count in a binary search tree
def MaxCount(node):
  if node == None:
    return 0, None
  else:
    left = MaxCount(node.left)
    right = MaxCount(node.right)
    current = node.count, node

    return max([left, right, current], key=lambda x: x[0])

def generateBST(a):
  root = None
  for x in a:
    root = insert(root, x)

  return root

# Driver Code 
if __name__ == '__main__': 
    a = [1, 2, 3, 1, 1]
    root = generateBST(a)
    cnt, node = MaxCount(root)
    if cnt >= (len(a) // 3):
      print(node.key)  # Prints 1
    else:
      print(None)

n/3 的非分而治之技术,从https://www.geeksforgeeks.org/n3-repeated-number-array-o1-space/ 有 O(n) 时间:

# Python 3 program to find if  
# any element appears more than 
# n/3. 
import sys 

def appearsNBy3(arr, n): 

    count1 = 0
    count2 = 0
    first = sys.maxsize 
    second = sys.maxsize 

    for i in range(0, n):  

        # if this element is 
        # previously seen,  
        # increment count1. 
        if (first == arr[i]): 
            count1 += 1

        # if this element is 
        # previously seen,  
        # increment count2. 
        elif (second == arr[i]): 
            count2 += 1

        elif (count1 == 0): 
            count1 += 1
            first = arr[i] 

        elif (count2 == 0): 
            count2 += 1
            second = arr[i] 


        # if current element is  
        # different from both 
        # the previously seen  
        # variables, decrement 
        # both the counts. 
        else: 
            count1 -= 1
            count2 -= 1



    count1 = 0
    count2 = 0

    # Again traverse the array 
    # and find the actual counts. 
    for i in range(0, n):  
        if (arr[i] == first): 
            count1 += 1

        elif (arr[i] == second): 
            count2 += 1


    if (count1 > n / 3): 
        return first 

    if (count2 > n / 3): 
        return second 

    return -1

# Driver code 
arr = [1, 2, 3, 1, 1 ] 
n = len(arr)  
print(appearsNBy3(arr, n)) 

【讨论】:

谢谢!我还希望算法确定列表是否有超过 1/3 的共同元素。我不确定如何修改算法.. @chen 也没有看到,所以我使用不同的算法修改了我的帖子。 谢谢.. 但是,这个算法似乎不是分而治之的。我正在尝试修改您以前的算法.. @chen 我对多数算法做了一个修改,为 n/3 而不是 n/2 生成一个。查看最近的帖子。 @DarrylG 试试 [2, 2, 1, 3,3,1,4,4,1]【参考方案2】:

这是我为了好玩而尝试的草稿。看起来分而治之可能会减少候选频率检查的数量,但我不确定(参见最后一个示例,其中仅针对完整列表检查 0)。

如果我们将列表分成三份,有效候选人可以拥有的最小频率是每个部分的 1/3。这缩小了我们在其他部分中搜索的候选列表。让f(A, l, r) 代表在其父组中频率可能为 1/3 或更高的候选人。那么:

from math import ceil

def f(A, l, r):
  length = r - l + 1

  if length <= 3:
    candidates = A[l:r+1]
    print "l, r, candidates: %s, %s, %s\n" % (l, r, candidates)
    return candidates

  i = 0
  j = 0
  third = length // 3
  lg_third = int(ceil(length / float(3)))
  sm_third = lg_third // 3

  if length % 3 == 1:
    i, j = l + third, l + 2 * third
  elif length % 3 == 2:
    i, j = l + third, l + 2 * third + 1
  else:
    i, j = l + third - 1, l + 2 * third - 1

  left_candidates = f(A, l, i)
  middle_candidates = f(A, i + 1, j)
  right_candidates = f(A, j + 1, r)
  print "length: %s, sm_third: %s, lg_third: %s" % (length, sm_third, lg_third)
  print "Candidate parts: %s, %s, %s" % (left_candidates, middle_candidates, right_candidates)
  left_part = A[l:i+1]
  middle_part = A[i+1:j+1]
  right_part = A[j+1:r+1]
  candidates = []
  seen = []

  for e in left_candidates:
    if e in seen or e in candidates:
      continue
    seen.append(e)
    count = left_part.count(e)
    if count >= lg_third:
      candidates.append(e)
    else:
      middle_part_count = middle_part.count(e)
      print "Left: counting %s in middle: %s" % (e, middle_part_count)
      if middle_part_count >= sm_third:
        count = count + middle_part_count
      right_part_count = right_part.count(e)
      print "Left: counting %s in right: %s" % (e, right_part_count)
      if right_part_count >= sm_third:
        count = count + right_part_count
      if count >= lg_third:
        candidates.append(e)

  seen = []
  for e in middle_candidates:
    if e in seen or e in candidates:
      continue
    seen.append(e)
    count = middle_part.count(e)
    if count >= lg_third:
      candidates.append(e)
    else:
      left_part_count = left_part.count(e)
      print "Middle: counting %s in left: %s" % (e, left_part_count)
      if left_part_count >= sm_third:
        count = count + left_part_count
      right_part_count = right_part.count(e)
      print "Middle: counting %s in right: %s" % (e, right_part_count)
      if right_part_count >= sm_third:
        count = count + right_part_count
      if count >= lg_third:
        candidates.append(e)

  seen = []
  for e in right_candidates:
    if e in seen or e in candidates:
      continue
    seen.append(e)
    count = right_part.count(e)
    if count >= lg_third:
      candidates.append(e)
    else:
      left_part_count = left_part.count(e)
      print "Right: counting %s in left: %s" % (e, left_part_count)
      if left_part_count >= sm_third:
        count = count + left_part_count
      middle_part_count = middle_part.count(e)
      print "Right: counting %s in middle: %s" % (e, middle_part_count)
      if middle_part_count >= sm_third:
        count = count + middle_part_count
      if count >= lg_third:
        candidates.append(e)
  print "l, r, candidates: %s, %s, %s\n" % (l, r, candidates)
  return candidates


#A = [1, 1, 2, 4, 5]
#A = [1, 2, 3, 1, 2, 3, 1, 2, 3]
#A = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3]
A = [2, 2, 1, 3, 3, 1, 4, 4, 1]
#A = [x for x in range(1, 13)] + [0] * 6
print f(A, 0, len(A) - 1)

【讨论】:

以上是关于分而治之的策略来确定列表中是不是有超过 1/3 的相同元素的主要内容,如果未能解决你的问题,请参考以下文章

快速排序算法

搜索与排序—— 归并排序与快速排序

分而治之 - 比较所有可能的组合

使用 JProfiler,是不是有推荐的策略来确定应用程序是 CPU 还是网络绑定?

L2-025 分而治之(并查集)

递归分治最长运行