【A hard-to-write algorithm】Selection in worst-case linear time

Posted 2023-04-08


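Write a function Select that returns the k-th smallest element of an array, with a running time of O(n) in the worst case.

Introduction to Algorithms describes the procedure as follows (in the steps below the requested rank is called i, and k is the rank of the pivot x):

1. Divide the n input elements into ⌊n/5⌋ groups of 5 elements each, plus at most one group holding the remaining n mod 5 elements.
2. Find the median of each of the ⌈n/5⌉ groups: insertion-sort each group, then pick the median from the sorted group.
3. Recursively call Select on the ⌈n/5⌉ medians from step 2 to find their median x.
4. Partition the input array around x using a modified partition procedure. Let k be one more than the number of elements on the low side of the partition, so that x is the k-th smallest element and n-k elements lie on the high side.
5. If i = k, return x. If i < k, recursively call Select on the low side to find the i-th smallest element. If i > k, find the (i-k)-th smallest element on the high side.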
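The following is a sketch of the standard textbook argument for why these five steps run in linear time. It assumes distinct elements; the constants a (the per-element cost of grouping, sorting the groups and partitioning) and c are symbols introduced here for illustration. At least half of the ⌈n/5⌉ group medians are greater than or equal to x, and each of those groups contributes 3 elements greater than x, except the group containing x and the possibly short last group. The same bound holds symmetrically for the elements smaller than x, so step 5 recurses on at most 7n/10 + 6 elements.

```latex
\begin{align*}
\#\{\text{elements} > x\}
  &\ge 3\left(\left\lceil \tfrac{1}{2}\left\lceil \tfrac{n}{5}\right\rceil\right\rceil - 2\right)
   \ge \frac{3n}{10} - 6, \\
T(n) &\le T\!\left(\left\lceil \tfrac{n}{5}\right\rceil\right)
        + T\!\left(\tfrac{7n}{10} + 6\right) + an .
\end{align*}

% Substitution: guess T(n) <= cn and check that the guess is preserved.
\begin{align*}
T(n) &\le c\left\lceil \tfrac{n}{5}\right\rceil + c\left(\tfrac{7n}{10} + 6\right) + an \\
     &\le \tfrac{cn}{5} + c + \tfrac{7cn}{10} + 6c + an \\
     &= cn + \left(-\tfrac{cn}{10} + 7c + an\right) \\
     &\le cn \quad\text{whenever } c \ge \frac{10an}{n - 70},\ n > 70 .
\end{align*}
```

For n ≥ 140 we have n/(n-70) ≤ 2, so any c ≥ 20a works, which gives T(n) = O(n). Smaller inputs can be handled in constant time.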

My current code is below. It is both bloated and buggy. I am looking for a correct version.

// Hoare partition of A[p..r] around the value x.
// Returns a split point j, NOT the index of x.
int hoarePartition(int *A, int p, int r, int x)
{
    int i = p - 1;
    int j = r + 1;

    for(;;)
    {
        while( A[--j] > x)
            ;
        while( A[++i] < x)
            ;
        if(i<j)
        {
            swap(A[i], A[j]);
        }
        else
        {
            return j;
        }
    }
}

// return the ith smallest element of A[p..r]
int select(int *A,int p, int r, int i)
{
    if(p == r)
    {
        return A[p];
    }
    // #1. groupNum & rest
    int groupNum = (r - p + 1) / 5;
    int rest = (r - p + 1) % 5;
    if( rest != 0 )
    {
        groupNum += 1;
    }
    // #2. sort the groups
    if( rest == 0)
        for(int t=0; t<groupNum; t+=1)
            insertionSort(A + p + t*5, 5);
    else
    {
        for(int t=0; t<groupNum - 1; t+=1)
            insertionSort(A + p + t*5, 5);
        insertionSort(A + p + (groupNum - 1) * 5, rest);
    }
    // Problem (kept from the post): this returns the group's median, not the
    // ith element, and reads A[p-1] when rest == 0 (exactly 5 elements).
    if(groupNum == 1)
    {
        return A[p + (rest+1)/2 - 1];
    }
    // #3. get the mid value x
    int *mids = new int[groupNum];

    if(rest == 0)
        for(int t=0; t<groupNum; t+=1)
            mids[t] = A[p + t*5 + 2 ];
    else
    {
        for(int t=0; t<groupNum-1; t+=1)
            mids[t] =A[p + t*5 + 2 ];
        mids[groupNum-1] = A[p + (groupNum - 1) * 5 + (rest+1) / 2 - 1];
    }
    //std::cout << "\nEnter Mids [" << 0 << "," << groupNum-1 << "] ";
    int x = select(mids,0, groupNum-1, (groupNum+1) / 2);
    //std::cout << " x=" << x << " ";
    //int xindex = binarySearch( A+p, r - p + 1, x) + p;
    delete []mids;

    // #4. partition with x
    int k = hoarePartition(A, p, r, x) - p + 1;

    // #5
    // Problem (kept from the post): k comes from a split point, so x is not
    // necessarily the kth smallest; the three-way case split below is unsound.
    if(i < k)
    {
        //std::cout << "\nEnter partition [" << p << "," << p+k-2 << "] ";
        return select(A,p, p+k-2, i);
    }
    else if(i > k)
    {
        //std::cout << "\nEnter partition [" << p+k << "," << r << "] ";
        return select(A,p + k, r, i-k);
    }
    else
        return x;
}

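This is actually far more complicated than selection in average-case linear time (Introduction to Algorithms does not even give pseudocode for it).
The catch is that the quicksort partition routine expects the pivot to be the last element, whereas Hoare's partition has no such requirement. Here we are given only the pivot's value, so with the quicksort-style partition we would first have to find its position (one scan of the array) and then swap it into place.
With Hoare's partition no search is needed, but the algorithm then differs slightly from the description in the book.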
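For comparison, the search-then-swap variant described above could look roughly like the sketch below. The name partitionAroundValue and the use of std::vector are assumptions made for illustration; neither appears in the original post. Unlike Hoare's partition, it returns the exact final index of x, which is what step 4 of the book's description relies on.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Partition A[p..r] around the value x, which is assumed to occur in A[p..r].
// Returns the final index q of x, with A[p..q-1] <= x and A[q+1..r] > x.
// (Hypothetical helper for illustration.)
int partitionAroundValue(std::vector<int>& A, int p, int r, int x)
{
    // Linear scan to locate one occurrence of x.
    int pos = p;
    while (pos <= r && A[pos] != x)
        ++pos;
    assert(pos <= r && "x must be present in A[p..r]");

    // Move the pivot to the end, as the quicksort-style partition expects.
    std::swap(A[pos], A[r]);

    int i = p - 1;
    for (int j = p; j < r; ++j) {
        if (A[j] <= x) {
            ++i;
            std::swap(A[i], A[j]);
        }
    }
    std::swap(A[i + 1], A[r]);  // put the pivot between the two sides
    return i + 1;
}
```

With q as the returned index, k = q - p + 1 is the rank of x within A[p..r], so the three-way comparison of i against k in step 5 applies directly.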

Also, because the code is complex, this algorithm is relatively slow on random input.
Below is the selection code using Hoare's partition.

#include <ctime>
#include <cstdlib>
#include <iostream>

inline void swap(int &x, int &y)
{
    int temp = x;
    x = y;
    y = temp;
}

// Hoare partition of A[p..r] around the value x, which must occur in A[p..r].
// Returns j such that A[p..j] <= x and A[j+1..r] >= x.
int hoarePartitionX(int *A, int p, int r, int x)
{
    int i = p - 1;
    int j = r + 1;

    for(;;)
    {
        while( A[--j] > x)
            ;
        while( A[++i] < x)
            ;
        if(i<j)
        {
            swap(A[i], A[j]);
        }
        else
        {
            return j;
        }
    }
}

// Sorts A[0..size-1]
void insertionSort(int *A, int size)
{
    int i;
    int key;

    for(int j=1; j<size; j+=1)
    {
        key = A[j];
        i = j - 1;
        while(i >= 0 && A[i] > key)
        {
            A[i+1] = A[i];
            i -= 1;
        }
        A[i+1] = key;
    }
}

// return the ith smallest element of A[p..r] (i is 1-based)
int select(int *A, int p, int r, int i)
{
    if(p == r) // only one element, just return
    {
        return A[p];
    }
    // #1. groupNum & rest
    int groupNum = (r - p + 1) / 5; // number of full groups, not counting the rest
    int rest = (r - p + 1) % 5;

    // #2. sort the groups
    for(int t=0; t<groupNum; t+=1)
    {
        insertionSort(A + p + t*5, 5);
    }
    if(rest != 0)
    {
        insertionSort(A + p + groupNum * 5, rest);
    }
    // #3. get the mid value x (lower median of the group medians)
    int *mids;
    if(rest == 0)
        mids = new int[groupNum];
    else
        mids = new int[groupNum+1];

    for(int t=0; t<groupNum; t+=1)
    {
        mids[t] = A[ p + t*5 + 2 ];
    }
    if(rest != 0)
    {
        mids[groupNum] = A[ p + groupNum*5 + (rest-1)/2 ];
    }
    int x;
    if( rest == 0 )
    {
        x = select(mids, 0, groupNum-1, (groupNum-1) / 2 + 1);
    }
    else
    {
        x = select(mids, 0, groupNum, groupNum / 2 + 1);
    }
    delete []mids;

    // #4. partition with x
    // k is the size of the low side A[p..p+k-1]; every element there is <= x
    int k = hoarePartitionX(A, p, r, x) - p + 1;

    // #5. two-way split: Hoare's partition does not fix the position of x,
    // so there is no separate "i == k" case
    if(i <= k)
    {
        return select(A, p, p+k-1, i);
    }
    else
    {
        return select(A, p+k, r, i-k);
    }
}

int main()
{
    int array[100];
    for(int i=0; i<100; i+=1)
        array[i] = i;

    // scramble the array a little
    for(int i=0; i<100; i+=1)
    {
        int rnd = rand()%100;
        swap(array[0], array[rnd]);
    }
    // ask for the 82nd smallest element; for the values 0..99 that is 81
    std::cout << select(array, 0, 99, 82);

    std::cin.get();
    return 0;
}
Reference answer A: Just adapt quicksort a little and you are done. It is very fast.
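What this answer suggests is usually called quickselect. A minimal sketch follows, assuming a random pivot and a quicksort-style partition; the name quickSelect, the fixed seed and the use of std::vector are illustrative choices, not part of the original answer. Note that this only gives expected O(n) time. Its worst case is O(n^2), so it does not meet the worst-case requirement of the question.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Returns the i-th smallest element (1-based) of A.
// A is taken by value so the caller's data is left untouched.
// (Hypothetical helper for illustration.)
int quickSelect(std::vector<int> A, int i)
{
    assert(i >= 1 && i <= static_cast<int>(A.size()));
    std::mt19937 rng(12345);  // fixed seed, an arbitrary choice for reproducibility
    int p = 0;
    int r = static_cast<int>(A.size()) - 1;

    while (p < r) {
        // Pick a random pivot and move it to the end.
        std::uniform_int_distribution<int> pick(p, r);
        std::swap(A[pick(rng)], A[r]);
        int x = A[r];

        // Quicksort-style partition of A[p..r] around x.
        int q = p - 1;
        for (int j = p; j < r; ++j)
            if (A[j] <= x)
                std::swap(A[++q], A[j]);
        std::swap(A[++q], A[r]);  // the pivot now sits at index q

        int k = q - p + 1;        // rank of the pivot within A[p..r]
        if (i == k)
            return A[q];
        if (i < k) {
            r = q - 1;            // continue in the low side
        } else {
            p = q + 1;            // continue in the high side
            i -= k;
        }
    }
    return A[p];
}
```

Because only one side of each partition is processed further, a loop is enough and no recursion is needed.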

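Still not understanding Big-O vs Worst Case Time Complexity

【Title】: Still not understanding Big-O vs Worst Case Time Complexity 【Posted】: 2021-10-11 09:14:22 【Question】:

The worst case for the time taken by linear search is when the item is at the end of the list/array, or is not present at all. In that case the algorithm needs n comparisons to check whether each element is the desired value, where n is the length of the array/list.

As I understand big-O notation, it makes sense to say that the time complexity of this algorithm is O(n), because the worst case can occur, and big-O is what we use when we want a conservative "worst-case" estimate.

Judging from many posts and answers on Stack Overflow, this line of thinking seems to be flawed. They claim, for example, that big-O notation has nothing to do with worst-case analysis.

Please help me understand the distinction in a way that does not just add to my confusion, as the answers here do: Why big-Oh is not always a worst case analysis of an algorithm?

I do not see how big-O has nothing to do with worst-case analysis. From where I currently stand, big-O expresses how the worst case grows as the input size grows, which seems very much "related" to worst-case analysis.

A statement like this one, from https://medium.com/omarelgabrys-blog/the-big-scary-o-notation-ce9352d827ce:

For example, worst-case analysis gives the maximum number of operations assuming that the input is in the worst possible state, while big-O notation expresses the maximum number of operations done in the worst case.

does not help much, because I cannot see what difference it is referring to.

Any added clarity is much appreciated.

【Comments on the question】:

【Answer 1】: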

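Big-O notation is indeed independent of worst-case analysis. It applies to any function you like.

In the case of linear search,

the worst-case complexity is O(n) (in fact even Θ(n)),

the average-case complexity is O(n) (in fact even Θ(n)),

the best-case complexity is O(1) (in fact even Θ(1)).

So big-O and worst case are different concepts, although a big-O bound on the running time of an algorithm must hold for the worst case.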
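To make the three cases concrete, here is a small sketch of linear search that also counts comparisons. The names linearSearch and SearchResult are made up for this illustration. The best, average and worst cases are three different functions of n, and each can be given its own O or Θ bound.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of a search: where the target was found and how much work it took.
// (Hypothetical type for illustration.)
struct SearchResult {
    int index;        // position of the target, or -1 if absent
    int comparisons;  // number of element comparisons performed
};

// Plain linear search that counts its comparisons.
SearchResult linearSearch(const std::vector<int>& a, int target)
{
    int comparisons = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        ++comparisons;
        if (a[i] == target)
            return {static_cast<int>(i), comparisons};
    }
    return {-1, comparisons};
}
```

For an array of n elements, a target in the first position costs 1 comparison (best case, Θ(1)), while a target in the last position or a missing target costs n comparisons (worst case, Θ(n)).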

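【Comments】:

【Answer 2】:

Here is how it works:

If an algorithm that solves a problem is in O(f(n)), this means that the worst case of solving the problem with that algorithm is in O(f(n)). In other words, if the worst case takes g(n) steps, then g(n) is in O(f(n)).

For the search algorithm you mention, for example, we know that the worst case is in O(n). Now, although the algorithm is in O(n), we can also say that the algorithm is in O(n^2). As you can see, this is the difference between Big-Oh complexity and the worst case.

In short, an algorithm's worst-case complexity is a subset of its Big-Oh complexity.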
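The point about O(n) versus O(n^2) follows directly from the definition of big-O, sketched here for the worst case of linear search. The symbol T_worst is introduced for illustration and stands for the worst-case comparison count.

```latex
% Definition: f is in O(g) if some constant multiple of g eventually dominates f.
\[
f \in O(g) \iff \exists\, c > 0,\ n_0 \ \text{such that}\ f(n) \le c\, g(n) \ \text{for all } n \ge n_0 .
\]

% Worst case of linear search: T_worst(n) = n comparisons.
\begin{align*}
T_{\text{worst}}(n) = n &\le 1 \cdot n   && \Rightarrow\ T_{\text{worst}} \in O(n)   && (c = 1,\ n_0 = 1), \\
T_{\text{worst}}(n) = n &\le 1 \cdot n^2 && \Rightarrow\ T_{\text{worst}} \in O(n^2) && (c = 1,\ n_0 = 1).
\end{align*}
```

Both statements are true upper bounds on the same worst-case function. O(n) is simply the tighter one, and Θ(n) states that it cannot be improved.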

【Comments】:
