Greedy Algorithm and Huffman

Posted chunngai


1. Question / Practice Problem

[Figure: problem statement]

2. Analysis / Problem Description

We are given \(k\) sorted sequences and must merge them into a single sorted sequence using the two-way merge algorithm. The task is to find both the optimal and the worst merging order, that is, the orders for which the total number of comparisons performed by the merges is the smallest and the largest.
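To make the cost model concrete, here is a minimal sketch of a two-way merge that counts comparisons. The function name `mergeCounting` is hypothetical and not part of the original solution. It assumes the usual merge loop, which makes one comparison per emitted element while both inputs are non-empty, so the worst case for lengths \(m\) and \(n\) is \(m + n - 1\) comparisons.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Merge two sorted vectors and count element comparisons.
// Returns {merged sequence, number of comparisons}.
// Hypothetical helper for illustration only.
std::pair<std::vector<int>, long long> mergeCounting(const std::vector<int>& a,
                                                     const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    long long comparisons = 0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        ++comparisons;  // one comparison per step while both inputs are non-empty
        if (a[i] <= b[j]) {
            out.push_back(a[i++]);
        } else {
            out.push_back(b[j++]);
        }
    }
    // One input is exhausted: copy the rest without further comparisons.
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return {out, comparisons};
}
```

Merging the interleaved inputs {1, 3, 5} and {2, 4, 6} with this sketch hits the worst case of 3 + 3 - 1 = 5 comparisons.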

3. Algorithm / Algorithm Description

3.1. Huffman Tree Algorithm

It's natural to use the Huffman tree algorithm to solve the problem. The input gives the lengths of the sequences, which play the role of the character frequencies in the Huffman problem. Each step merges two sequences, just as each step of the Huffman algorithm combines two characters.

However, the Huffman algorithm is a greedy algorithm. A greedy algorithm always makes the locally optimal choice, which does not guarantee a global optimum in general, so we should confirm that it can be applied here to find the global minimum and maximum. A greedy algorithm finds the global optimum of a task if the task has the following two properties:

  1. Greedy-Choice Property
  2. Optimal-Substructure

3.1.1. How to Measure?

It turns out that we can measure the quality of a solution in the same way as the Huffman algorithm does. Here's the reason. Let \(B(T)\) be the total number of comparisons needed to merge all the sequences, where \(T\) is the Huffman tree.
Let's go through the process of merging the sequences.

Suppose the list of sequence lengths to be merged is initially:


5 12 11 2

1. Pick the two shortest sequences and merge them. By the formula \(m + n - 1\), the number of comparisons needed for the two sequences is:
\[ 2 + 5 - 1 = 6 \]

2.1. Add the merged sequence, whose length is 7, to the sequence list. The sequence list is now:


7 12 11

2.2. Pick the two shortest sequences and merge them. By the formula \(m + n - 1\), the number of comparisons needed for the two sequences is:
\[ 7 + 11 - 1 = 17 \]
And the running total is:
\[ \begin{aligned}
B(T) &= (2 + 5 - 1) + (7 + 11 - 1) \\
&= (2 + 5 - 1) + ((2 + 5) + 11 - 1)
\end{aligned} \]

3.1. Add the merged sequence, whose length is 18, to the sequence list. The sequence list is now:


12 18

3.2. Pick the two shortest sequences and merge them. By the formula \(m + n - 1\), the number of comparisons needed for the two sequences is:
\[ 12 + 18 - 1 = 29 \]
And the total is:
\[ \begin{aligned}
B(T) &= (2 + 5 - 1) + (7 + 11 - 1) + (12 + 18 - 1) \\
&= (2 + 5 - 1) + (7 + 11 - 1) + (12 + (7 + 11) - 1) \\
&= (2 + 5 - 1) + ((2 + 5) + 11 - 1) + (12 + ((2 + 5) + 11) - 1)
\end{aligned} \]

Thus
\[ B(T) = 2 \times 3 + 5 \times 3 + 11 \times 2 + 12 \times 1 + (-1) \times 3 = 52 \]
According to the question, this is the minimal solution.
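For a small input like this one, the claim can be cross-checked by brute force. The sketch below tries every possible pair at every step and keeps the cheapest total, charging \(m + n - 1\) per merge. The function name `bruteForceMin` is hypothetical, the lengths are assumed to be positive, and the search is exponential, so it is only meant as a sanity check on tiny inputs.

```cpp
#include <cstddef>
#include <vector>

// Try every pair to merge next and return the cheapest total cost,
// charging m + n - 1 comparisons per merge.
// Hypothetical helper; exponential time, for tiny inputs only.
long long bruteForceMin(std::vector<long long> lens) {
    if (lens.size() <= 1) return 0;  // nothing left to merge
    long long best = -1;             // -1 means "no candidate yet"
    for (std::size_t i = 0; i < lens.size(); ++i) {
        for (std::size_t j = i + 1; j < lens.size(); ++j) {
            // Build the list that remains after merging lens[i] and lens[j].
            std::vector<long long> rest;
            for (std::size_t t = 0; t < lens.size(); ++t) {
                if (t != i && t != j) rest.push_back(lens[t]);
            }
            rest.push_back(lens[i] + lens[j]);
            long long total = lens[i] + lens[j] - 1 + bruteForceMin(rest);
            if (best < 0 || total < best) best = total;
        }
    }
    return best;
}
```

For the lengths 5, 12, 11, 2 this search agrees with the greedy result of 52.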

The Huffman tree looks like this:

[Figure: Huffman tree for the lengths 2, 5, 11, 12]

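This has the same form as the cost in the Huffman algorithm, except for the extra term \((-1) \times 3\), where 3 is the number of merges, which is \(k - 1\) for \(k\) sequences (in this example it happens to equal the depth of the tree as well). Every merging order performs exactly \(k - 1\) merges, so this term is the same for every tree and does not affect which tree is optimal. Therefore we can prove the two properties in the same way as for the Huffman algorithm.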
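The cost in the example generalizes as follows. This is a sketch that borrows the notation \(f\) (sequence length) and \(d_T\) (depth of the leaf in \(T\)) from the proofs in the following sections; the index \(i\) over the \(k\) sequences is introduced here only for illustration.

```latex
\[ B(T) = \sum_{i=1}^{k} f(i)\, d_T(i) - (k - 1) \]
% Example with lengths 2, 5, 11, 12 at depths 3, 3, 2, 1:
\[ B(T) = (2 \cdot 3 + 5 \cdot 3 + 11 \cdot 2 + 12 \cdot 1) - (4 - 1) = 55 - 3 = 52 \]
```

The first term is the weighted path length that the Huffman algorithm minimizes, and the second term is a constant for a fixed \(k\).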

3.2. Greedy-Choice Property

One way to prove the greedy-choice property is an exchange argument (called pruning here). It works as follows:

  1. Assume that we now have an optimal solution which is NOT calculated by the greedy choice.
  2. Replace the choice with the greedy choice.
  3. Prove that the replaced solution is better, or at least as good as the original solution.

Assume that we have an optimal solution \(T\) for the task. In this tree, the nodes \(x\) and \(y\) represent the two shortest sorted sequences. According to the greedy algorithm and the proof for the Huffman algorithm, they should be two sibling leaves at the bottom of the tree, but in \(T\) they are not. Let \(b\) and \(c\) be the two sibling leaves that are at the bottom of \(T\) instead.

[Figure: the tree \(T\)]

We now exchange the positions of \(x\) and \(b\), which moves \(x\) to the position that the greedy choice would give it. Let's call the new tree \(T^{'}\).

[Figure: the tree \(T^{'}\)]

Now we are to prove that the new tree \(T^{'}\) is not worse than the original tree \(T\). In other words, we are to prove that
\[ B(T^{'}) \leq B(T) \]

that is,
\[ B(T) - B(T^{'}) \geq 0 \]

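Here \(f(\cdot)\) denotes the length of a sequence and \(d_T(\cdot)\) the depth of its leaf in the tree \(T\). Since the only difference between the two trees is the positions of \(b\) and \(x\),
\[ B(T) - B(T^{'}) = f(b)d_T(b) + f(x)d_T(x) - (f(x)d_{T^{'}}(x) + f(b)d_{T^{'}}(b)) \]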

And because
\[ d_{T^{'}}(x) = d_T(b), \quad d_{T^{'}}(b) = d_T(x) \]
since the two nodes have simply swapped positions, we get
\[ \begin{aligned}
B(T) - B(T^{'}) &= f(b)d_T(b) + f(x)d_T(x) - (f(x)d_T(b) + f(b)d_T(x)) \\
&= f(b)d_T(b) + f(x)d_T(x) - f(x)d_T(b) - f(b)d_T(x) \\
&= (f(b) - f(x))(d_T(b) - d_T(x))
\end{aligned} \]

Since \(x\) is the shortest,
\[ f(b) - f(x) \geq 0 \]

And according to the tree \(T\),
\[ d_T(b) - d_T(x) > 0 \]

so we get
\[ (f(b) - f(x))(d_T(b) - d_T(x)) \geq 0 \]
thus
\[ B(T) - B(T^{'}) \geq 0 \]

[Figure: the tree \(T^{''}\)]

In the same way, when we exchange the positions of \(c\) and \(y\) and get a new tree \(T^{''}\), we have
\[ (f(c) - f(y))(d_{T^{'}}(c) - d_{T^{'}}(y)) \geq 0 \]
which is equivalent to
\[ B(T^{'}) - B(T^{''}) \geq 0 \]

Therefore, we have
\[ B(T) \geq B(T^{'}) \geq B(T^{''}) \]

and since \(T\) is an optimal solution, we get
\[ B(T) \leq B(T^{''}) \]

thus
\[ \left. \begin{aligned}
B(T) \geq B(T^{'}) \geq B(T^{''}) \\
B(T) \leq B(T^{''})
\end{aligned} \right\} \Rightarrow B(T^{''}) = B(T) \]

Therefore we've proved that the tree \(T^{''}\), which follows the greedy choices, is not worse than the original tree \(T\), which does not. In other words, some optimal solution always contains the greedy choice.
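The exchange identity \(B(T) - B(T^{'}) = (f(b) - f(x))(d_T(b) - d_T(x))\) can be checked numerically. The sketch below is an illustration under assumptions: the helper names `weightedCost` and `exchangeGain` are hypothetical, a tree is represented only by the depth of each leaf, and the constant \(-(k - 1)\) term is left out because it cancels in the difference.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Weighted path length: sum of f[i] * d[i] over all leaves.
// Hypothetical helper for illustration only.
long long weightedCost(const std::vector<long long>& f, const std::vector<int>& d) {
    long long total = 0;
    for (std::size_t i = 0; i < f.size(); ++i) {
        total += f[i] * d[i];
    }
    return total;
}

// Returns B(T) - B(T'), where T' is T with leaves x and b swapped.
// Swapping two leaves is modeled as swapping their depths.
long long exchangeGain(const std::vector<long long>& f, std::vector<int> d,
                       std::size_t x, std::size_t b) {
    long long before = weightedCost(f, d);
    std::swap(d[x], d[b]);
    long long after = weightedCost(f, d);
    return before - after;
}
```

For example, with lengths {2, 5, 11, 12} at depths {1, 3, 2, 3}, swapping the leaf of length 2 with the leaf of length 12 gives a gain of (12 - 2)(3 - 1) = 20, so the swapped tree is cheaper.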

3.3. Optimal Substructure

[Figure: the tree \(T\) and its substructure \(T^{'}\)]

Let's say the tree \(T\) here is an optimal solution, and its substructure \(T^{'}\) is the tree without \(x\) and \(y\), in which their parent \(z\) becomes a leaf with \(f(z) = f(x) + f(y)\). Below, \(\in\) is read as "is a term of".

\[ \begin{aligned}
f(x)d(x) + f(y)d(y) &\in B(T) \\
f(x)(d1 + d2 + dx) + f(y)(d1 + d2 + dy) &\in B(T) \\
f(x)(d1 + d2) + f(x)dx + f(y)(d1 + d2) + f(y)dy &\in B(T) \\
(f(x) + f(y))(d1 + d2) + f(x)dx + f(y)dy &\in B(T) \\
f(z)(d1 + d2) &\in B(T) - f(x)dx - f(y)dy \\
f(z)(d1 + d2) &\in B(T^{'})
\end{aligned} \]

thus
\[ B(T) = B(T^{'}) + f(x)dx + f(y)dy \]

If \(T^{'}\) is not an optimal solution of the subtask, let's say the optimal one is \(T^{''}\). So
\[ B(T^{''}) < B(T^{'}) \]
thus
\[ B(T^{''}) + f(x)dx + f(y)dy < B(T^{'}) + f(x)dx + f(y)dy \]

and since
\[ B(T) = B(T^{'}) + f(x)dx + f(y)dy \]

we have

\[ B(T^{''}) + f(x)dx + f(y)dy < B(T) \]

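But the left-hand side is the cost of a tree for the whole task, and since \(T\) is the optimal solution, no such tree can cost less than \(B(T)\). The inequality above is therefore a contradiction, so \(T^{'}\) must be optimal for the subtask. Thus an optimal solution of the task contains an optimal solution of the subtask.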

3.4. Mimic the Huffman Algorithm

The strategy closely follows the Huffman tree algorithm, as introduced above. To find the worst solution, we simply pick the two longest sequences each time instead of the two shortest.

    int minTotalTimes = 0;
    int maxTotalTimes = 0;
    for (int i = 1; i <= k - 1; i++) {
        // find two shortest sequences
        int minTimes1 = qIncreasing.top();
        qIncreasing.pop();

        int minTimes2 = qIncreasing.top();
        qIncreasing.pop();

        // add up the times needed
        minTotalTimes += minTimes1 + minTimes2 - 1;

        // push the merged sequence to the list
        qIncreasing.push(minTimes1 + minTimes2);

        // find two longest sequences
        int maxTimes1 = qDecreasing.top();
        qDecreasing.pop();

        int maxTimes2 = qDecreasing.top();
        qDecreasing.pop();

        // add up the times needed
        maxTotalTimes += maxTimes1 + maxTimes2 - 1;

        // push the merged sequence to the list
        qDecreasing.push(maxTimes1 + maxTimes2);
    }
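The loop above can also be packaged as a reusable function, which makes it easier to test. The following is a sketch rather than the original solution: the name `greedyMergeCost` and the template parameter `Compare` are hypothetical, and `long long` is used instead of `int` on the assumption that totals may grow large.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Total comparisons for merging all sequences, charging m + n - 1 per merge.
// Compare decides which two lengths are taken from the heap at each step:
//   std::greater<long long> -> the two shortest (minimum total)
//   std::less<long long>    -> the two longest  (maximum total)
// Hypothetical helper for illustration only.
template <typename Compare>
long long greedyMergeCost(const std::vector<long long>& lengths) {
    std::priority_queue<long long, std::vector<long long>, Compare> heap(
        lengths.begin(), lengths.end());
    long long total = 0;
    while (heap.size() > 1) {
        long long a = heap.top();
        heap.pop();
        long long b = heap.top();
        heap.pop();
        total += a + b - 1;  // cost of merging the two chosen sequences
        heap.push(a + b);    // the merged sequence goes back into the heap
    }
    return total;
}
```

With the lengths 5, 12, 11, 2 from the walkthrough, `greedyMergeCost<std::greater<long long>>` gives 52 and `greedyMergeCost<std::less<long long>>` gives 78.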

4. T(n) and S(n) / Time and Space Complexity Analysis (with the analysis process)

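As in the Huffman algorithm, only a single for-loop is needed, and it runs \(k - 1\) times, where \(k\) is the number of sequences to be merged. Each iteration performs a constant number of push() and pop() operations on a priority_queue, and each of these takes \(O(\log k)\) time. Reading the input also takes \(k\) push() operations. Thus the time complexity is
\[ T(k) = O(k \log k) \]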

A priority queue, which is actually a heap, is needed to store the lengths of the sequences. Each merge removes two lengths and adds one, so a queue never holds more than \(k\) elements. Thus the space complexity is
\[ S(k) = O(k) \]

5. Experience / Reflections (summary of what was learned and open questions from this exercise)

The greedy algorithm is a lot easier than DP. The key point is to confirm, using the two properties introduced above, that the greedy algorithm really finds the global optimum of the task.

6. Show Me the Code / Complete Code

#include <functional>  // greater
#include <iostream>
#include <queue>
#include <vector>
using namespace std;

int main() {
    /* receive inputs */
    int k;  // number of sequences to be merged
    cin >> k;

    // store sequences in a decreasing order
    priority_queue<int> qDecreasing;
    // store sequences in an increasing order
    priority_queue<int, vector<int>, greater<int> > qIncreasing;

    // receive lengths of sequences
    for (int i = 1; i <= k; i++) {
        int input;
        cin >> input;

        qDecreasing.push(input);
        qIncreasing.push(input);
    }

    /* greedy algorithm */
    int minTotalTimes = 0;
    int maxTotalTimes = 0;
    for (int i = 1; i <= k - 1; i++) {
        // find two shortest sequences
        int minTimes1 = qIncreasing.top();
        qIncreasing.pop();

        int minTimes2 = qIncreasing.top();
        qIncreasing.pop();

        // add up the times needed
        minTotalTimes += minTimes1 + minTimes2 - 1;

        // push the merged sequence to the list
        qIncreasing.push(minTimes1 + minTimes2);

        // find two longest sequences
        int maxTimes1 = qDecreasing.top();
        qDecreasing.pop();

        int maxTimes2 = qDecreasing.top();
        qDecreasing.pop();

        // add up the times needed
        maxTotalTimes += maxTimes1 + maxTimes2 - 1;

        // push the merged sequence to the list
        qDecreasing.push(maxTimes1 + maxTimes2);
    }

    /* display */
    cout << maxTotalTimes << " " << minTotalTimes;

    return 0;
}

