Reservoir Sampling
Reservoir sampling addresses the following class of problems: randomly choose $k$ items from a stream of $n$ elements, where $n$ may be very large or unknown in advance, such that every element in the stream is selected with the same probability $k/n$.
The algorithm works as follows.
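Let's first look at the simple case $k = 1$. When the $i$-th item $x_i$ arrives, we either keep $x_i$ (replacing the current selection) with probability $1/i$, or keep the previously selected item with probability $1 - 1/i$. We repeat this until the end of the stream, i.e., until every element has been visited. The probability that $x_i$ is the item chosen in the end is

$$P(x_i \text{ is chosen}) = \frac{1}{i} \times \left(1 - \frac{1}{i+1}\right) \times \left(1 - \frac{1}{i+2}\right) \times \cdots \times \left(1 - \frac{1}{n}\right) = \frac{1}{i} \times \frac{i}{i+1} \times \frac{i+1}{i+2} \times \cdots \times \frac{n-1}{n} = \frac{1}{n}.$$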
This proves that the algorithm selects every element with equal probability. A Java implementation, where the stream is simply the indices 1, 2, ..., n, could look like this:
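```java
import java.util.Random;

// Returns an index in 1..n chosen uniformly at random in a single pass.
int random(int n) {
    Random rnd = new Random();
    int ret = 0;
    for (int i = 1; i <= n; i++)
        // nextInt(i) is uniform in [0, i), so item i replaces ret with probability 1/i
        if (rnd.nextInt(i) == 0)
            ret = i;
    return ret;
}
```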
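Because the method above takes $n$ as an argument, it does not directly handle a stream whose length is unknown in advance. The sketch below applies the same $k = 1$ rule to a generic Iterator, assuming Java 9+ for List.of; SingleReservoir and sampleOne are hypothetical names introduced here for illustration, and main only tallies how often each element is picked.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

class SingleReservoir {
    // Picks one element uniformly at random from an iterator of unknown length.
    // The i-th element (1-based) replaces the current pick with probability 1/i.
    static <T> T sampleOne(Iterator<T> it, Random rnd) {
        if (!it.hasNext()) throw new NoSuchElementException("empty stream");
        T chosen = null;
        int i = 0; // number of elements seen so far
        while (it.hasNext()) {
            T item = it.next();
            i++;
            // nextInt(i) == 0 with probability 1/i; always true for the first element
            if (rnd.nextInt(i) == 0) chosen = item;
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> stream = List.of("a", "b", "c", "d");
        int[] counts = new int[stream.size()];
        Random rnd = new Random(42);
        int trials = 100_000;
        for (int t = 0; t < trials; t++) {
            counts[stream.indexOf(sampleOne(stream.iterator(), rnd))]++;
        }
        // Each count should be close to trials / 4 = 25,000.
        for (int c = 0; c < counts.length; c++) {
            System.out.println(stream.get(c) + ": " + counts[c]);
        }
    }
}
```

The exact counts depend on the seed, but with 100,000 trials each should land close to 25,000.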
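Choosing $k > 1$ items is a little trickier. One straightforward way is to simply run the previous algorithm $k$ times. However, this requires $k$ passes over the stream, and because the runs are independent, the same element can be picked more than once. Here we discuss another approach that selects $k$ elements at random in a single pass.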
For the $i$-th item $x_i$, there are two cases to handle:
- When $i \le k$, we simply keep $x_i$.
- When $i > k$, we keep $x_i$ with probability $k/i$.
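A simple implementation needs memory to store the $k$ selected elements, say in an array $R[0..k-1]$ (the reservoir). For every $x_i$ with $i > k$, we first draw a random integer $j$ uniformly from $[0, i)$ and, if $j < k$, keep $x_i$ by setting $R[j] = x_i$. Otherwise $x_i$ is discarded. Since $j < k$ happens with probability $k/i$, this gives exactly the probability required in the second case, and the slot that $x_i$ overwrites is uniform among the $k$ slots.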
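The proof is as before. For $i > k$, the probability that $x_i$ is in the final sample is

$$P(x_i \text{ is chosen}) = \frac{k}{i} \times \left(1 - \frac{k}{i+1} \times \frac{1}{k}\right) \times \left(1 - \frac{k}{i+2} \times \frac{1}{k}\right) \times \cdots \times \left(1 - \frac{k}{n} \times \frac{1}{k}\right) = \frac{k}{i} \times \frac{i}{i+1} \times \cdots \times \frac{n-1}{n} = \frac{k}{n},$$

where $\frac{k}{j} \times \frac{1}{k}$ is the probability that a later item $x_j$ ($j > i$) replaces $x_i$: $x_j$ is kept with probability $k/j$, and it then overwrites the particular slot holding $x_i$ with probability $1/k$. For $i \le k$, $x_i$ enters the reservoir with probability 1 and can only be replaced from $x_{k+1}$ onward, so its probability is $1 \times \frac{k}{k+1} \times \cdots \times \frac{n-1}{n} = \frac{k}{n}$ as well.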
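As a numeric sanity check of this derivation, the sketch below evaluates the product term by term for every position $i$, covering both the $i \le k$ and $i > k$ cases. ReservoirProbability and inclusionProbability are illustrative names, not part of the original post, and doubles are used for simplicity, so the results are exact only up to floating-point rounding.

```java
class ReservoirProbability {
    // Evaluates the product from the proof for element x_i (1-based),
    // reservoir size k and stream length n.
    static double inclusionProbability(int i, int k, int n) {
        // Probability that x_i enters the reservoir: 1 if i <= k, else k/i.
        double p = (i <= k) ? 1.0 : (double) k / i;
        // Each later x_j (j > k) replaces x_i with probability (k/j) * (1/k) = 1/j,
        // so x_i survives step j with probability 1 - 1/j.
        for (int j = Math.max(i, k) + 1; j <= n; j++) {
            p *= 1.0 - 1.0 / j;
        }
        return p;
    }

    public static void main(String[] args) {
        int k = 3, n = 10;
        for (int i = 1; i <= n; i++) {
            // Every line should print 0.300000, i.e. k / n.
            System.out.printf("P(x_%d) = %.6f%n", i, inclusionProbability(i, k, n));
        }
    }
}
```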
Below is a sample implementation in Java:
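```java
import java.util.Random;

// Returns k elements chosen uniformly at random from a (assumes a.length >= k).
int[] random(int[] a, int k) {
    int[] s = new int[k];
    Random rnd = new Random();
    // The first k elements fill the reservoir unconditionally.
    for (int i = 0; i < k; i++)
        s[i] = a[i];
    // a[i] is the (i + 1)-th element; keep it with probability k / (i + 1).
    for (int i = k; i < a.length; i++) {
        int j = rnd.nextInt(i + 1); // uniform in [0, i]
        if (j < k) s[j] = a[i];
    }
    return s;
}
```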
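The array version needs the whole input in memory and assumes at least $k$ elements. A sketch of the same two-case rule that reads from an Iterator and returns a List could look like the following; StreamReservoir and sample are hypothetical names, and returning all items when the stream is shorter than $k$ is an assumption made here rather than something the original specifies.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

class StreamReservoir {
    // Returns up to k items sampled uniformly without replacement from a stream
    // of unknown length. If the stream has fewer than k items, all are returned.
    static <T> List<T> sample(Iterator<T> it, int k, Random rnd) {
        if (k < 0) throw new IllegalArgumentException("k must be non-negative");
        List<T> reservoir = new ArrayList<>();
        int seen = 0; // number of items consumed so far (1-based i after increment)
        while (it.hasNext()) {
            T item = it.next();
            seen++;
            if (seen <= k) {
                reservoir.add(item);               // case i <= k: keep unconditionally
            } else {
                int j = rnd.nextInt(seen);         // uniform in [0, seen)
                if (j < k) reservoir.set(j, item); // kept with probability k / seen
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int x = 0; x < 10; x++) data.add(x);
        System.out.println(sample(data.iterator(), 3, new Random()));
    }
}
```

Using a List instead of a fixed array lets the reservoir grow during the first $k$ steps, so a short stream needs no special handling.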