Cache Eviction Algorithms: LFU

Posted 2021-04-23 dejavuyj

The previous article implemented the LRU algorithm.

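This article continues with the LFU algorithm.

LFU uses access frequency to decide which entry to evict. When several entries have the same frequency, the one that was accessed least recently is evicted first.

LeetCode 460 asks for exactly this: implement an LFU cache.

The simplest LFU implementation uses HashMaps for key->value and freq->keys. The code below also keeps a third map, key->freq, to look up each key's current frequency.

key->value is easy to understand.

freq->keys records, for each access frequency, the set of keys that currently have that frequency.

Because of the tie-breaking rule (same frequency, evict the least recently accessed entry), a LinkedHashSet is the usual choice for storing the keys at a given frequency. It iterates in insertion order, so the first key it returns is the one that has gone longest without being accessed.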
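The ordering property relied on here can be checked in isolation. The following is a minimal sketch; the class name `RecencyOrderDemo` and its helper methods are illustrative only. Note that re-adding a key that is already present does not change its position, which is why the sketch removes the key before adding it again.

```java
import java.util.LinkedHashSet;

class RecencyOrderDemo {
    // Returns the key that was inserted earliest, i.e. the eviction candidate.
    static int oldest(LinkedHashSet<Integer> keys) {
        return keys.iterator().next();
    }

    // Marks a key as just accessed by moving it to the end of the insertion order.
    static void touch(LinkedHashSet<Integer> keys, int key) {
        keys.remove(key);
        keys.add(key);
    }

    public static void main(String[] args) {
        LinkedHashSet<Integer> keys = new LinkedHashSet<>();
        keys.add(1);
        keys.add(2);
        keys.add(3);
        System.out.println(oldest(keys)); // 1
        touch(keys, 1);                   // order is now 2, 3, 1
        System.out.println(oldest(keys)); // 2
    }
}
```

In an LFU cache every access moves the key into the set for the next frequency, so the order inside each set is the order in which keys reached that frequency.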

The full implementation is as follows:

package leetcode.algorithm.cache;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
class LFUCache {

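    private final int capacity;
    // key -> cached value
    private Map<Integer, Integer> keyToValue;
    // key -> current access frequency
    private Map<Integer, Integer> keyToFreq;
    // frequency -> keys with that frequency, oldest first
    private Map<Integer, LinkedHashSet<Integer>> freqToKey;
    // lowest frequency currently present in the cache
    private int minFreq;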
    public LFUCache(int capacity) {
        this.capacity = capacity;
        keyToValue = new HashMap<>();
        keyToFreq = new HashMap<>();
        freqToKey = new HashMap<>();
        minFreq = 0;
    }

    public int get(int key) {
        Integer value = keyToValue.get(key);
        if (value == null) {
            return -1;
        } else {
            increaseFreq(key);
            return value;
        }
    }

    public void put(int key, int value) {
        if (capacity <= 0) {
            return;
        }

        Integer existed = keyToValue.get(key);
        if (existed == null) {
            if (keyToValue.size() >= capacity) {
                removeMinFreqKey();
            }

            keyToValue.put(key, value);
            keyToFreq.put(key, 1);
            LinkedHashSet<Integer> set = freqToKey.get(1);
            if (set == null) {
                set = new LinkedHashSet<>();
                freqToKey.put(1, set);
            }
            set.add(key);
            minFreq = 1;
        } else {
            keyToValue.put(key, value);
            increaseFreq(key);
        }
    }

    private void increaseFreq(int key) {
        // Move the key from its current frequency bucket to the next one.
        int oldFreq = keyToFreq.get(key);
        int newFreq = oldFreq + 1;
        keyToFreq.put(key, newFreq);
        LinkedHashSet<Integer> oldFreqSet = freqToKey.get(oldFreq);
        oldFreqSet.remove(key);
        freqToKey.computeIfAbsent(newFreq, f -> new LinkedHashSet<>()).add(key);
        // If the old bucket is now empty and it held the minimum frequency,
        // the minimum moves up to newFreq.
        if (oldFreqSet.isEmpty() && oldFreq == minFreq) {
            minFreq++;
        }
    }

    private void removeMinFreqKey() {
        LinkedHashSet<Integer> list = freqToKey.get(minFreq);
        Integer delKey = list.iterator().next();
        list.remove(delKey);
        keyToValue.remove(delKey);
        keyToFreq.remove(delKey);
    }
}

public class Code460_lfu_cache {

    public static void main(String[] args) {
        LFUCache c = new LFUCache(2);
        int v;
        c.put(1, 1);
        c.put(2, 2);
        v = c.get(1);
        System.out.println(v); // 1
        c.put(3, 3);           // evicts key 2 (lowest frequency)
        v = c.get(2);
        System.out.println(v); // -1
        v = c.get(3);
        System.out.println(v); // 3
        c.put(4, 4);           // keys 1 and 3 tie at frequency 2; key 1 is older, so it is evicted
        v = c.get(1);
        System.out.println(v); // -1
        v = c.get(3);
        System.out.println(v); // 3
        v = c.get(4);
        System.out.println(v); // 4
    }
}

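If you look at the posted solutions for LeetCode 460, you will see that the keyToFreq HashMap is not actually necessary: the frequency can be folded into an inner Node class.

A further optimization is to replace LinkedHashSet with a hand-written doubly linked list. This avoids the overhead of the set's internal hash table, so it typically uses less memory and runs faster.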
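As a rough illustration of both ideas, here is a sketch in which each entry is a Node that carries its own frequency and each frequency owns a small doubly linked list. The names `NodeLFUCache`, `Node`, `Bucket`, and `touch` are made up for this sketch and are not taken from any particular LeetCode solution.

```java
import java.util.HashMap;
import java.util.Map;

class NodeLFUCache {
    // Each entry carries its own frequency, so no separate key->freq map is needed.
    private static final class Node {
        int key, value, freq = 1;
        Node prev, next;
        Node(int key, int value) { this.key = key; this.value = value; }
    }

    // Doubly linked list with sentinels: head side is most recent, tail side is oldest.
    private static final class Bucket {
        final Node head = new Node(0, 0), tail = new Node(0, 0);
        int size;
        Bucket() { head.next = tail; tail.prev = head; }
        void addFirst(Node n) {
            n.next = head.next; n.prev = head;
            head.next.prev = n; head.next = n;
            size++;
        }
        void remove(Node n) {
            n.prev.next = n.next; n.next.prev = n.prev;
            size--;
        }
        Node removeLast() { Node n = tail.prev; remove(n); return n; }
    }

    private final int capacity;
    private int minFreq;
    private final Map<Integer, Node> nodes = new HashMap<>();
    private final Map<Integer, Bucket> buckets = new HashMap<>();

    NodeLFUCache(int capacity) { this.capacity = capacity; }

    int get(int key) {
        Node n = nodes.get(key);
        if (n == null) return -1;
        touch(n);
        return n.value;
    }

    void put(int key, int value) {
        if (capacity <= 0) return;
        Node n = nodes.get(key);
        if (n != null) { n.value = value; touch(n); return; }
        if (nodes.size() >= capacity) {
            // Evict the oldest node among those with the lowest frequency.
            Node victim = buckets.get(minFreq).removeLast();
            nodes.remove(victim.key);
        }
        n = new Node(key, value);
        nodes.put(key, n);
        buckets.computeIfAbsent(1, f -> new Bucket()).addFirst(n);
        minFreq = 1;
    }

    // Moves a node from its current frequency bucket to the next one.
    private void touch(Node n) {
        Bucket old = buckets.get(n.freq);
        old.remove(n);
        if (old.size == 0) {
            buckets.remove(n.freq);
            if (minFreq == n.freq) minFreq++;
        }
        n.freq++;
        buckets.computeIfAbsent(n.freq, f -> new Bucket()).addFirst(n);
    }
}
```

Unlinking a node from a doubly linked list is O(1) once you hold the node, which is why the key->Node map is enough to support both lookups and frequency updates.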

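LeetCode solutions are, after all, the simplest possible implementations. What else does an LFU algorithm need to handle in real engineering practice?

First, a key that is accessed heavily for a while will not necessarily stay hot after some time has passed, so the counter needs a mechanism that decays it over time.

In Redis, the lfu-decay-time parameter controls the decay rate.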
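To make the idea concrete, here is a minimal sketch of time-based decay: the counter loses one step for every decay period that has elapsed since it was last decremented. The class `DecayingCounter` and its parameters are hypothetical, and this is only meant to illustrate the principle; the exact logic inside Redis may differ.

```java
class DecayingCounter {
    // Hypothetical helper: returns the counter after decaying it by one step
    // for every `decayMinutes` that elapsed since the last decrement.
    // In this sketch a decayMinutes of 0 (or less) disables decay.
    static int decay(int counter, long elapsedMinutes, long decayMinutes) {
        if (decayMinutes <= 0) return counter;
        long periods = elapsedMinutes / decayMinutes;
        return periods >= counter ? 0 : (int) (counter - periods);
    }

    public static void main(String[] args) {
        System.out.println(decay(10, 3, 1));  // 7
        System.out.println(decay(10, 3, 2));  // 9
        System.out.println(decay(2, 30, 1));  // 0
    }
}
```

With this scheme a key that stops being accessed gradually drifts back toward 0 and becomes an eviction candidate again.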

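In addition, to keep things efficient, middleware typically packs counters and similar state into a single int or long.

Redis uses a 24-bit field in each object to record the LRU clock: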

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
} robj;

When the LFU policy is in use, these 24 bits are split into two parts:

           16 bits      8 bits
      +----------------+--------+
      + Last decr time | LOG_C  |
      +----------------+--------+

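The high 16 bits record the time the counter was last decremented (ldt), in minutes. The low 8 bits hold the counter value itself.

This leaves only 8 bits for the counter, so its maximum value is 255.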
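The layout can be sketched with plain shifts and masks. This is an illustration of the bit arithmetic only; the class `LfuField` and its method names are invented for the example and do not correspond to Redis functions.

```java
class LfuField {
    // Packs a 16-bit last-decrement time (minutes) and an 8-bit counter into 24 bits.
    static int pack(int ldtMinutes, int counter) {
        return ((ldtMinutes & 0xFFFF) << 8) | (counter & 0xFF);
    }

    // Extracts the high 16 bits: the last decrement time in minutes.
    static int ldt(int field) { return (field >>> 8) & 0xFFFF; }

    // Extracts the low 8 bits: the access counter.
    static int counter(int field) { return field & 0xFF; }

    public static void main(String[] args) {
        int field = pack(12345, 200);
        System.out.println(ldt(field));      // 12345
        System.out.println(counter(field));  // 200
        // A 16-bit minute clock wraps after 65536 minutes (about 45 days).
        System.out.println(ldt(pack(65536 + 7, 5))); // 7
    }
}
```

Because the minute value wraps around, any code that computes elapsed time from it has to account for the wrap.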

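Because an 8-bit counter would saturate very quickly if it were incremented on every access, Redis adds the lfu_log_factor parameter to control how fast the counter grows.

The increment function, LFULogIncr, looks like this: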

/* Logarithmically increment a counter. The greater is the current counter value
 * the less likely is that it gets really incremented. Saturate it at 255. */
uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;
    return counter;
}

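r is a random number between 0 and 1.

The larger counter and lfu_log_factor are, the smaller p becomes, so r < p is less likely and the counter is less likely to grow. The table shows how the counter grows with the number of hits: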

+--------+------------+------------+------------+------------+------------+
| factor | 100 hits   | 1000 hits  | 100K hits  | 1M hits    | 10M hits   |
+--------+------------+------------+------------+------------+------------+
| 0      | 104        | 255        | 255        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 1      | 18         | 49         | 255        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 10     | 10         | 18         | 142        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 100    | 8          | 11         | 49         | 143        | 255        |
+--------+------------+------------+------------+------------+------------+

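In particular, when factor is 0, p is always 1, so r < p holds on practically every call and LFULogIncr increments the counter each time.

So why is the counter 104, rather than 100, after 100 hits?

If a newly created object started with a counter of 0, it would be evicted almost immediately, so new keys need an initial counter value.

Redis uses LFU_INIT_VAL for that initial value, and LFU_INIT_VAL is set to 5.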
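To get a feel for how the factor slows the counter down, the increment logic can be ported to Java and run in a loop. This is a simulation sketch, not Redis code: `LogCounterSim`, `simulate`, and the seed are assumptions made for the example, and because the increment is random the numbers will not necessarily match the table above.

```java
import java.util.Random;

class LogCounterSim {
    static final int INIT_VAL = 5; // mirrors LFU_INIT_VAL from the text

    // Java port of the probabilistic increment described above.
    static int logIncr(int counter, int logFactor, Random rng) {
        if (counter == 255) return 255;
        double r = rng.nextDouble();
        double baseval = Math.max(0, counter - INIT_VAL);
        double p = 1.0 / (baseval * logFactor + 1);
        if (r < p) counter++;
        return counter;
    }

    // Applies `hits` accesses to a freshly created key and returns its counter.
    static int simulate(int hits, int logFactor, long seed) {
        Random rng = new Random(seed);
        int counter = INIT_VAL;
        for (int i = 0; i < hits; i++) {
            counter = logIncr(counter, logFactor, rng);
        }
        return counter;
    }

    public static void main(String[] args) {
        for (int factor : new int[] {0, 1, 10, 100}) {
            System.out.println("factor " + factor + ": " + simulate(1000, factor, 42L));
        }
    }
}
```

Running it with different hit counts shows the same general shape as the table: a factor of 0 saturates almost immediately, while larger factors need many more hits to approach 255.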
