Lru_cache（来自 functools）如何工作？

Posted 2023-02-23

技术标签:

【中文标题】Lru_cache（来自 functools）如何工作？【英文标题】：How does Lru_cache (from functools) Work? 【发布时间】：2018-09-27 17:16:51 【问题描述】：

特别是在使用递归代码时，lru_cache 有很大的改进。我确实理解缓存是存储数据的空间，这些数据必须快速提供服务，并且可以避免计算机重新计算。

functools 中的 Python lru_cache 如何在内部工作？

我正在寻找一个具体的答案，它是否像 Python 的其他部分一样使用字典？它只存储return 值吗？

我知道 Python 大量构建在字典之上，但是，我找不到这个问题的具体答案。希望有人可以为 *** 上的所有用户简化此答案。

【问题讨论】：

【参考方案1】：

functools 源代码可在此处获得：https://github.com/python/cpython/blob/master/Lib/functools.py

lru_cache 使用_lru_cache_wrapper 装饰器（带有参数模式的python 装饰器），它有一个cache 字典在上下文中，其中保存了调用函数的返回值（每个装饰函数将有自己的缓存字典）。字典键是使用参数中的_make_key 函数生成的。在下面添加了一些粗体 cmets：

# ACCORDING TO PASSED maxsize ARGUMENT _lru_cache_wrapper
# DEFINES AND RETURNS ONE OF wrapper DECORATORS

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # Constants shared by all lru cache instances:
    sentinel = object()      # unique object used to signal cache misses

    cache =                                 # RESULTS SAVES HERE
    cache_get = cache.get    # bound method to lookup a key or return None

    # ... maxsize is None:

    def wrapper(*args, **kwds):
        # Simple caching without ordering or size limit
        nonlocal hits, misses
        key = make_key(args, kwds, typed)     # BUILD A KEY FROM ARGUMENTS
        result = cache_get(key, sentinel)     # TRYING TO GET PREVIOUS CALLS RESULT
        if result is not sentinel:            # ALREADY CALLED WITH PASSED ARGS
            hits += 1
            return result                     # RETURN SAVED RESULT
                                              # WITHOUT ACTUALLY CALLING FUNCTION
        misses += 1
        result = user_function(*args, **kwds) # FUNCTION CALL - if cache[key] empty
        cache[key] = result                   # SAVE RESULT

        return result
    # ...

    return wrapper

【讨论】：

fib.cache_info() 打印如下内容： CacheInfo(hits=28, misses=16, maxsize=None, currsize=16) 这里的“miss”是什么意思？ @Ambareesh 在装饰器的开头这个计数器是在上下文中创建的并且都等于零：hits = misses = 0。每个修饰的函数调用都会增加其中之一 - hits += 1 如果缓存已经有数据，misses += 1 如果数据不在缓存中。检查来源。您可以使用此计数器来分析缓存的有用性所以如果第一次看到一个数字，未命中数会增加 1，对于相同数字的后续视图，命中数会增加？从我的测试代码来看，与使用我自己的 dict 缓存相比，@lru_cache 似乎更快，使用 cache.get(args, my_function(args) 并将 cache[args] = my_res 存储在结束。为什么呢？ @onepiece lru_cache() 是根据 _lru_cache_wrapper() 实现的，它有一个 C 实现，可能比任何 Python 实现都快。【参考方案2】：

LRU 缓存的 Python 3.9 源代码：https://github.com/python/cpython/blob/3.9/Lib/functools.py#L429

Fib 代码示例

@lru_cache(maxsize=2)
def fib(n):
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

LRU 缓存装饰器检查一些基本情况，然后用包装器 _lru_cache_wrapper 包装用户函数。在包装器内部，发生了将项目添加到缓存的逻辑，LRU 逻辑，即向循环队列添加新项目，从循环队列中删除项目。

def lru_cache(maxsize=128, typed=False):
...
    if isinstance(maxsize, int):
        # Negative maxsize is treated as 0
        if maxsize < 0:
            maxsize = 0
    elif callable(maxsize) and isinstance(typed, bool):
        # The user_function was passed in directly via the maxsize argument
        user_function, maxsize = maxsize, 128
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        wrapper.cache_parameters = lambda : 'maxsize': maxsize, 'typed': typed
        return update_wrapper(wrapper, user_function)
    elif maxsize is not None:
        raise TypeError(
         'Expected first argument to be an integer, a callable, or None')

    def decorating_function(user_function):
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        wrapper.cache_parameters = lambda : 'maxsize': maxsize, 'typed': typed
        return update_wrapper(wrapper, user_function)

    return decorating_function

lru_cache 规范化maxsize(when negative)，添加CacheInfo 详细信息，最后添加包装器并更新装饰器文档和其他详细信息。

lru_cache_wrapper

Lru Cache 包装器的簿记变量很少。

 sentinel = object()          # unique object used to signal cache misses
 make_key = _make_key         # build a key from the function arguments
 PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # names for the link fields

 cache = 
 hits = misses = 0
 full = False
 cache_get = cache.get    # bound method to lookup a key or return None
 cache_len = cache.__len__  # get cache size without calling len()
 lock = RLock()           # because linkedlist updates aren't threadsafe
 root = []                # root of the circular doubly linked list
 root[:] = [root, root, None, None]     # initialize by pointing to self

包装器在执行任何操作之前获取锁。

一些重要的变量 - 根列表包含所有符合maxsize 值的项目。记住根的重要概念是在前一个 (0) 和下一个位置 (1) 中自引用 (root[:] = [root, root, None, None])

三个高级检查

第一种情况，当maxsize 为0 时，表示没有缓存功能，包装器包装了用户函数，没有任何缓存功能。包装器增加缓存未命中计数并返回结果。

 def wrapper(*args, **kwds):
     # No caching -- just a statistics update
     nonlocal misses
     misses += 1
     result = user_function(*args, **kwds)
     return result

第二种情况。当maxsize 为无时。在该部分中，缓存中存储的元素数量没有限制。所以包装器检查缓存（字典）中的键。当键存在时，包装器返回值并更新缓存命中信息。而当 key 丢失时，wrapper 使用用户传递的参数调用 user 函数，更新缓存，更新缓存未命中信息，并返回结果。

 def wrapper(*args, **kwds):
     # Simple caching without ordering or size limit
     nonlocal hits, misses
     key = make_key(args, kwds, typed)
     result = cache_get(key, sentinel)
     if result is not sentinel:
         hits += 1
         return result
     misses += 1
     result = user_function(*args, **kwds)
     cache[key] = result
     return result

第三种情况，maxsize 是默认值 (128) 或用户传递的整数值。这是实际的 LRU 缓存实现。包装器中的整个代码以线程安全的方式。在执行任何操作之前，从缓存中读取/写入/删除the wrapper obtains RLock。

LRU 缓存

缓存中的值存储为四个项目的列表（记住根）。第一项是对前一项的引用，第二项是对下一项的引用，第三项是特定函数调用的键，第四项是结果。这是斐波那契函数参数 1 [[[...], [...], 1, 1], [[...], [...], 1, 1], None, None] 的实际值。 [...] 表示对 self(list) 的引用。

第一个检查是缓存命中。如果是，则缓存中的值是四个值的列表。

 nonlocal root, hits, misses, full
 key = make_key(args, kwds, typed)
 with lock:
     link = cache_get(key)
      if link is not None:
          # Move the link to the front of the circular queue
          print(f'Cache hit for key, root')
          link_prev, link_next, _key, result = link
          link_prev[NEXT] = link_next
          link_next[PREV] = link_prev
          last = root[PREV]
          last[NEXT] = root[PREV] = link
          link[PREV] = last
          link[NEXT] = root
          hits += 1
          return result

当该项目已经在缓存中时，无需检查循环队列是否已满或从缓存中弹出该项目。而是更改循环队列中项目的位置。由于最近使用的项目总是在顶部，代码移动到最近的值到队列的顶部，前一个顶部的项目成为当前项目last[NEXT] = root[PREV] = link 和link[PREV] = last 和link[NEXT] = root 的下一个。 NEXT 和 PREV 在顶部初始化，指向列表PREV, NEXT, KEY, RESULT = 0, 1, 2, 3 # names for the link fields 中的适当位置。最后，增加缓存命中信息并返回结果。

当缓存未命中时，更新未命中信息，代码检查三种情况。这三个操作都是在获得 RLock 之后发生的。源码中的三种情况按如下顺序——获取锁键后在缓存中找到，缓存已满，缓存可以取新项。为了演示，我们按照顺序，当缓存未满时，缓存满，获取锁后，缓存中有key。

缓存未满时

    ...
    else:
        # Put result in a new link at the front of the queue.
        last = root[PREV]
        link = [last, root, key, result]
        last[NEXT] = root[PREV] = cache[key] = link
        # Use the cache_len bound method instead of the len() function
        # which could potentially be wrapped in an lru_cache itself.
        full = (cache_len() >= maxsize)

当缓存未满时，准备最近的result(link = [last, root, key, result])，包含根之前的引用、根、键和计算结果。

然后将最近的结果（链接）指向循环队列的顶部（root[PREV] = link），root的上一项指向最近的结果（last[NEXT]=link），并将最近的结果添加到缓存中（@987654345 @)。

最后，检查缓存是否已满（cache_len() >= maxsize and cache_len = cache.__len__ is declared in the top）并将状态设置为已满。

对于fib示例，当函数接收到第一个值1时，root为空，root值为[[...], [...], None, None]，将结果添加到循环队列后，root值为[[[...], [...], 1, 1], [[...], [...], 1, 1], None, None]。 previous 和 next 都指向键 1 的结果。而对于下一个值0，插入后的根值为

[[[[...], [...], 1, 1], [...], 0, 0], [[...], [[...], [...], 0, 0], 1, 1], None, None]。上一个是[[[[...], [...], None, None], [...], 1, 1], [[...], [[...], [...], 1, 1], None, None], 0, 0]，下一个是[[[[...], [...], 0, 0], [...], None, None], [[...], [[...], [...], None, None], 0, 0], 1, 1]

缓存满时

    ...
    elif full:
        # Use the old root to store the new key and result.
        oldroot = root
        oldroot[KEY] = key
        oldroot[RESULT] = result
        # Empty the oldest link and make it the new root.
        # Keep a reference to the old key and old result to
        # prevent their ref counts from going to zero during the
        # update. That will prevent potentially arbitrary object
        # clean-up code (i.e. __del__) from running while we're
        # still adjusting the links.
        root = oldroot[NEXT]
        oldkey = root[KEY]
        oldresult = root[RESULT]
        root[KEY] = root[RESULT] = None
        # Now update the cache dictionary.
        del cache[oldkey]
        # Save the potentially reentrant cache[key] assignment
        # for last, after the root and links have been put in
        # a consistent state.
        cache[key] = oldroot

当缓存已满时，使用 root 作为 oldroot(oldroot=root) 并更新键和结果。然后将 oldroot 下一项设为新根 (root=oldroot[NEXT])，复制新的根密钥和结果 (oldkey = root[KEY] and oldresult = root[RESULT])。将新的根键和结果设置为 None(root[KEY] = root[RESULT] = None)。从缓存中删除旧键的项（del cache[oldkey]）并将计算结果添加到缓存中（cache[key] = oldroot）。对于斐波那契示例，当缓存已满，key 为2，根值为[[[[...], [...], 1, 1], [...], 0, 0], [[...], [[...], [...], 0, 0], 1, 1], None, None]，块末尾的新根为[[[[...], [...], 0, 0], [...], 2, 1], [[...], [[...], [...], 2, 1], 0, 0], None, None]。如您所见，密钥 1 被删除并替换为密钥 2。

获取锁后key出现在缓存中时。

    if key in cache:
        # Getting here means that this same key was added to the
        # cache while the lock was released.  Since the link
        # update is already done, we need only return the
        # computed result and update the count of misses.
        pass

当key出现在缓存中时，获取锁后，另一个线程可能已经将该值入队。所以没什么可做的，包装器返回结果。

最后，代码返回结果。在执行缓存未命中部分之前，代码更新缓存未命中信息并调用make_key函数。

注意：我无法让嵌套列表缩进工作，因此答案在格式上可能看起来少一些。

【讨论】：

【参考方案3】：

您可以查看源代码here。

基本上它使用两个数据结构，一个字典将函数参数映射到其结果，以及一个链表来跟踪你的函数调用历史。

缓存基本上是使用以下实现的，这是不言自明的。

cache = 
cache_get = cache.get
....
make_key = _make_key         # build a key from the function arguments
key = make_key(args, kwds, typed)
result = cache_get(key, sentinel)

更新链表的要点是：

elif full:

    oldroot = root
    oldroot[KEY] = key
    oldroot[RESULT] = result

    # update the linked list to pop out the least recent function call information        
    root = oldroot[NEXT]
    oldkey = root[KEY]
    oldresult = root[RESULT]
    root[KEY] = root[RESULT] = None
    ......

【讨论】：

以上是关于Lru_cache（来自 functools）如何工作？的主要内容，如果未能解决你的问题，请参考以下文章

如何用 `functools.lru_cache` 正确装饰`classmethod`？

从内部函数禁用`functools.lru_cache`

将 functools.lru_cache 应用于 lambda

functools.lru_cache的实现

在 Python >= 3.2 中将缓存存储到文件 functools.lru_cache

functools模块（partial，lru_cache装饰器）