Good uses for mutable function argument default values?

Posted 2023-02-24

【Posted】: 2012-02-27 19:41:47

【Question】:

Setting a mutable object as the default value of a function argument is a common mistake in Python. Here is an example taken from this excellent write-up by David Goodger:

>>> def bad_append(new_item, a_list=[]):
...     a_list.append(new_item)
...     return a_list
...
>>> print(bad_append('one'))
['one']
>>> print(bad_append('two'))
['one', 'two']

The explanation of why this happens is here.
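To make the mechanism concrete, here is a minimal sketch. It assumes only standard Python behavior: the default value is evaluated once, when the `def` statement runs, and is kept on the function object in `__defaults__`. The name `good_append` and the `None` sentinel are illustrative; they show the usual way to avoid the problem, not the only one.

```python
def bad_append(new_item, a_list=[]):
    a_list.append(new_item)
    return a_list

# The default list is created once, when the def statement runs,
# and is stored on the function object itself.
default_before = bad_append.__defaults__[0]
bad_append('one')
bad_append('two')
same_object = bad_append.__defaults__[0] is default_before
print(same_object)               # True
print(bad_append.__defaults__)   # (['one', 'two'],)


def good_append(new_item, a_list=None):
    # Use None as a sentinel and build a fresh list on every call.
    if a_list is None:
        a_list = []
    a_list.append(new_item)
    return a_list

print(good_append('one'))  # ['one']
print(good_append('two'))  # ['two']
```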

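Now for my question: is there a good use case for this syntax?

I mean, if everyone who runs into it makes the same mistake, debugs it, understands the problem and from then on tries to avoid it, what use is there for such syntax?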

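【Comments】:

The best explanation I know of is in the linked question: functions are first-class objects, just like classes. Classes have mutable attribute data; functions have mutable default values.

This behavior is not a "design choice". It is a result of the way the language works: starting from simple working principles, with as few exceptions as possible. For me, once I started to "think in Python", this behavior became natural, and I would be surprised if it did not happen.

I've wondered about this too. This example is all over the web, but it makes no sense: either you want to mutate the passed list, and having a default makes no sense, or you want to return a new list, and you should make a copy immediately on entering the function. I can't imagine a case where it is useful to do both.

I just came across a more realistic example that doesn't have the problem I complained about above. The default is an argument to a class's __init__ function, which gets set into an instance variable; this is a perfectly valid thing to want to do, and it all goes horribly wrong with a mutable default. ***.com/questions/43768055/…

@MarkRansom: By your definition, there would never be any bug on a (deterministic) computer. Every bug makes sense when you spend enough time studying the internals. Let's be honest and call this behavior one of the very few design flaws in Python.

【Answer 1】: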

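You can use it to cache values between function calls:

def get_from_cache(name, cache={}):
    if name in cache: return cache[name]
    cache[name] = result = expensive_calculation()
    return result

But usually that sort of thing is better done with a class, since you can then have additional attributes to clear the cache and so on.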
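For comparison, the standard library's `functools.lru_cache` provides the same kind of caching with cache clearing built in. The sketch below is only an illustration: `expensive_calculation` is a hypothetical stand-in that takes the name as an argument and counts its own calls, which the original snippet does not specify.

```python
import functools

call_count = 0

def expensive_calculation(name):
    # Hypothetical stand-in for real work; counts how often it runs.
    global call_count
    call_count += 1
    return name.upper()

@functools.lru_cache(maxsize=None)
def get_from_cache(name):
    return expensive_calculation(name)

get_from_cache("spam")
get_from_cache("spam")                    # served from the cache
print(call_count)                         # 1
print(get_from_cache.cache_info().hits)   # 1

get_from_cache.cache_clear()              # explicit invalidation
get_from_cache("spam")
print(call_count)                         # 2
```

Note that `lru_cache` requires every argument to be hashable, which the dictionary-default version does not.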

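【Comments】:

... or a memoizing decorator.

@functools.lru_cache(maxsize=None)

lru_cache is not usable if you have unhashable values.

@Synedraacus: Neither is this recipe.

@matrineau Not necessarily. If some of your arguments are hashable and others are not, you can use this recipe to cache on the hashable arguments only. lru_cache requires all arguments to be hashable.

【Answer 2】: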

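The canonical answer is this page: http://effbot.org/zone/default-values.htm

It also mentions 3 "good" use cases for mutable default arguments:

1. Binding a local variable to the current value of an outer variable in a callback
2. Cache/memoization
3. Local rebinding of global names (for highly optimized code)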

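【Comments】:

It seems that "binding a local variable to the current value of an outer variable in a callback" is just a workaround for another design flaw in Python.

【Answer 3】: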

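Maybe you do not mutate the mutable argument, but you do expect a mutable argument:

def foo(x, y, config={}):
    my_config = {'debug': True, 'verbose': False}
    my_config.update(config)
    return bar(x, my_config) + baz(y, my_config)

(Yes, I know you can use config=() in this particular case, but I find that less clear and less general.)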
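A read-only default like this stays safe only as long as the default object never leaves the function. The sketch below illustrates that caveat with two hypothetical functions (`load_config` and `load_config_safe` are made-up names, not part of the answer): the first returns its default directly, the second always hands back a fresh dictionary.

```python
def load_config(config={}):
    # Returns the default object itself when no argument is given.
    return config

leaked = load_config()
leaked['debug'] = True          # the caller mutates the shared default
print(load_config())            # {'debug': True}


def load_config_safe(config=None):
    # Merge into a new dict so callers never see the function's own object.
    merged = {'debug': False}
    merged.update(config or {})
    return merged

first = load_config_safe()
first['debug'] = True
print(load_config_safe())       # {'debug': False}
```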

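【Comments】:

Also make sure that you neither mutate this default value nor return it directly from the function; otherwise code outside the function can mutate it, and that will affect all calls to the function.

【Answer 4】: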

import random

def ten_random_numbers(rng=random):
    return [rng.random() for _ in range(10)]

This uses the random module, effectively a mutable singleton, as its default random number generator.
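One practical consequence of this signature is that a caller can swap in a different generator. As a rough sketch, assuming only that the argument exposes a `.random()` method, a seeded `random.Random` instance makes the output reproducible:

```python
import random

def ten_random_numbers(rng=random):
    return [rng.random() for _ in range(10)]

# Any object with a .random() method works; a seeded random.Random
# instance makes the output reproducible, for example in tests.
a = ten_random_numbers(rng=random.Random(42))
b = ten_random_numbers(rng=random.Random(42))
print(a == b)                      # True
print(len(ten_random_numbers()))   # 10
```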

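【Comments】:

But this is not a terribly important use case either.

I see no difference in behavior between Python's "obtain the reference once" and a non-Python "look up random once per function call". Both end up using the same object.

random is not mutable: import random, then print(hash(random)). Modules, classes (types, not instances) and functions are considered immutable. That is how many memoization and dependency-injection mechanisms work. Note: "mutable" has a rather specific meaning in Python (technically everything is "mutable", it's Python). Perhaps "frozen" is a better term. Anything that follows the Hashable (__hash__/__eq__) interface can be considered frozen. Merely having side effects does not make an object mutable: socket.socket is another example of a first-class hashable object with side effects.

【Answer 5】: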

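I know this is an old one, but just for the heck of it I'd like to add a use case to this thread. I regularly write custom functions and layers for TensorFlow/Keras, upload my scripts to a server, train the models (with custom objects) there, and then save the models and download them. To load those models I then need to provide a dictionary containing all of those custom objects.

In a situation like mine, you can add some code to the module containing those custom objects: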

custom_objects = {}

def custom_object(obj, storage=custom_objects):
    storage[obj.__name__] = obj
    return obj

Then I can simply decorate any class or function that needs to be in the dictionary:

@custom_object
def some_function(x):
    return 3*x*x + 2*x - 2

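Furthermore, say I want to store my custom loss functions in a different dictionary than my custom Keras layers. Using functools.partial gives me easy access to a new decorator:

import functools
import tensorflow as tf

custom_losses = {}
custom_loss = functools.partial(custom_object, storage=custom_losses)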

@custom_loss
def my_loss(y, y_pred):
    return tf.reduce_mean(tf.square(y - y_pred))
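The registry pattern above can be tried without TensorFlow installed. The following is a self-contained sketch under that assumption: `squared_error` is a hypothetical plain-Python stand-in for a real loss function, and the rest mirrors the names used in this answer.

```python
import functools

custom_objects = {}
custom_losses = {}

def custom_object(obj, storage=custom_objects):
    # Register obj under its own name, then hand it back unchanged.
    storage[obj.__name__] = obj
    return obj

# Same decorator, but bound to a different registry dict.
custom_loss = functools.partial(custom_object, storage=custom_losses)

@custom_object
def some_function(x):
    return 3*x*x + 2*x - 2

@custom_loss
def squared_error(y, y_pred):
    # Plain-Python stand-in for a real loss function.
    return (y - y_pred) ** 2

print(sorted(custom_objects))                 # ['some_function']
print(sorted(custom_losses))                  # ['squared_error']
print(custom_losses['squared_error'](3, 1))   # 4
```

Because the default for `storage` is bound when `custom_object` is defined, every undecorated use shares the one `custom_objects` dictionary, which is exactly what a registry needs.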

【Comments】:

【Answer 6】:

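Edit (clarification): the mutable default argument issue is a symptom of a deeper design choice, namely that default argument values are stored as attributes on the function object. You might ask why this choice was made; as always, such questions are hard to answer properly. But it certainly has good uses:

Optimizing for performance: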

def foo(sin=math.sin): ...

Grabbing the value of an object in a closure instead of the variable:

callbacks = []
for i in range(10):
    def callback(i=i): ...
    callbacks.append(callback)
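To see what the `i=i` default buys, compare it with a callback that has no default. This is a small sketch with illustrative names (`late_callback`, `early_callback`): without the default, every callback looks up `i` when it is called, after the loop has finished.

```python
late = []
early = []
for i in range(3):
    def late_callback():
        return i          # looks up i when called
    def early_callback(i=i):
        return i          # the default captured the value at definition time
    late.append(late_callback)
    early.append(early_callback)

print([f() for f in late])   # [2, 2, 2]
print([f() for f in early])  # [0, 1, 2]
```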

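【Comments】:

Integers and built-in functions are not mutable!

@Jonathan: There is still no mutable default argument in the remaining example, or am I just not seeing it?

@Jonathan: My point is not that these are mutable. It is that the system Python uses to store default arguments (on the function object, defined at compile time) can be useful. This implies the mutable default argument issue, since re-evaluating the argument on every function call would render the trick useless.

@katriealex: OK, but please say in your answer that you assume the arguments would have to be re-evaluated, and show why that would be bad. Nit-pick: default argument values are not stored at compile time, but when the function definition statement is executed.

@WolframH: True :P! Though the two often coincide.

【Answer 7】: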

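To answer the question of good uses for mutable default argument values, I offer the following example:

A mutable default can be useful for writing easy-to-use, importable commands of your own creation. The mutable default method amounts to having private, static variables in a function that you can initialize on the first call (very much like a class), but without having to resort to globals, without having to use a wrapper, and without having to instantiate an imported class object. It is elegant in its own way, as I hope you will agree.

Consider these two examples: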

def dittle(cache = []):

    from time import sleep # Not needed except as an example.

    # dittle's internal cache list has this format: cache[string, counter]
    # Any argument passed to dittle() that violates this format is invalid.
    # (The string is pure storage, but the counter is used by dittle.)

    # -- Error Trap --
    if type(cache) != list or cache !=[] and (len(cache) == 2 and type(cache[1]) != int):
        print(" User called dittle("+repr(cache)+").\n >> Warning: dittle() takes no arguments, so this call is ignored.\n")
        return

    # -- Initialize Function. (Executes on first call only.) --
    if not cache:
        print("\n cache =",cache)
        print(" Initializing private mutable static cache. Runs only on First Call!")
        cache.append("Hello World!")
        cache.append(0)
        print(" cache =",cache,end="\n\n")
    # -- Normal Operation --
    cache[1]+=1 # Static cycle count.
    outstr = " dittle() called "+str(cache[1])+" times."
    if cache[1] == 1:outstr=outstr.replace("s.",".")
    print(outstr)
    print(" Internal cache held string = '"+cache[0]+"'")
    print()
    if cache[1] == 3:
        print(" Let's rest for a moment.")
        sleep(2.0) # Since we imported it, we might as well use it.
        print(" Wheew! Ready to continue.\n")
        sleep(1.0)
    elif cache[1] == 4:
        cache[0] = "It's Good to be Alive!" # Let's change the private message.

# =================== MAIN ======================        
if __name__ == "__main__":

    for cnt in range(2):dittle() # Calls can be loop-driven, but they need not be.

    print(" Attempting to pass a list to dittle()")
    dittle([" BAD","Data"])
    
    print(" Attempting to pass a non-list to dittle()")
    dittle("hi")
    
    print(" Calling dittle() normally..")
    dittle()
    
    print(" Attempting to set the private mutable value from the outside.")
    # Even an insider's attempt to feed a valid format will be accepted
    # for the one call only, and is then discarded when it goes out
    # of scope. It fails to interrupt normal operation.
    dittle([" I am a Grieffer!\n (Notice this change will not stick!)",-7]) 
    
    print(" Calling dittle() normally once again.")
    dittle()
    dittle()

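If you run this code, you will see that the dittle() function initializes itself on the very first call but not on later calls, that it uses a private static cache (the mutable default) for internal static storage between calls, that it rejects attempts to hijack the static storage, that it is resilient to malicious input, and that it can act based on dynamic conditions (here, the number of times the function has been called).

The key to using mutable defaults is never to do anything that rebinds the variable to a new object, and instead always to change the object in place.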
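The difference between changing the default in place and rebinding the name can be shown with two tiny counters. This is only a sketch; `counter_in_place` and `counter_rebound` are hypothetical names chosen for the illustration.

```python
def counter_in_place(state=[0]):
    state[0] += 1            # mutates the one shared list
    return state[0]

def counter_rebound(state=[0]):
    state = [state[0] + 1]   # rebinds the local name; the default is untouched
    return state[0]

print([counter_in_place() for _ in range(3)])  # [1, 2, 3]
print([counter_rebound() for _ in range(3)])   # [1, 1, 1]
```

Only the first version keeps state between calls, because only it modifies the object stored in the function's defaults.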

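To really see the potential power and usefulness of this technique, save the first program to your current directory under the name "DITTLE.py", then run the next program. It imports and uses our new dittle() command without requiring any steps to remember or hoops to jump through.

Here is our second example. Run this as a new program.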

from DITTLE import dittle

print("\n We have emulated a new python command with 'dittle()'.\n")
# Nothing to declare, nothing to instantiate, nothing to remember.

dittle()
dittle()
dittle()
dittle()
dittle()

Now isn't that as slick and clean as can be? These mutable defaults can really come in handy.

==========================

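After reflecting on my answer for a while, I'm not sure I made the difference between the mutable default method and the regular way of accomplishing the same thing clear.

The regular way is to use an importable function that wraps a Class object (and uses a global). So for comparison, here is a Class-based method that attempts to do the same things as the mutable default method.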

from time import sleep

class dittle_class():

    def __init__(self):
        
        self.b = 0
        self.a = " Hello World!"
        
        print("\n Initializing Class Object. Executes on First Call only.")
        print(" self.a = '"+str(self.a),"', self.b =",self.b,end="\n\n")
    
    def report(self):
        self.b  = self.b + 1
        
        if self.b == 1:
            print(" Dittle() called",self.b,"time.")
        else:
            print(" Dittle() called",self.b,"times.")
        
        if self.b == 5:
            self.a = " It's Great to be alive!"
        
        print(" Internal String =",self.a,end="\n\n")
            
        if self.b ==3:
            print(" Let's rest for a moment.")
            sleep(2.0) # Since we imported it, we might as well use it.
            print(" Wheew! Ready to continue.\n")
            sleep(1.0)

cl= dittle_class()

def dittle():
    global cl
    
    if type(cl.a) != str or type(cl.b) != int:
        print(" Class exists but does not have valid format.")
        
    cl.report()

# =================== MAIN ====================== 
if __name__ == "__main__":
    print(" We have emulated a python command with our own 'dittle()' command.\n")
    for cnt in range(2):dittle() # Calls can be loop-driven, but they need not be.
    
    print(" Attempting to pass arguments to dittle()")
    try: # The user must catch the fatal error. The mutable default user did not.
        dittle(["BAD","Data"])
    except TypeError:
        print(" This caused a fatal error that can't be caught in the function.\n")
    
    print(" Calling dittle() normally..")
    dittle()
    
    print(" Attempting to set the Class variable from the outside.")
    cl.a = " I'm a griefer. My damage sticks."
    cl.b = -7
    
    dittle()
    dittle()

Save this Class-based program in your current directory as DITTLE.py, then run the following code (the same as before).

from DITTLE import dittle
# Nothing to declare, nothing to instantiate, nothing to remember.

dittle()
dittle()
dittle()
dittle()
dittle()

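By comparing the two methods, the advantages of using a mutable default in a function should be clearer. The mutable default method needs no globals, and its internal variables cannot be set directly. And while the mutable method accepted a knowledgeable passed argument for a single cycle and then shrugged it off, the Class method was permanently altered because its internal variables are directly exposed to the outside. As for which method is easier to program? I think that depends on your comfort level with the methods and the complexity of your goals.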

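【Comments】:

I'm not sure why you need global in the second example. Nonetheless, I think the second example is more readable than the first. Even if the end result is functionally the same, using a class signals to the reader, "I have some state that I want to keep together". You did answer the question, though, so props to you.

I would actually say this is a good counterexample showing why actually using mutable arguments is almost always a bad idea.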
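On the point about `global`: the statement is only required when a function assigns to a module-level name. Mutating the object that name refers to needs no declaration. A minimal sketch, with hypothetical names `Counter`, `counter` and `bump`:

```python
class Counter:
    def __init__(self):
        self.count = 0

counter = Counter()

def bump():
    # No `global` needed: the name `counter` is only read here;
    # it is the object it refers to that gets mutated.
    counter.count += 1
    return counter.count

print(bump(), bump())  # 1 2
```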
