How to create an ensemble in TensorFlow?

Posted 2023-03-12

【Title】: How to create ensemble in tensorflow? 【Posted】: 2016-05-29 15:02:13 【Question】:

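I am trying to create an ensemble of many trained models. All models have the same graph and differ only in their weights. I build the model graph with tf.get_variable. For the same graph architecture I have several checkpoints (with different weights), and I want to create one model instance per checkpoint.

How can I load multiple checkpoints without overwriting the previously loaded weights?

Because I build the graph with tf.get_variable, the only way to create multiple graphs is to pass reuse=True. If I instead wrap the build method in a new scope before loading, so that the variables get new names (and are no longer shared with the other graphs), this does not work: the new names differ from the names of the saved weights, so I cannot load them.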

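【Comments】:

I haven't tried it, but here is some reference code: github.com/eske/seq2seq/blob/master/translate/__main__.py#L190 In short, the author creates as many sessions as there are checkpoints and restores the corresponding checkpoint in each session.

Some more relevant reference code: github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/…

@cesarsalgado: I have the same problem. I am using inception-v4 in tf-slim. How did you solve it?

【Answer 1】:

This requires a few tricks. Let's first save a few simple models: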

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import argparse
import tensorflow as tf


def build_graph(init_val=0.0):
    x = tf.placeholder(tf.float32)
    w = tf.get_variable('w', initializer=init_val)
    y = x + w
    return x, y


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--init', help='dummy string', type=float)
    parser.add_argument('--path', help='dummy string', type=str)
    args = parser.parse_args()

    x1, y1 = build_graph(args.init)

    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(y1, {x1: 10}))  # outputs: 10 + i

        save_path = saver.save(sess, args.path)
        print("Model saved in path: %s" % save_path)

# python ensemble.py --init 1 --path ./models/model1.chpt
# python ensemble.py --init 2 --path ./models/model2.chpt
# python ensemble.py --init 3 --path ./models/model3.chpt

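These models produce the output "10 + i" for i = 1, 2, 3. Note that this script creates, runs and saves the same graph structure several times. Loading these values and restoring each graph individually is common knowledge and can be done with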

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import argparse
import tensorflow as tf


def build_graph(init_val=0.0):
    x = tf.placeholder(tf.float32)
    w = tf.get_variable('w', initializer=init_val)
    y = x + w
    return x, y


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--path', help='dummy string', type=str)
    args = parser.parse_args()

    x1, y1 = build_graph(-5.)

    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        saver.restore(sess, args.path)
        print("Model loaded from path: %s" % args.path)

        print(sess.run(y1, {x1: 10}))

# python ensemble_load.py --path ./models/model1.chpt  # gives 11
# python ensemble_load.py --path ./models/model2.chpt  # gives 12
# python ensemble_load.py --path ./models/model3.chpt  # gives 13

These again produce the expected outputs 11, 12, 13. The trick now is to give each model of the ensemble its own scope, e.g.

def build_graph(x, init_val=0.0):
    w = tf.get_variable('w', initializer=init_val)
    y = x + w
    return x, y


if __name__ == '__main__':
    models = ['./models/model1.chpt', './models/model2.chpt', './models/model3.chpt']
    x = tf.placeholder(tf.float32)
    outputs = []
    for k, path in enumerate(models):
        # THE VARIABLE SCOPE IS IMPORTANT
        with tf.variable_scope('model_%03i' % (k + 1)):
            outputs.append(build_graph(x, -100 * np.random.rand())[1])

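Each model therefore lives in its own variable scope, i.e. we have the variables 'model_001/w:0', 'model_002/w:0' and 'model_003/w:0'. Although the models have similar (not identical) subgraphs, these variables are distinct objects. The trick now is to manage two sets of variables (the variables of the graph under the current scope and the variables in the checkpoint):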

def restore_collection(path, scopename, sess):
    # retrieve all variables under scope
    variables = {v.name: v for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scopename)}
    # retrieves all variables in checkpoint
    for var_name, _ in tf.contrib.framework.list_variables(path):
        # get the value of the variable
        var_value = tf.contrib.framework.load_variable(path, var_name)
        # construct expected variablename under new scope
        target_var_name = '%s/%s:0' % (scopename, var_name)
        # reference to variable-tensor
        target_variable = variables[target_var_name]
        # assign old value from checkpoint to new variable
        sess.run(target_variable.assign(var_value))
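
The name bookkeeping inside restore_collection can be tried out without TensorFlow. The following is only a minimal sketch of the string handling, assuming checkpoint names are plain strings such as 'w' and graph variable names look like 'model_001/w:0'. The helper name map_checkpoint_names is hypothetical and not part of TensorFlow; it also adds the existence check that restore_collection lacks.

```python
def map_checkpoint_names(checkpoint_names, scopename, graph_var_names):
    """Map each checkpoint variable name to its scoped name in the graph.

    Hypothetical helper: it mirrors only the string handling of
    restore_collection and touches no TensorFlow API.
    """
    mapping = {}
    for var_name in checkpoint_names:
        # Same naming rule as in restore_collection: '<scope>/<name>:0'.
        target = '%s/%s:0' % (scopename, var_name)
        if target not in graph_var_names:
            raise KeyError('no variable %r under scope %r' % (target, scopename))
        mapping[var_name] = target
    return mapping


# Variable names as they would exist after building three scoped models.
graph_vars = {'model_001/w:0', 'model_002/w:0', 'model_003/w:0'}
print(map_checkpoint_names(['w'], 'model_002', graph_vars))
```

Raising a KeyError with the full target name makes a mismatch between checkpoint and graph visible before any value is assigned.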

The complete solution is

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import tensorflow as tf


def restore_collection(path, scopename, sess):
    # retrieve all variables under scope
    variables = {v.name: v for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scopename)}
    # retrieves all variables in checkpoint
    for var_name, _ in tf.contrib.framework.list_variables(path):
        # get the value of the variable
        var_value = tf.contrib.framework.load_variable(path, var_name)
        # construct expected variablename under new scope
        target_var_name = '%s/%s:0' % (scopename, var_name)
        # reference to variable-tensor
        target_variable = variables[target_var_name]
        # assign old value from checkpoint to new variable
        sess.run(target_variable.assign(var_value))


def build_graph(x, init_val=0.0):
    w = tf.get_variable('w', initializer=init_val)
    y = x + w
    return x, y


if __name__ == '__main__':
    models = ['./models/model1.chpt', './models/model2.chpt', './models/model3.chpt']
    x = tf.placeholder(tf.float32)
    outputs = []
    for k, path in enumerate(models):
        with tf.variable_scope('model_%03i' % (k + 1)):
            outputs.append(build_graph(x, -100 * np.random.rand())[1])

    # one variable 'w' per scope, in creation order: model_001/w, model_002/w, model_003/w
    W = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        print(sess.run(outputs[0], {x: 10}))  # random output -82.4929
        print(sess.run(outputs[1], {x: 10}))  # random output -63.65792
        print(sess.run(outputs[2], {x: 10}))  # random output -19.888203

        print(sess.run(W[0]))  # randomly initialized value -92.4929
        print(sess.run(W[1]))  # randomly initialized value -73.65792
        print(sess.run(W[2]))  # randomly initialized value -29.888203

        restore_collection(models[0], 'model_001', sess)  # restore all variables from different checkpoints
        restore_collection(models[1], 'model_002', sess)  # restore all variables from different checkpoints
        restore_collection(models[2], 'model_003', sess)  # restore all variables from different checkpoints

        print(sess.run(W[0]))  # old values from different checkpoints: 1.0
        print(sess.run(W[1]))  # old values from different checkpoints: 2.0
        print(sess.run(W[2]))  # old values from different checkpoints: 3.0

        print(sess.run(outputs[0], {x: 10}))  # what we expect: 11.0
        print(sess.run(outputs[1], {x: 10}))  # what we expect: 12.0
        print(sess.run(outputs[2], {x: 10}))  # what we expect: 13.0

# python ensemble_load_all.py

Now that you have a list of outputs, you can average these values inside TensorFlow or compute some other ensemble prediction.
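
The same combination can also be done on the fetched values, outside the graph. A minimal sketch in plain Python, assuming each model returns one float per input as in the toy example; the helper name average_predictions is hypothetical.

```python
def average_predictions(per_model_outputs):
    """Average predictions element-wise across models.

    per_model_outputs: list with one entry per model, each a list of
    floats of equal length (one prediction per input).
    """
    n_models = len(per_model_outputs)
    if n_models == 0:
        raise ValueError('need at least one model')
    # zip(*...) groups the predictions of all models for the same input.
    return [sum(values) / float(n_models) for values in zip(*per_model_outputs)]


# Values fetched from the three restored toy models for x = 10.
print(average_predictions([[11.0], [12.0], [13.0]]))  # [12.0]
```

Inside the graph, an expression such as tf.add_n(outputs) / len(outputs) would play the same role.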

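Edit:

It would be easier to store the models as NumPy dictionaries (npz) and load those values, as in my answer here: https://***.com/a/50181741/7443104 The code above only illustrates one solution. It has no sanity checks (e.g. whether the variables really exist). A try-catch might help.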
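
The npz idea could look roughly like the sketch below. It is not the code from the linked answer; it only assumes that the weights are available as a dictionary mapping variable names to arrays. The helper names save_weights and load_weights are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_weights(path, weights):
    """Store a {variable_name: array} dictionary in one .npz file."""
    np.savez(path, **weights)


def load_weights(path):
    """Load the dictionary back; keys are the original variable names."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


# Round trip with the single weight of the first toy model.
path = os.path.join(tempfile.mkdtemp(), 'model1.npz')
save_weights(path, {'w': np.float32(1.0)})
restored = load_weights(path)
print(restored['w'])
```

Because the keys are plain variable names, each restored value can be assigned to the variable of whichever scope the model was built in.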

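【Comments】:

The answer does not cover the case where the models are already trained and you want to create a new model that is an ensemble of the trained ones. How do you merge the input placeholders of the different models? In your answer you initialise a single placeholder because you build the graph there, so you can use that single placeholder as input. But what if there are several graphs, each with its own input placeholder?

Then it is exactly the same. You can build the graph inside build_graph. Just use a single placeholder or different placeholders as input. Loading the weights does not change the graph, and the graph can be modified before the weights are loaded. Then simply do tf.add_n(outputs) or whatever you want to do in your ensemble. The trick is to put each model/previous graph into its own variable_scope and go from there.

【Answer 2】:

There are several questions on this topic and many possible answers/approaches. Here I want to show what I consider the most elegant and concise way to build an ensemble of N models, where N is arbitrary. This solution was tested with tf 1.12.0 and Python 2.7.

The following code snippet is what you are looking for (comments below):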

import tensorflow as tf
import numpy as np

num_of_ensembles = N  # N: number of models in the ensemble
savers = list()
placeholders = list()
inference_ops = list()

for i in range(num_of_ensembles):
    # a distinct name scope per model keeps the imported graphs apart
    with tf.name_scope('model_{}'.format(i)):
        savers.append(tf.train.import_meta_graph('saved_model.ckpt.meta'))

graph = tf.get_default_graph()

for i in range(num_of_ensembles):
    placeholders.append(graph.get_operation_by_name('model_{}/input_ph'.format(i)).outputs[0])
    inference_ops.append(graph.get_operation_by_name('model_{}/last_operation_in_the_network'.format(i)).outputs[0])


with tf.Session() as sess:
    for i in range(num_of_ensembles):
        savers[i].restore(sess, 'saved_model.ckpt')
        # your_input: your own input array; random data of the same shape is fed here
        prediction = sess.run(inference_ops[i], feed_dict={placeholders[i]: np.random.rand(*your_input.shape)})

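So, the first thing to do is import the meta graph of each model. As suggested in the comments above, the key is to give each model of the ensemble its own scope, so that a prefix such as model_001/, model_002/ ... is added to each variable scope. This lets you restore N different models, each with its own independent variables.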

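All these graphs live in the current default graph. Once the models are loaded, you have to extract the inputs, outputs and operations you want to use from the graph into new variables. To do that, you need to know the names of these tensors in the old model. You can inspect all saved operations with ops = graph.get_operations(). In the example above, the first operation is the placeholder /input_ph, while the last operation is named /last_operation_in_the_network (usually, if the author of the network did not set the name field of each layer, you will find something like /dense_3, /conv2d_1, etc.). Note that it must be the exact final operation of your model, and that you must supply the tensor, i.e. the value .outputs[0] of the operation itself.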
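
The relation between operation names and tensor names can be written down as plain string handling. A minimal sketch, assuming the TensorFlow 1.x convention that the k-th output of an operation is named '<op_name>:<k>'; both helper names are hypothetical and no TensorFlow call is involved.

```python
def scoped_tensor_name(scope, op_name, output_index=0):
    """Build '<scope>/<op_name>:<output_index>' for an imported model."""
    return '%s/%s:%d' % (scope, op_name, output_index)


def split_tensor_name(tensor_name):
    """Split 'scope/op:0' into the operation name and the output index."""
    op_name, _, index = tensor_name.rpartition(':')
    return op_name, int(index)


# Names of the input tensors of three models imported under model_0 ... model_2.
for i in range(3):
    print(scoped_tensor_name('model_%d' % i, 'input_ph'))
```

A name built this way refers to the same tensor as .outputs[0] of the operation with the matching name.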

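Finally, you can run the session with the right inference operations and placeholders, get the predictions as numpy arrays and do whatever you want with them (averaging, majority voting, etc.).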
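
For classifiers, majority voting over the per-model predictions might be sketched as follows. This assumes each model's prediction has already been reduced to one class label per input (for example with an argmax); the helper name majority_vote is hypothetical.

```python
from collections import Counter


def majority_vote(per_model_labels):
    """Return the most frequent label for every input.

    per_model_labels: list with one entry per model, each a list of
    class labels of equal length (one label per input).
    """
    # zip(*...) groups the labels of all models for the same input.
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*per_model_labels)]


votes = [[0, 1, 2],   # labels predicted by model 1
         [0, 2, 2],   # labels predicted by model 2
         [1, 1, 2]]   # labels predicted by model 3
print(majority_vote(votes))  # [0, 1, 2]
```

With an even number of models ties are possible, so a tie-breaking rule (or averaging the raw scores instead) may be preferable.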

Useful links you may want to look at:

reddit post

A quick complete tutorial to save and restore Tensorflow models
