Edit tensorflow inceptionV3 retraining-example.py for multiple classifications

【Posted】: 2016-10-31 23:45:22

【Question】:

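TLDR: I cannot figure out how to use a retrained inceptionV3 to predict on multiple images.

Hello kind people :) I have spent a few days searching many *** posts and the documentation, but I could not find an answer to this question. I would greatly appreciate any help on this!

I have retrained a tensorflow inceptionV3 model on new pictures, and it works on new images when I follow the instructions at https://www.tensorflow.org/versions/r0.9/how_tos/image_retraining/index.html and use the following commands: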

bazel build tensorflow/examples/label_image:label_image && \
bazel-bin/tensorflow/examples/label_image/label_image \
--graph=/tmp/output_graph.pb --labels=/tmp/output_labels.txt \
--output_layer=final_result \
--image=IMAGE_DIRECTORY_TO_CLASSIFY

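However, I need to classify multiple images (like a dataset), and I am seriously stuck on how to do so. I found the following example at

https://github.com/eldor4do/Tensorflow-Examples/blob/master/retraining-example.py

on how to use the retrained model, but again, it is very sparse on detail about how to modify it for multiple classifications.

From what I have gathered from the MNIST tutorial, I need to pass a feed_dict to sess.run(), but I got stuck there because I could not work out how to implement it in this context.

Any assistance will be extremely appreciated! :)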

EDIT:

Running Styrke's script with some modifications, I got this:

    waffle@waffleServer:~/git$ python tensorflowMassPred.py
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
    I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
    /home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py:1197: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
      result_shape.insert(dim, 1)
    I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
    name: GeForce GTX 660
    major: 3 minor: 0 memoryClockRate (GHz) 1.0975
    pciBusID 0000:01:00.0
    Total memory: 2.00GiB
    Free memory: 1.78GiB
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 660, pci bus id: 0000:01:00.0)
    W tensorflow/core/framework/op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
    E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Invalid argument: NodeDef mentions attr 'T' not in Op<name=MaxPool; signature=input:float -> output:float; attr=ksize:list(int),min=4; attr=strides:list(int),min=4; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>; NodeDef: pool = MaxPool[T=DT_FLOAT, data_format="NHWC", ksize=[1, 3, 3, 1], padding="VALID", strides=[1, 2, 2, 1], _device="/job:localhost/replica:0/task:0/gpu:0"](pool/control_dependency)
         [[Node: pool = MaxPool[T=DT_FLOAT, data_format="NHWC", ksize=[1, 3, 3, 1], padding="VALID", strides=[1, 2, 2, 1], _device="/job:localhost/replica:0/task:0/gpu:0"](pool/control_dependency)]]
    Traceback (most recent call last):
      File "/home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 715, in _do_call
        return fn(*args)
      File "/home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 697, in _run_fn
        status, run_metadata)
      File "/home/waffle/anaconda3/lib/python3.5/contextlib.py", line 66, in __exit__
        next(self.gen)
      File "/home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
        pywrap_tensorflow.TF_GetCode(status))
    tensorflow.python.framework.errors.InvalidArgumentError: NodeDef mentions attr 'T' not in Op<name=MaxPool; signature=input:float -> output:float; attr=ksize:list(int),min=4; attr=strides:list(int),min=4; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>; NodeDef: pool = MaxPool[T=DT_FLOAT, data_format="NHWC", ksize=[1, 3, 3, 1], padding="VALID", strides=[1, 2, 2, 1], _device="/job:localhost/replica:0/task:0/gpu:0"](pool/control_dependency)
         [[Node: pool = MaxPool[T=DT_FLOAT, data_format="NHWC", ksize=[1, 3, 3, 1], padding="VALID", strides=[1, 2, 2, 1], _device="/job:localhost/replica:0/task:0/gpu:0"](pool/control_dependency)]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "tensorflowMassPred.py", line 116, in <module>
        run_inference_on_image()
      File "tensorflowMassPred.py", line 98, in run_inference_on_image
        {'DecodeJpeg/contents:0': image_data})
      File "/home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 372, in run
        run_metadata_ptr)
      File "/home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 636, in _run
        feed_dict_string, options, run_metadata)
      File "/home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
        target_list, options, run_metadata)
      File "/home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors.InvalidArgumentError: NodeDef mentions attr 'T' not in Op<name=MaxPool; signature=input:float -> output:float; attr=ksize:list(int),min=4; attr=strides:list(int),min=4; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>; NodeDef: pool = MaxPool[T=DT_FLOAT, data_format="NHWC", ksize=[1, 3, 3, 1], padding="VALID", strides=[1, 2, 2, 1], _device="/job:localhost/replica:0/task:0/gpu:0"](pool/control_dependency)
         [[Node: pool = MaxPool[T=DT_FLOAT, data_format="NHWC", ksize=[1, 3, 3, 1], padding="VALID", strides=[1, 2, 2, 1], _device="/job:localhost/replica:0/task:0/gpu:0"](pool/control_dependency)]]
    Caused by op 'pool', defined at:
      File "tensorflowMassPred.py", line 116, in <module>
        run_inference_on_image()
      File "tensorflowMassPred.py", line 87, in run_inference_on_image
        create_graph()
      File "tensorflowMassPred.py", line 68, in create_graph
        _ = tf.import_graph_def(graph_def, name='')
      File "/home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 274, in import_graph_def
        op_def=op_def)
      File "/home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2260, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/home/waffle/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1230, in __init__
        self._traceback = _extract_stack()

Here is the script, with some functions removed:

import os
import numpy as np
import tensorflow as tf
os.chdir('tensorflow/') #if need to run in the tensorflow directory
import csv,os
import pandas as pd
import glob

imagePath = '../_images_processed/test'
modelFullPath = '/tmp/output_graph.pb'
labelsFullPath = '/tmp/output_labels.txt'

# FILE NAME TO SAVE TO.
SAVE_TO_CSV = 'tensorflowPred.csv'


def makeCSV():
    global SAVE_TO_CSV
    with open(SAVE_TO_CSV,'w') as f:
        writer = csv.writer(f)
        writer.writerow(['id','label'])


def makeUniqueDic():
    global SAVE_TO_CSV
    df = pd.read_csv(SAVE_TO_CSV)
    doneID = df['id']
    unique = doneID.unique()
    uniqueDic = {str(key): '' for key in unique}  # dict for faster lookup
    return uniqueDic


def create_graph():
    """Creates a graph from saved GraphDef file and returns a saver."""
    # Creates graph from saved graph_def.pb.
    with tf.gfile.FastGFile(modelFullPath, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        _ = tf.import_graph_def(graph_def, name='')


def run_inference_on_image():
    answer = []
    global imagePath
    if not tf.gfile.IsDirectory(imagePath):
        tf.logging.fatal('imagePath directory does not exist %s', imagePath)
        return answer

    if not os.path.exists(SAVE_TO_CSV):
        makeCSV()

    files = glob.glob(imagePath+'/*.jpg')
    uniqueDic = makeUniqueDic()        
    # Get a list of all files in imagePath directory
    #image_list = tf.gfile.ListDirectory(imagePath)

    # Creates graph from saved GraphDef.
    create_graph()

    with tf.Session() as sess:

        softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')

        for pic in files:
            name = getNamePicture(pic)
            if name not in uniqueDic:
                image_data = tf.gfile.FastGFile(pic, 'rb').read()
                predictions = sess.run(softmax_tensor,
                                       {'DecodeJpeg/contents:0': image_data})
                predictions = np.squeeze(predictions)

                top_k = predictions.argsort()[-5:][::-1]  # Getting top 5 predictions
                f = open(labelsFullPath, 'rb')
                lines = f.readlines()
                labels = [str(w).replace("\n", "") for w in lines]
#            for node_id in top_k:
#                human_string = labels[node_id]
#                score = predictions[node_id]
#                print('%s (score = %.5f)' % (human_string, score))
                pred = labels[top_k[0]]
                with open(SAVE_TO_CSV,'a') as f:
                    writer = csv.writer(f)
                    writer.writerow([name,pred])
    return answer

if __name__ == '__main__':
    run_inference_on_image()

【Comments】:

【Answer 1】:

So, looking at your linked script:

with tf.Session() as sess:

    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)

    top_k = predictions.argsort()[-5:][::-1] # Getting top 5 predictions

In this snippet, image_data is the new image that you want to feed to the model. It is loaded a few lines earlier with:

image_data = tf.gfile.FastGFile(imagePath, 'rb').read()

So my instinct would be to change run_inference_on_image to accept imagePath as a parameter, and to use os.listdir and os.path.join to do this for each image in your dataset.
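The directory walk suggested above can be sketched as follows. This is only a sketch under stated assumptions: list_image_paths, classify_directory and the classify_one callback are hypothetical names that do not appear in the linked script, and classify_one stands in for a version of run_inference_on_image that accepts a single image path.

```python
import os


def list_image_paths(directory, extensions=(".jpg", ".jpeg")):
    """Return sorted full paths of the image files directly inside `directory`."""
    paths = []
    for name in sorted(os.listdir(directory)):
        # os.listdir returns bare names, so join each one with the directory
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path) and name.lower().endswith(extensions):
            paths.append(full_path)
    return paths


def classify_directory(directory, classify_one):
    """Call the (hypothetical) `classify_one(path)` for every image found.

    Returns a list of (path, result) pairs in sorted file order.
    """
    return [(path, classify_one(path)) for path in list_image_paths(directory)]
```

Passing the classifier in as a callback keeps the file handling separate from the TensorFlow code, so the loop can be tried with a dummy function before the model is involved.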

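【Comments】:

Hi, thanks for your reply! Sorry, I do not understand what you mean. Are you saying to load all the pictures and then loop over sess.run()? Unfortunately, that is not what I am looking for. It is similar to fitting row by row in a normal classifier (like xgboost), and it is very slow (31 hours for 8k images). I am looking for a solution that feeds the whole X of images into the feed_dict and the classifier, and outputs the predictions for all images in one go.

OK, reading the code which actually constructs the inference portion of the graph, you have a tensor _inputs which should be of size [batch_size, height, width, channels]. Next I would try reading the JPEGs into such a 4D array and then, in the sess.run call, feeding _inputs:0 in place of DecodeJpeg/contents:0. ResizeBilinear:0 might also be useful.

【Answer 2】:

The raw jpeg data seems to be fed directly to a decode_jpeg operation, which takes only a single image as input at a time. To process several images at once, you would probably need to define more decode_jpeg ops. If that is possible, I currently do not know how to do it.

The next best thing, which is easy, is probably to classify all the images one by one in a loop inside the TensorFlow session. That way you at least avoid reloading the graph and starting a new TF session for every image you want to classify, both of which can take a considerable amount of time if you have to do them often.

Here I have changed the definition of the run_inference_on_image() function so that it should classify all images in the directory specified by the imagePath variable. I have not tested this code, so there may be some minor problems that need fixing.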

def run_inference_on_image():
    # Needs `import os` at the top of the script.
    answer = []

    if not tf.gfile.IsDirectory(imagePath):
        tf.logging.fatal('imagePath directory does not exist %s', imagePath)
        return answer

    # Get a list of all files in imagePath directory (bare file names)
    image_list = tf.gfile.ListDirectory(imagePath)

    # Read the labels once, before the loop
    with open(labelsFullPath, 'r') as f:
        labels = [line.strip() for line in f]

    # Creates graph from saved GraphDef.
    create_graph()

    with tf.Session() as sess:

        softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')

        for i in image_list:
            # ListDirectory returns names only, so join them with the directory
            image_data = tf.gfile.FastGFile(os.path.join(imagePath, i), 'rb').read()
            predictions = sess.run(softmax_tensor,
                                   {'DecodeJpeg/contents:0': image_data})
            predictions = np.squeeze(predictions)

            top_k = predictions.argsort()[-5:][::-1]  # Getting top 5 predictions
            for node_id in top_k:
                human_string = labels[node_id]
                score = predictions[node_id]
                print('%s (score = %.5f)' % (human_string, score))

            # Keep only the best label for each image
            answer.append(labels[top_k[0]])
    return answer
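The function above returns only the labels, so the link between a label and its file is easy to lose. One possible way to keep them together is to write them out as CSV rows. This is a minimal sketch, not part of the answer's code: write_predictions is a hypothetical helper, and image_names and predicted are made-up stand-ins for image_list and for the list returned by run_inference_on_image(). The id,label header follows the one used in the question's script.

```python
import csv
import io


def write_predictions(rows, out_file):
    """Write (image name, predicted label) pairs as CSV with an id,label header."""
    writer = csv.writer(out_file)
    writer.writerow(["id", "label"])
    for name, label in rows:
        writer.writerow([name, label])


# Hypothetical stand-ins for image_list and the returned list of labels
image_names = ["cat1.jpg", "dog7.jpg"]
predicted = ["cat", "dog"]

# An in-memory buffer keeps the sketch self-contained; a real file works the same way
buffer = io.StringIO()
write_predictions(zip(image_names, predicted), buffer)
print(buffer.getvalue())
```

When writing to a real file in Python 3, opening it with newline='' avoids blank lines between rows on some platforms.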

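【Comments】:

Hi, thanks a lot for your reply. I ran into several problems when running this script, and I have updated my answer. It would be great if you could help again! :)

@Wboy There is a TensorFlow issue that seems to be related to your new problem.

It is not solved yet, but I will reward you before it ends :)

【Answer 3】:

I had the same problem. I went through every possible solution and eventually found one that worked for me. This error occurs when the Tensorflow version used to retrain the model differs from the version used to run it.

The solution is to update Tensorflow to the latest version. Since I installed Tensorflow with pip, I only had to run the following command: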

sudo pip install tensorflow --upgrade 

And it worked perfectly.
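To check whether such a version mismatch is plausible, it may help to print the installed version in both the environment that retrained the model and the one that runs it. The sketch below is an assumption-laden illustration: installed_version is a hypothetical helper, it relies on importlib.metadata (Python 3.8 or newer, so not the Python 3.5 shown in the traceback), and the package may be installed under another name such as tensorflow-gpu.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this in both environments and compare the two results
print("tensorflow:", installed_version("tensorflow"))
```

On older interpreters, printing tf.__version__ after importing tensorflow gives the same information.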

【Comments】:
