Predict OCR text for an entire document using a model trained on 32x32 alphabet images

Posted: 2021-08-14 09:23:15

Question:

So I trained a TensorFlow model for OCR using an alphabet dataset downloaded from here.

Creating Xtrain, Xtest and Ytrain, Ytest: the dataset folder contains one sub-folder per letter, each holding roughly 15k images of size 32x32.

import os
from PIL import Image
from numpy import asarray

folders = os.listdir(path)  # one sub-folder per character class

train_max = 100  # images per class used for training
test_max = 10    # images per class used for testing

Xtrain = []
Ytrain = []
Xtest = []
Ytest = []

for folder in folders:
    folder_opened = path + folder + '/'
    count = 0
    for chars in os.listdir(folder_opened):
        count += 1
        if count <= train_max:
            # first train_max images of each class go to the training set
            image = Image.open(folder_opened + chars)
            data = asarray(image)
            Xtrain.append(data)
            Ytrain.append(folder)  # the folder name is the label
        elif count > train_max and count <= train_max + test_max:
            # the next test_max images go to the test set
            image = Image.open(folder_opened + chars)
            data = asarray(image)
            Xtest.append(data)
            Ytest.append(folder)
        else:
            break

My training code:

import tensorflow as tf
from pandas import factorize  # used below to turn the string labels into integer codes

Xtrain = tf.keras.utils.normalize(Xtrain, axis = 1)
Xtest = tf.keras.utils.normalize(Xtest, axis = 1)

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(30, activation=tf.nn.softmax))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'],
              )

model.fit(Xtrain, factorize(Ytrain)[0], epochs=40, validation_data = (Xtest, factorize(Ytest)[0]))
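The `factorize` call above (from pandas) maps each distinct folder name to an integer code, which is the target format `sparse_categorical_crossentropy` expects. A minimal sketch of what it produces (the label values here are illustrative):

```python
from pandas import factorize

# Folder names collected as labels during the loading loop
Ytrain = ["A", "B", "A", "C", "B"]

# factorize returns (integer codes, unique labels in first-seen order)
codes, uniques = factorize(Ytrain)

print(codes.tolist())  # integer targets: [0, 1, 0, 2, 1]
print(list(uniques))   # mapping back: uniques[code] -> original label, ['A', 'B', 'C']
```

Note that `factorize(Ytrain)[0]` and `factorize(Ytest)[0]` are computed independently, so they only agree if both sets encounter the classes in the same order; keeping a single label-to-code mapping is safer.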

This model works very well at predicting images containing a single 32x32 letter.

But for a real-world application, I need to use this model to extract the entire text from a document (e.g. PAN card, ID card, passport, etc.).

Here is what I tried:

I tried reading the image with Pillow, converting it to a numpy array, and calling model.predict on it.

image_adhar = Image.open(path_2 + 'adhar1.jpeg')
image_adhar = asarray(image_adhar)
model.predict([image_adhar])

Doing this, I get the following error:

WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'tuple'> input: (<tf.Tensor 'IteratorGetNext:0' shape=(None, 500, 3) dtype=uint8>,)
Consider rewriting this model with the Functional API.
WARNING:tensorflow:Model was constructed with shape (None, 32, 32) for input KerasTensor(type_spec=TensorSpec(shape=(None, 32, 32), dtype=tf.float32, name='flatten_30_input'), name='flatten_30_input', description="created by layer 'flatten_30_input'"), but it was called on an input with incompatible shape (None, 500, 3).

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-165-bba8716b47d4> in <module>
----> 1 model.predict([image_adhar])

~\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py in predict(self, x, batch_size, verbose, steps, callbacks, max_queue_size, workers, use_multiprocessing)
   1725           for step in data_handler.steps():
   1726             callbacks.on_predict_batch_begin(step)
-> 1727             tmp_batch_outputs = self.predict_function(iterator)
   1728             if data_handler.should_sync:
   1729               context.async_wait()

~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
    887 
    888       with OptionalXlaContext(self._jit_compile):
--> 889         result = self._call(*args, **kwds)
    890 
    891       new_tracing_count = self.experimental_get_tracing_count()

~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
    931       # This is the first call of __call__, so we have to initialize.
    932       initializers = []
--> 933       self._initialize(args, kwds, add_initializers_to=initializers)
    934     finally:
    935       # At this point we know that the initialization is complete (or less

~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in _initialize(self, args, kwds, add_initializers_to)
    761     self._graph_deleter = FunctionDeleter(self._lifted_initializer_graph)
    762     self._concrete_stateful_fn = (
--> 763         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
    764             *args, **kwds))
    765 

~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   3048       args, kwargs = None, None
   3049     with self._lock:
-> 3050       graph_function, _ = self._maybe_define_function(args, kwargs)
   3051     return graph_function
   3052 

~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _maybe_define_function(self, args, kwargs)
   3442 
   3443           self._function_cache.missed.add(call_context_key)
-> 3444           graph_function = self._create_graph_function(args, kwargs)
   3445           self._function_cache.primary[cache_key] = graph_function
   3446 

~\anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3277     arg_names = base_arg_names + missing_arg_names
   3278     graph_function = ConcreteFunction(
-> 3279         func_graph_module.func_graph_from_py_func(
   3280             self._name,
   3281             self._python_function,

~\anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    997         _, original_func = tf_decorator.unwrap(python_func)
    998 
--> 999       func_outputs = python_func(*func_args, **func_kwargs)
   1000 
   1001       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

~\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in wrapped_fn(*args, **kwds)
    670         # the function a weak reference to itself to avoid a reference cycle.
    671         with OptionalXlaContext(compile_with_xla):
--> 672           out = weak_wrapped_fn().__wrapped__(*args, **kwds)
    673         return out
    674 

~\anaconda3\lib\site-packages\tensorflow\python\framework\func_graph.py in wrapper(*args, **kwargs)
    984           except Exception as e:  # pylint:disable=broad-except
    985             if hasattr(e, "ag_error_metadata"):
--> 986               raise e.ag_error_metadata.to_exception(e)
    987             else:
    988               raise

ValueError: in user code:

    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:1569 predict_function  *
        return step_function(self, iterator)
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:1559 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1285 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2833 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:3608 _call_for_each_replica
        return fn(*args, **kwargs)
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:1552 run_step  **
        outputs = model.predict_step(data)
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py:1525 predict_step
        return self(x, training=False)
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py:1030 __call__
        outputs = call_fn(inputs, *args, **kwargs)
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\sequential.py:380 call
        return super(Sequential, self).call(inputs, training=training, mask=mask)
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\functional.py:420 call
        return self._run_internal_graph(
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\functional.py:556 _run_internal_graph
        outputs = node.layer(*args, **kwargs)
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py:1013 __call__
        input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
    C:\Users\faris\anaconda3\lib\site-packages\tensorflow\python\keras\engine\input_spec.py:251 assert_input_compatibility
        raise ValueError(

    ValueError: Input 0 of layer dense_94 is incompatible with the layer: expected axis -1 of input shape to have value 1024 but received input with shape (None, 1500)



Forgive me, but I am new to Keras and TensorFlow.

I understand this error is related to the shape of the training files versus the shape of the image I am passing (adhar1.jpeg): they don't have the same shape (32x32 vs 500x281). But I don't know how to modify things so the model accepts my adhar1.jpeg image.
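As a sanity check on the shape mismatch only: the model expects a batch of 32x32 grayscale arrays, while adhar1.jpeg arrives as a single 500x281x3 RGB array. The sketch below (function name is illustrative) makes the shapes line up, but note it squashes the whole document into one 32x32 tile, so the prediction itself would be meaningless; actual OCR still needs per-character segmentation:

```python
import numpy as np
from PIL import Image

def preprocess_for_model(img: Image.Image) -> np.ndarray:
    """Convert an arbitrary PIL image to the (1, 32, 32) float batch the model expects."""
    img = img.convert("L")       # grayscale, drops the RGB channel axis
    img = img.resize((32, 32))   # match the training resolution
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return arr[np.newaxis, ...]  # add the batch dimension -> (1, 32, 32)

# e.g. model.predict(preprocess_for_model(Image.open(path_2 + 'adhar1.jpeg')))
doc = Image.new("RGB", (500, 281), color="white")  # stand-in for adhar1.jpeg
batch = preprocess_for_model(doc)
print(batch.shape)  # (1, 32, 32)
```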

Comments:

You need a second model that detects each character in the image. Once detected, you can extract the first character, resize it to 32x32, and feed it to your recognition model; then do the same for the second character, the third, and so on. But first, you need a character-detection model.

I just clicked the link you provided... no wonder captchas show those annoying little pictures.

Answer 1:

Since you trained your model on 32x32 images, you need to feed it input images of the same size.

Step 1: Load the input image from disk, convert it to grayscale, and blur it to reduce noise.

Step 2: Perform edge detection, find contours in the edge map, and sort the resulting contours from left to right.

Step 3: Loop over the contours, compute each contour's bounding box, and filter out boxes that are too small or too large.

Step 4: Extract each character and apply thresholding so the character appears as white (foreground) on a black background, then grab the width and height of the thresholded image.

Step 5: Resize the image and apply padding if needed.

Step 6: Run the model on all the characters found.

For further reference, see: https://www.pyimagesearch.com/2020/08/24/ocr-handwriting-recognition-with-opencv-keras-and-tensorflow/

Discussion:

This isn't ideal, because the detected characters may not be in order, so the OCR output won't be arranged the right way. Also, in some cases, especially with low-quality images, character clarity degrades once edge detection and contouring are applied, which is not ideal input for my OCR model. Besides, I had already gone through the reference link you provided before posting this question.

True, the characters may not be in order, but since each character has its x, y, w, h, you can easily arrange them. For the prediction part, you can use those same x, y, w, h values to take the input from the original image rather than feeding it from the contour image. I think that would work.
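The commenter's suggestion of using each box's x, y, w, h to restore reading order can be sketched like this (pure bookkeeping, no model involved; the row-grouping tolerance is an illustrative assumption):

```python
def sort_reading_order(boxes, row_tolerance=10):
    """Sort (x, y, w, h) boxes top-to-bottom into rows, then left-to-right within each row."""
    rows = []  # each row: list of boxes sharing roughly the same y
    for box in sorted(boxes, key=lambda b: b[1]):  # scan top to bottom
        for row in rows:
            if abs(row[0][1] - box[1]) <= row_tolerance:  # same text line?
                row.append(box)
                break
        else:
            rows.append([box])
    # rows are already top-to-bottom; sort within each row by x
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]

boxes = [(50, 12, 10, 14), (10, 10, 10, 14), (10, 40, 10, 14), (30, 11, 10, 14)]
print(sort_reading_order(boxes))
# [(10, 10, 10, 14), (30, 11, 10, 14), (50, 12, 10, 14), (10, 40, 10, 14)]
```

Cropping each sorted box back out of the original grayscale image (instead of the contour image) then gives clean per-character inputs in reading order.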
