Implementing image recognition in MATLAB: looking for an approach

Posted 2023-05-18


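I want to implement image recognition in MATLAB: import a picture, compare it with a set of existing pictures, and identify what the picture shows. What processing should I do after importing the picture? I assume edge detection, gray-level processing and so on are involved, but how exactly should I program it? I would appreciate some ideas.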

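Reference answer A: A pattern recognition pipeline usually looks like this:
1) Preprocessing, such as image filtering, gray-level processing, and format conversion.
2) Segmenting out the part of the image to be recognized (separating it from the irrelevant parts); this is where techniques such as edge detection come in.
3) Extracting features from the part to be recognized and expressing them as numerical measurements.
4) Classifying based on those features, which requires a knowledge base (that is, the features of the known pictures).
In short, image recognition is a fairly big topic; mastering it takes a solid background in related areas (such as mathematics and image processing).
This answer was accepted by the asker.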
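
The four steps above can be sketched roughly as follows. This is only a minimal illustration in Python with NumPy, not MATLAB and not a complete recognizer: the fixed threshold, the two features (bounding-box aspect ratio and fill ratio), the nearest-neighbour rule, and the synthetic `make_bar` test images are all assumptions chosen to keep the example short.

```python
import numpy as np

def preprocess(rgb):
    """Step 1: convert an H x W x 3 RGB array to grayscale in [0, 1]."""
    gray = rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    return gray / 255.0

def segment(gray, threshold=0.5):
    """Step 2: separate the object from the background with a fixed threshold."""
    return gray > threshold

def extract_features(mask):
    """Step 3: measure bounding-box aspect ratio and fill ratio of the object."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:          # nothing was segmented
        return np.zeros(2)
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return np.array([width / height, mask.sum() / (height * width)])

def classify(features, knowledge_base):
    """Step 4: return the label whose stored features are nearest."""
    return min(knowledge_base,
               key=lambda label: np.linalg.norm(features - knowledge_base[label]))

def describe(rgb):
    return extract_features(segment(preprocess(rgb)))

def make_bar(height, width, size=32):
    """Hypothetical test image: a white bar on a black background."""
    img = np.zeros((size, size, 3), dtype=np.uint8)
    img[4:4 + height, 4:4 + width] = 255
    return img

# The "knowledge base": features of pictures whose content is already known.
knowledge_base = {
    "vertical bar": describe(make_bar(20, 4)),
    "horizontal bar": describe(make_bar(4, 20)),
}
label = classify(describe(make_bar(18, 5)), knowledge_base)
print(label)
```

In practice each step would be replaced by something stronger (for example edge-based segmentation and richer features), but the data flow from image to label stays the same.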

Deblur image with text to be recognized by OCR

[Title]: Deblur image with text to be recognized by OCR [Posted]: 2018-07-18 08:25:48 [Question]:

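I have a blurred image: it is part of a business card, and it is one of the frames taken by the camera without proper focus.

A sharp image looks like this. I am looking for a method that gives me an image of better quality so that OCR can recognize it, but it should also be fast. The image is not blurred too much (I think), but it is not good enough for OCR. I have tried:

different kinds of HPF (high-pass filters), the Laplacian, the Canny detector, and combinations of morphological operations (opening, closing).

I have also tried:

deconvolution with a Wiener filter, and deconvolution with the Lucy-Richardson method.

But it is not easy to find the right PSF (point spread function). These methods are considered effective, but they are not fast enough. I also tried an FFT followed by an IFFT with a Gaussian mask, but the results were not satisfactory. I am looking for a general way to deblur images with text, not only this one image. Can somebody help me with this problem? I would appreciate any advice. I am using OpenCV 3 (C++, and sometimes Python).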
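
For readers unfamiliar with the Wiener approach mentioned in the question, here is a minimal NumPy sketch of frequency-domain Wiener deconvolution. It is written under strong assumptions that do not hold for the business-card photo: the PSF is known and Gaussian (`sigma=1.5` is an arbitrary choice), the blur is a circular convolution, there is no noise, and the noise-to-signal constant `nsr` is picked by hand. It shows the mechanics only; it does not solve the problem of finding the PSF.

```python
import numpy as np

def gaussian_psf(shape, sigma):
    """Gaussian PSF with the image's shape, centred at pixel (0, 0) for FFT use."""
    rows = np.fft.fftfreq(shape[0]) * shape[0]   # offsets 0, 1, ..., -2, -1
    cols = np.fft.fftfreq(shape[1]) * shape[1]
    psf = np.exp(-(rows[:, None] ** 2 + cols[None, :] ** 2) / (2.0 * sigma ** 2))
    return psf / psf.sum()

def wiener_deconvolve(blurred, psf, nsr=1e-3):
    """Apply F = conj(H) / (|H|^2 + nsr) * G in the frequency domain."""
    H = np.fft.fft2(psf)
    G = np.fft.fft2(blurred)
    F = np.conj(H) / (np.abs(H) ** 2 + nsr) * G
    return np.real(np.fft.ifft2(F))

# Synthetic demo: a random binary pattern stands in for sharp text.
rng = np.random.default_rng(0)
sharp = (rng.random((64, 64)) > 0.5).astype(np.float64)
psf = gaussian_psf(sharp.shape, sigma=1.5)

# Simulate the out-of-focus frame by circular convolution with the PSF.
blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * np.fft.fft2(psf)))
restored = wiener_deconvolve(blurred, psf, nsr=1e-3)

err_blurred = np.mean((blurred - sharp) ** 2)
err_restored = np.mean((restored - sharp) ** 2)
print(err_blurred, err_restored)
```

The whole filter costs two forward FFTs and one inverse FFT per image, so the arithmetic itself is cheap; the hard part remains estimating a PSF and a sensible `nsr` for each real frame.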

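[Comments]:

rroij.com/open-access/… Also the seminal paper by Gull and Skilling (the MaxEnt method).

I have some ideas, but nothing concrete yet.

Also citeseerx.ist.psu.edu/viewdoc/…

dsp.stackexchange.com/questions/tagged/blind-deconvolution

@LuisFelipe - No. We were looking for something fairly fast, but none of the methods above met our expectations. We decided to check the sharpness of the image instead. If it is not sharp enough, the user is told to take another picture.

[Answer 1]: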

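Do you know about blind deconvolution?

Blind deconvolution is a well-known technique for restoring astronomical images. It is especially useful for applications where the PSF is hard to find.

Here is a C++ implementation of the technique. This paper is also very relevant to what you are looking for. Here is sample output of their algorithm: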

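[Comments]:

That is a pretty crazy reconstruction. Powered by a CNN, right? It is very cool, but I wonder about the resources (computing power and complexity) needed to run this processing as close to real time as possible.

@eldesgraciado Yes, I agree about the reconstruction. Blind deconvolution is an expensive algorithm, and I doubt it can be done in real time. However, the computational complexity can be reduced by performing the task in the frequency domain.

Does anyone have a tutorial on how to implement this in Python?

@JimO. - Did you ever get an answer? I would be interested in this too.

@Robert Oschler I am still praying.

[Answer 2]: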

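I also ran into this problem recently and asked a similar question with more details and recent approaches. So far it seems to be an unsolved problem. There have been some recent research efforts that try to solve such problems with deep learning. Unfortunately, none of them meets our expectations. Still, I am sharing the information in case it helps anyone.

1. Scene Text Image Super-Resolution in the Wild

In our case this may be our last choice; comparatively, it performs reasonably well. It is a recent research work (TSRN) that focuses mainly on such cases. Its main intuition is to introduce super-resolution (SR) techniques as preprocessing. This implementation looks the most promising so far. Here is an illustration of their results, improving blurred images into sharp ones.

2. Neural Enhance

Judging from the demo in their repo, it also seems to have potential for improving blurred text. However, the author has probably not maintained the repo for about 4 years.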

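3. Blind Motion Deblurring with GAN

The attractive part is its Blind Motion Deblurring mechanism, named DeblurGAN. It looks very promising.

4. Real-World Super-Resolution via Kernel Estimation and Noise Injection

An interesting fact about their work is that, unlike other literature, they first design a novel degradation framework for real-world images by estimating various blur kernels as well as real noise distributions. Based on it, they acquire LR images that share a common domain with real-world images. They then propose a real-world super-resolution model aimed at better perception. From their article:

However, in my observation I could not get the expected results. I raised an issue on github and have not received any response so far.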

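Convolutional Neural Networks for Direct Text Deblurring

The paper shared by @Ali looks very interesting, and the results are very good. It is nice that they shared the pretrained weights of their trained model, along with Python scripts for easier use. However, they experimented with the Caffe library. I would prefer to convert it to PyTorch for better control. Below is the provided Python script with Caffe imports. Note that, because of my lack of Caffe knowledge, I have not been able to port it completely so far; please correct me if you know how.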

from __future__ import print_function
import numpy as np
import os, sys, argparse, glob, time, cv2, caffe

# Some helper functions
def getCutout(image, x1, y1, x2, y2, border):
    assert(x1 >= 0 and y1 >= 0)
    assert(x2 > x1 and y2 >y1)
    assert(border >= 0)
    return cv2.getRectSubPix(image, (y2-y1 + 2*border, x2-x1 + 2*border), (((y2-1)+y1) / 2.0, ((x2-1)+x1) / 2.0))

def fillRndData(data, net):
    inputLayer = 'data'
    randomChannels = net.blobs[inputLayer].data.shape[1]
    rndData = np.random.randn(data.shape[0], randomChannels, data.shape[2], data.shape[3]).astype(np.float32) * 0.2
    rndData[:,0:1,:,:] = data
    net.blobs[inputLayer].data[...] = rndData[:,0:1,:,:]

def mkdirp(directory):
    if not os.path.isdir(directory):
        os.makedirs(directory)

The main function starts here:

def main(argv):
    pycaffe_dir = os.path.dirname(__file__)

    parser = argparse.ArgumentParser()
    # Optional arguments.
    parser.add_argument(
        "--model_def",
        help="Model definition file.",
        required=True
    )
    parser.add_argument(
        "--pretrained_model",
        help="Trained model weights file.",
        required=True
    )
    parser.add_argument(
        "--out_scale",
        help="Scale of the output image.",
        default=1.0,
        type=float
    )
    parser.add_argument(
        "--output_path",
        help="Output path.",
        default=''
    )
    parser.add_argument(
        "--tile_resolution",
        help="Resolution of processing tile.",
        required=True,
        type=int
    )
    parser.add_argument(
        "--suffix",
        help="Suffix of the output file.",
        default="-deblur",
    )
    parser.add_argument(
        "--gpu",
        action='store_true',
        help="Switch for gpu computation."
    )
    parser.add_argument(
        "--grey_mean",
        action='store_true',
        help="Use grey mean RGB=127. Default is the VGG mean."
    )
    parser.add_argument(
        "--use_mean",
        action='store_true',
        help="Use mean."
    )
    parser.add_argument(
        "--adversarial",
        action='store_true',
        help="Fill the input blob via fillRndData before the forward pass."
    )
    args = parser.parse_args()

    mkdirp(args.output_path)

    if hasattr(caffe, 'set_mode_gpu'):
        if args.gpu:
            print('GPU mode', file=sys.stderr)
            caffe.set_mode_gpu()
        net = caffe.Net(args.model_def, args.pretrained_model, caffe.TEST)
    else:
        if args.gpu:
            print('GPU mode', file=sys.stderr)
        net = caffe.Net(args.model_def, args.pretrained_model, gpu=args.gpu)


    inputs = [line.strip() for line in sys.stdin]

    print("Classifying %d inputs." % len(inputs), file=sys.stderr)


    blobNames = list(net.blobs.keys())  # keys() is not indexable on Python 3
    inputBlob = blobNames[0]  # [innat]: input shape
    outputBlob = blobNames[-1]

    print( inputBlob, outputBlob)
    channelCount = net.blobs[inputBlob].data.shape[1]
    net.blobs[inputBlob].reshape(1, channelCount, args.tile_resolution, args.tile_resolution)
    net.reshape()

    if channelCount == 1 or channelCount > 3:
        color = 0
    else:
        color = 1

    outResolution = net.blobs[outputBlob].data.shape[2]
    inResolution = int(outResolution / args.out_scale)
    boundary = (net.blobs[inputBlob].data.shape[2] - inResolution) // 2  # integer: used as a pixel count

    for fileName in inputs:
        img = cv2.imread(fileName, flags=color).astype(np.float32)
        original = np.copy(img)
        img = img.reshape(img.shape[0], img.shape[1], -1)
        if args.use_mean:
            if args.grey_mean or channelCount == 1:
                img -= 127
            else:
                img[:,:,0] -= 103.939
                img[:,:,1] -= 116.779
                img[:,:,2] -= 123.68
        img *= 0.004

        outShape = [int(img.shape[0] * args.out_scale) ,
                    int(img.shape[1] * args.out_scale) ,
                    net.blobs[outputBlob].channels]
        imgOut = np.zeros(outShape)

        imageStartTime = time.time()
        for x, xOut in zip(range(0, img.shape[0], inResolution), range(0, imgOut.shape[0], outResolution)):
            for y, yOut in zip(range(0, img.shape[1], inResolution), range(0, imgOut.shape[1], outResolution)):

                start = time.time()

                region = getCutout(img, x, y, x+inResolution, y+inResolution, boundary)
                region = region.reshape(region.shape[0], region.shape[1], -1)
                data = region.transpose([2, 0, 1]).reshape(1, -1, region.shape[0], region.shape[1])

                if args.adversarial:
                    fillRndData(data, net)
                    out = net.forward()
                else:
                    out = net.forward_all(data=data)

                out = out[outputBlob].reshape(out[outputBlob].shape[1], out[outputBlob].shape[2], out[outputBlob].shape[3]).transpose(1, 2, 0)

                if imgOut.shape[2] == 3 or imgOut.shape[2] == 1:
                    out /= 0.004
                    if args.use_mean:
                        if args.grey_mean:
                            out += 127
                        else:
                            out[:,:,0] += 103.939
                            out[:,:,1] += 116.779
                            out[:,:,2] += 123.68

                if out.shape[0] != outResolution:
                    print("Warning: size of net output is %d px and it is expected to be %d px" % (out.shape[0], outResolution))
                if out.shape[0] < outResolution:
                    print("Error: size of net output is %d px and it is expected to be %d px" % (out.shape[0], outResolution))
                    sys.exit(1)

                xRange = min((outResolution, imgOut.shape[0] - xOut))
                yRange = min((outResolution, imgOut.shape[1] - yOut))

                imgOut[xOut:xOut+xRange, yOut:yOut+yRange, :] = out[0:xRange, 0:yRange, :]

                print(".", end="", file=sys.stderr)
                sys.stderr.flush()  # the progress dots go to stderr


        print(imgOut.min(), imgOut.max())
        print("IMAGE DONE %s" % (time.time() - imageStartTime))
        basename = os.path.basename(fileName)
        name = os.path.join(args.output_path, basename + args.suffix)
        print(name, imgOut.shape)
        cv2.imwrite( name, imgOut)

if __name__ == '__main__':
    main(sys.argv)

To run the program:

cat fileListToProcess.txt | python processWholeImage.py --model_def ./BMVC_nets/S14_19_200.deploy --pretrained_model ./BMVC_nets/S14_19_FQ_178000.model --output_path ./out/ --tile_resolution 300 --suffix _out.png --gpu --use_mean
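
The script above processes the image tile by tile: each tile is cut out with an extra border so that the network has context around it, and only the central part of each result is written back. The sketch below illustrates that idea with NumPy only. It is a simplification, not the authors' code: `process_in_tiles` and `box_blur_valid` are hypothetical names, a 3x3 mean filter stands in for the network, the output scale is fixed at 1, and edge replication via `np.pad` is assumed in place of `cv2.getRectSubPix`.

```python
import numpy as np

def box_blur_valid(patch):
    """Stand-in for the network: 3x3 mean filter that loses 1 px per side."""
    h, w = patch.shape
    acc = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            acc += patch[dy:dy + h - 2, dx:dx + w - 2]
    return acc / 9.0

def process_in_tiles(image, tile, border, net):
    """Run `net` on bordered tiles of a 2-D image and stitch the results.

    `net` must map a (tile + 2*border)-sized patch to a tile-sized output.
    """
    height, width = image.shape
    # Pad so that every tile, including the last partial one, has a full patch.
    padded = np.pad(image, ((border, border + tile), (border, border + tile)),
                    mode="edge")
    out = np.zeros_like(image)
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            patch = padded[y:y + tile + 2 * border, x:x + tile + 2 * border]
            result = net(patch)
            h = min(tile, height - y)   # crop the last row/column of tiles
            w = min(tile, width - x)
            out[y:y + h, x:x + w] = result[:h, :w]
    return out

rng = np.random.default_rng(0)
image = rng.random((10, 13))
tiled = process_in_tiles(image, tile=4, border=1, net=box_blur_valid)

# Processing the whole edge-padded image at once should give the same result.
reference = box_blur_valid(np.pad(image, 1, mode="edge"))
print(np.allclose(tiled, reference))
```

Because the border supplies the context a tile would otherwise lack, the stitched output matches whole-image processing; `--tile_resolution` in the command above controls the patch size that the real network is reshaped to.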

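The weights files and the script above can be downloaded from here (BMVC_net). However, you may want to convert from Caffe to PyTorch. To do that, here is a basic starting point:

Install proto-lens
Clone caffemodel2pytorch

Next,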

import torch
import caffemodel2pytorch  # assumes the cloned caffemodel2pytorch.py is on the Python path

# BMVC_net: download it from the authors' website (link above)
model = caffemodel2pytorch.Net(
    prototxt = './BMVC_net/S14_19_200.deploy', 
    weights = './BMVC_net/S14_19_FQ_178000.model',
    caffe_proto = 'https://raw.githubusercontent.com/BVLC/caffe/master/src/caffe/proto/caffe.proto'
)

model.cuda()
model.eval()
torch.set_grad_enabled(False)

Run it on a demo tensor,

# make sure to have right procedure of image normalization and channel reordering
image = torch.Tensor(8, 3, 98, 98).cuda()

# outputs dict of PyTorch Variables
# in this example the dict contains the only key "prob"
#output_dict = model(data = image)

# you can remove unneeded layers:
#del model.prob
#del model.fc8

# a single input variable is interpreted as an input blob named "data"
# in this example the dict contains the only key "fc7"
output_dict = model(image)
# print(output_dict)
print(output_dict.keys())

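Note that there are some basic things to keep in mind: the network expects text at 120-150 DPI, a reasonable orientation, and reasonable black and white levels. The network expects [103.9, 116.8, 123.7] to be subtracted from the input. The input should further be multiplied by 0.004.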
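
The normalization described here can be sketched in NumPy as follows, using the exact mean values and the 0.004 scale from the Caffe script earlier in this answer. The function names `to_network_input` and `from_network_output` are hypothetical, and the sketch assumes a 3-channel BGR image as returned by `cv2.imread`; it only prepares and restores arrays and does not run any network.

```python
import numpy as np

# Per-channel BGR means and scale factor, as used in the script above.
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)
SCALE = 0.004

def to_network_input(bgr_image):
    """H x W x 3 uint8 BGR image -> 1 x 3 x H x W float32 array."""
    x = (bgr_image.astype(np.float32) - BGR_MEAN) * SCALE
    return x.transpose(2, 0, 1)[np.newaxis]   # HWC -> NCHW with batch size 1

def from_network_output(y):
    """1 x 3 x H x W float32 array -> H x W x 3 uint8 BGR image."""
    img = y[0].transpose(1, 2, 0) / SCALE + BGR_MEAN
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)

# Round trip on a random stand-in image (no network involved).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
blob = to_network_input(image)
restored = from_network_output(blob)
print(blob.shape, blob.dtype)
```

For PyTorch, `torch.from_numpy(blob)` would turn the array into a tensor with the same layout; whether the converted model expects exactly this layout is something to verify against the Caffe deploy file.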

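[Comments]:

Hi, have you actually tested the method shared by @Ali on real pictures? First, on the original page the Python script targets Python 2.7 and a lot needs to change; second, I could not run the code you provided because of this error: "module 'caffemodel2pytorch' has no attribute 'Net'". I am lost with the Caffe implementation. Any help would be greatly appreciated.

I did not go any further with that Caffe implementation; it has many limitations. Check my question, in the update 2 section. In my case, surprisingly, this mechanism also improved the visual quality of text images. At the time of my experiments they had not released their training code, so if they have released it by now, we could retrain their model on this specific text deblurring case. A discussion.

Thanks. I will probably give up on RealSR, since I cannot install CUDA on my 2013 Macbook Air. MSFT seems reasonable.

I had good luck with https://github.com/ys-koshelev/nla_deblur and DPSR.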
