Container Number Detection and Recognition Based on PaddleOCR
Posted by 风信子的猫Redamancy
Project Background
According to figures released this March by the international shipping consultancy Alphaliner, Shanghai Port topped the 2021 ranking of the 30 largest container ports with a throughput of 47.025 million TEU.
Compared with the same period a year earlier, Shanghai's container throughput grew by 8.1%.
It widened the gap over its closest competitor, Singapore, to nearly 10 million TEU.
The world's 100 largest container ports together handled 676 million TEU in 2021. Container volumes on this scale put heavy pressure on container number identification: the traditional approach of having people read and record box numbers is costly, inefficient, and operationally backward.
As the economy and society develop, bringing artificial intelligence into port operations has become key for traditional ports to transform and stay ahead in a competitive market.
This post therefore demonstrates, from environment setup to model training, how to use PaddleOCR for container number detection and recognition.
1. Project Overview: Container Number Detection and Recognition with a Small Amount of Data
A container number identifies the container in which export cargo is shipped and must be filled in on the shipping order. Standard container numbers follow ISO 6346 (1995) and consist of 11 characters. Taking the number CBHU 123456 7 as an example, it has three parts:
The first part consists of 4 letters: the first 3 identify the owner/operator, and the 4th indicates the equipment category. CBHU denotes a standard container owned and operated by COSCO Container Lines (中远集运).
The second part is a 6-digit serial number, the unique registration code of the container body.
The third part is a check digit, computed from the preceding 4 letters and 6 digits by a check rule and used to detect errors during validation; a minimal sketch of this calculation is shown below.
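For reference, the ISO 6346 check rule maps each letter to a numeric value (A=10, B=12, C=13, ..., skipping multiples of 11), weights the value of the i-th of the 10 leading characters by 2^i, and takes the weighted sum modulo 11 and then modulo 10. The following minimal Python sketch (not part of the original project code) illustrates the rule; CSQU305438 → 3 is a commonly cited valid example.
# Sketch of the ISO 6346 check-digit rule (illustrative, not from the original project)
def iso6346_check_digit(owner_serial: str) -> int:
    """Compute the check digit for the 4 letters + 6 digits, e.g. 'CSQU305438' -> 3."""
    letter_values = {}
    v = 10
    for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if v % 11 == 0:  # letter values skip multiples of 11 (11, 22, 33)
            v += 1
        letter_values[ch] = v
        v += 1
    total = 0
    for i, ch in enumerate(owner_serial.upper()):
        value = letter_values[ch] if ch.isalpha() else int(ch)
        total += value * (2 ** i)  # the i-th character is weighted by 2**i
    return total % 11 % 10

print(iso6346_check_digit("CSQU305438"))  # -> 3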
This is a container number detection and recognition project built on PaddleOCR: a small amount of data is used to train a detection model and a recognition model separately, and the two are then chained together to perform end-to-end container number detection and recognition.
2. Environment Setup
First of all, we need to install paddlepaddle. Installation is straightforward with the command below; if you want the GPU build, the official site also explains how to install it. I use version 2.3 here because it is fairly stable. See https://www.paddlepaddle.org.cn/install/quick for reference.
CPU version
python -m pip install paddlepaddle==2.3.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
GPU version
Install with conda
conda install paddlepaddle-gpu==2.3.2 cudatoolkit=11.6 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
Besides that, since the experiments are built on PaddleOCR, we also need to download the PaddleOCR source code. This is easy to get from GitHub; I use version 2.6 here. You can fetch the files with git clone:
git clone https://github.com/PaddlePaddle/PaddleOCR.git
Alternatively, download the full PaddleOCR source code from GitHub: https://github.com/PaddlePaddle/PaddleOCR
Finally, enter the folder and install all the dependencies.
- Enter the PaddleOCR folder
cd PaddleOCR
- Install the PaddleOCR dependencies
!pip install -r requirements.txt # install the dependencies required by PaddleOCR
- Return to the parent folder when installation is done
cd ..
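After installation, a quick sanity check (a small sketch, not from the original post) confirms that Paddle is importable and correctly set up:
import paddle
print(paddle.__version__)  # expected: 2.3.2
paddle.utils.run_check()   # prints a success message if PaddlePaddle is installed correctly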
3. Dataset
The container number dataset used in this tutorial contains 3,003 container images with a resolution of 1920×1080.
1. The annotation format for training the PaddleOCR detection model is as follows, with the two fields separated by "\t":
" image filename    json.dumps-encoded annotation "
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, ...]
The annotation before json.dumps encoding is a list of dictionaries. In each dictionary, points holds the (x, y) coordinates of the four corners of the text box, ordered clockwise starting from the top-left corner, and transcription is the text of that box; when it is "###", the box is invalid and is skipped during training.
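As an illustration (a sketch, not part of the original post), one line of the detection label file can be parsed like this:
import json

line = 'ch4_test_images/img_61.jpg\t[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]'
image_name, anno_str = line.strip().split("\t")
for anno in json.loads(anno_str):
    if anno["transcription"] == "###":
        continue  # invalid box, skipped during training
    print(image_name, anno["points"], anno["transcription"])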
2. The annotation format for training the PaddleOCR recognition model is as follows, with the two fields separated by "\t":
" image filename    text label "
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
4. Data Preparation
4.1 Preparing the detection data
Split the 3,003 images in the dataset into a training set and a validation set at roughly a 2:1 ratio by running the following code:
from tqdm import tqdm
filename = "all_label.txt"
f = open(filename)
lines = f.readlines()
t = open('det_train_label.txt', 'w')
v = open('det_eval_label.txt', 'w')
count = 0
for line in tqdm(lines):
    if count < 2000:
        t.writelines(line)  # first 2000 lines go to the training set
        count += 1
    else:
        v.writelines(line)  # the remaining lines go to the validation set
f.close()
t.close()
v.close()
100%|██████████| 3003/3003 [00:00<00:00, 56103.65it/s]
4.2 Preparing the recognition data
Based on the detection annotations, we crop the images so that each crop contains as little as possible besides the text, and use the crops as the recognition data. Run the following code:
from PIL import Image, ImageDraw
import json
from tqdm import tqdm
import os
import numpy as np
import cv2
import math
class Rotate(object):
    def __init__(self, image: Image.Image, coordinate):
        self.image = image.convert('RGB')
        self.coordinate = coordinate
        self.xy = [tuple(self.coordinate[k]) for k in ['left_top', 'right_top', 'right_bottom', 'left_bottom']]
        self._mask = None
        self.image.putalpha(self.mask)

    @property
    def mask(self):
        if not self._mask:
            mask = Image.new('L', self.image.size, 0)
            draw = ImageDraw.Draw(mask, 'L')
            draw.polygon(self.xy, fill=255)
            self._mask = mask
        return self._mask

    def run(self):
        image = self.rotation_angle()
        box = image.getbbox()
        return image.crop(box)

    def rotation_angle(self):
        x1, y1 = self.xy[0]
        x2, y2 = self.xy[1]
        angle = self.angle([x1, y1, x2, y2], [0, 0, 10, 0]) * -1
        return self.image.rotate(angle, expand=True)

    def angle(self, v1, v2):
        dx1 = v1[2] - v1[0]
        dy1 = v1[3] - v1[1]
        dx2 = v2[2] - v2[0]
        dy2 = v2[3] - v2[1]
        angle1 = math.atan2(dy1, dx1)
        angle1 = int(angle1 * 180 / math.pi)
        angle2 = math.atan2(dy2, dx2)
        angle2 = int(angle2 * 180 / math.pi)
        if angle1 * angle2 >= 0:
            included_angle = abs(angle1 - angle2)
        else:
            included_angle = abs(angle1) + abs(angle2)
            if included_angle > 180:
                included_angle = 360 - included_angle
        return included_angle
def image_cut_save(path, bbox, save_path):
    """Crop the text region given by bbox out of the image and save it.

    :param path: path of the source image
    :param bbox: the four corner points of the text box, ordered
                 left_top, right_top, right_bottom, left_bottom
    :param save_path: where to save the cropped image
    """
    img = Image.open(path)
    coordinate = {'left_top': bbox[0], 'right_top': bbox[1], 'right_bottom': bbox[2], 'left_bottom': bbox[3]}
    rotate = Rotate(img, coordinate)
    left, upper = bbox[0]
    right, lower = bbox[2]
    if lower - upper > right - left:
        # taller than wide: treat as vertical text and rotate 90 degrees
        rotate.run().convert('RGB').transpose(Image.ROTATE_90).save(save_path)
    else:
        rotate.run().convert('RGB').save(save_path)
    return True
# Read the detection annotations and build the recognition dataset from them
files = ["det_train_label.txt", "det_eval_label.txt"]
filetypes = ["train", "eval"]
for index, filename in enumerate(files):
    f = open(filename)
    l = open('rec_' + filetypes[index] + '_label.txt', 'w')
    if index == 0:
        data_dir = "RecTrainData"
    else:
        data_dir = "RecEvalData"
    if not os.path.exists(data_dir):
        os.mkdir(data_dir)
    lines = f.readlines()
    for line in tqdm(lines):
        image_name = line.split("\t")[0].split("/")[-1]
        annos = json.loads(line.split("\t")[-1])
        img_path = os.path.join("./dataset/images", image_name)
        for i, anno in enumerate(annos):
            data_path = os.path.join(data_dir, str(i) + "_" + image_name)
            if image_cut_save(img_path, anno["points"], data_path):
                l.writelines(str(i) + "_" + image_name + "\t" + anno["transcription"] + "\n")
    l.close()
    f.close()
0%| | 2/2000 [00:00<02:13, 14.98it/s]/tmp/ipykernel_250961/282371847.py:76: DeprecationWarning: ROTATE_90 is deprecated and will be removed in Pillow 10 (2023-07-01). Use Transpose.ROTATE_90 instead.
rotate.run().convert('RGB').transpose(Image.ROTATE_90).save(save_path)
100%|██████████| 2000/2000 [01:02<00:00, 32.15it/s]
100%|██████████| 1003/1003 [00:29<00:00, 33.76it/s]
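Before training, it is worth spot-checking that the crops and their labels line up; a minimal sketch (not from the original post):
import os

with open("rec_train_label.txt") as f:
    name, text = f.readline().strip().split("\t")
print(name, text, os.path.exists(os.path.join("RecTrainData", name)))  # the cropped file should exist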
5. Experiments
Since the dataset is small, we use the PP-OCRv3 model from PaddleOCR for both detection and recognition so that the models converge better and faster. Building on PP-OCRv2, PP-OCRv3 improves the end-to-end Hmean for Chinese scenes by 5% and the end-to-end performance of the English/digit model by 11%. See the PP-OCRv3 technical report for the optimization details.
You can also browse the full model list at https://github.com/PaddlePaddle/PaddleOCR/blob/v2.6.0/doc/doc_ch/models_list.md; all of the models used below are downloaded from there.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # pick the GPU to run on, e.g. GPU 1 here
5.1 Detection Model
5.1.1 Detection model configuration
PaddleOCR provides many detection models; the models and their configuration files can be found under PaddleOCR/configs/det. Here we choose the model ch_PP-OCRv3_det_student.yml, whose configuration file is at PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml. Before use it needs the necessary settings, such as training parameters and dataset paths. The key parts of the configuration are shown below:
# Key training parameters
use_gpu: true # whether to use the GPU
epoch_num: 50 # number of training epochs
save_model_dir: ./output/ch_PP-OCR_V3_det/ # where checkpoints are saved
save_epoch_step: 100 # save a checkpoint every 100 epochs
eval_batch_step:
- 0
- 200 # run an evaluation every 200 training iterations
pretrained_model: ./PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train/student.pdparams # path to the pretrained weights
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/images # image folder
    label_file_list:
    - ./det_train_label.txt # label file
# The validation set needs to be configured as well
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/images
    label_file_list:
    - ./det_eval_label.txt
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 8 # batch size per card; reduce it if GPU memory runs out during training
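The pretrained weights referenced by pretrained_model above have to be downloaded and unpacked first. The download link should be taken from the model list mentioned earlier; assuming the usual hosting location (verify the URL and archive contents against the model list before use), a sketch:
!mkdir -p PaddleOCR/pretrained_model
!wget -P PaddleOCR/pretrained_model https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
!tar -xf PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train.tar -C PaddleOCR/pretrained_model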
5.1.2 Model fine-tuning
Run the following command in the notebook to fine-tune the model; -c passes the path of the configured model file:
!python PaddleOCR/tools/train.py -c PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml
[2022/11/22 13:06:21] ppocr INFO: Architecture :
[2022/11/22 13:06:21] ppocr INFO: Backbone :
[2022/11/22 13:06:21] ppocr INFO: disable_se : True
[2022/11/22 13:06:21] ppocr INFO: model_name : large
[2022/11/22 13:06:21] ppocr INFO: name : MobileNetV3
[2022/11/22 13:06:21] ppocr INFO: scale : 0.5
[2022/11/22 13:06:21] ppocr INFO: Head :
[2022/11/22 13:06:21] ppocr INFO: k : 50
[2022/11/22 13:06:21] ppocr INFO: name : DBHead
[2022/11/22 13:06:21] ppocr INFO: Neck :
[2022/11/22 13:06:21] ppocr INFO: name : RSEFPN
[2022/11/22 13:06:21] ppocr INFO: out_channels : 96
[2022/11/22 13:06:21] ppocr INFO: shortcut : True
[2022/11/22 13:06:21] ppocr INFO: Transform : None
[2022/11/22 13:06:21] ppocr INFO: algorithm : DB
[2022/11/22 13:06:21] ppocr INFO: model_type : det
[2022/11/22 13:06:21] ppocr INFO: Eval :
[2022/11/22 13:06:21] ppocr INFO: dataset :
[2022/11/22 13:06:21] ppocr INFO: data_dir : ./dataset/images
[2022/11/22 13:06:21] ppocr INFO: label_file_list : ['./det_eval_label.txt']
[2022/11/22 13:06:21] ppocr INFO: name : SimpleDataSet
[2022/11/22 13:06:21] ppocr INFO: transforms :
[2022/11/22 13:06:21] ppocr INFO: DecodeImage :
[2022/11/22 13:06:21] ppocr INFO: channel_first : False
[2022/11/22 13:06:21] ppocr INFO: img_mode : BGR
[2022/11/22 13:06:21] ppocr INFO: DetLabelEncode : None
[2022/11/22 13:06:21] ppocr INFO: DetResizeForTest : None
[2022/11/22 13:06:21] ppocr INFO: NormalizeImage :
[2022/11/22 13:06:21] ppocr INFO: mean : [0.485, 0.456, 0.406]
[2022/11/22 13:06:21] ppocr INFO: order : hwc
[2022/11/22 13:06:21] ppocr INFO: scale : 1./255.
[2022/11/22 13:06:21] ppocr INFO: std : [0.229, 0.224, 0.225]
[2022/11/22 13:06:21] ppocr INFO: ToCHWImage : None
[2022/11/22 13:06:21] ppocr INFO: KeepKeys :
[2022/11/22 13:06:21] ppocr INFO: keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
[2022/11/22 13:06:21] ppocr INFO: loader :
[2022/11/22 13:06:21] ppocr INFO: batch_size_per_card : 1
[2022/11/22 13:06:21] ppocr INFO: drop_last : False
[2022/11/22 13:06:21] ppocr INFO: num_workers : 2
[2022/11/22 13:06:21] ppocr INFO: shuffle : False
[2022/11/22 13:06:21] ppocr INFO: Global :
[2022/11/22 13:06:21] ppocr INFO: cal_metric_during_train : False
[2022/11/22 13:06:21] ppocr INFO: checkpoints : None
[2022/11/22 13:06:21] ppocr INFO: debug : False
[2022/11/22 13:06:21] ppocr INFO: distributed : False
[2022/11/22 13:06:21] ppocr INFO: epoch_num : 50
[2022/11/22 13:06:21] ppocr INFO: eval_batch_step : [0, 200]
[2022/11/22 13:06:21] ppocr INFO: infer_img : doc/imgs_en/img_10.jpg
[2022/11/22 13:06:21] ppocr INFO: log_smooth_window : 20
[2022/11/22 13:06:21] ppocr INFO: pretrained_model : ./PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train/student.pdparams
[2022/11/22 13:06:21] ppocr INFO: print_batch_step : 10
[2022/11/22 13:06:21] ppocr INFO: save_epoch_step : 100
[2022/11/22 13:06:21] ppocr INFO: save_inference_dir : None
[2022/11/22 13:06:21] ppocr INFO: save_model_dir : ./output/ch_PP-OCR_V3_det/
[2022/11/22 13:06:21] ppocr INFO: save_res_path : ./checkpoints/det_db/predicts_db.txt
[2022/11/22 13:06:21] ppocr INFO: use_gpu : True
[2022/11/22 13:06:21] ppocr INFO: use_visualdl : False
[2022/11/22 13:06:21] ppocr INFO: Loss :
[2022/11/22 13:06:21] ppocr INFO: alpha : 5
[2022/11/22 13:06:21] ppocr INFO: balance_loss : True
[2022/11/22 13:06:21] ppocr INFO: beta : 10
[2022/11/22 13:06:21] ppocr INFO: main_loss_type : DiceLoss
[2022/11/22 13:06:21] ppocr INFO: name : DBLoss
[2022/11/22 13:06:21] ppocr INFO: ohem_ratio : 3
[2022/11/22 13:06:21] ppocr INFO: Metric :
[2022/11/22 13:06:21] ppocr INFO: main_indicator : hmean
[2022/11/22 13:06:21] ppocr INFO: name : DetMetric
[2022/11/22 13:06:21] ppocr INFO: Optimizer :
[2022/11/22 13:06:21] ppocr INFO: beta1 : 0.9
[2022/11/22 13:06:21] ppocr INFO: beta2 : 0.999
[2022/11/22 13:06:21] ppocr INFO: lr :
[2022/11/22 13:06:21] ppocr INFO: learning_rate : 0.001
[2022/11/22 13:06:21] ppocr INFO: name : Cosine
[2022/11/22 13:06:21] ppocr INFO: warmup_epoch : 2
[2022/11/22 13:06:21] ppocr INFO: name : Adam
[2022/11/22 13:06:21] ppocr INFO: regularizer :
[2022/11/22 13:06:21] ppocr INFO: factor : 5e-05
[2022/11/22 13:06:21] ppocr INFO: name : L2
[2022/11/22 13:06:21] ppocr INFO: PostProcess :
[2022/11/22 13:06:21] ppocr INFO: box_thresh : 0.6
[2022/11/22 13:06:21] ppocr INFO: max_candidates : 1000
[2022/11/22 13:06:21] ppocr INFO: name : DBPostProcess
[2022/11/22 13:06:21] ppocr INFO: thresh : 0.3
[2022/11/22 13:06:21] ppocr INFO: unclip_ratio : 1.5
[2022/11/22 13:06:21] ppocr INFO: Train :
[2022/11/22 13:06:21] ppocr INFO: dataset :
[2022/11/22 13:06:21] ppocr INFO: data_dir : ./dataset/images
[2022/11/22 13:06:21] ppocr INFO: label_file_list : ['./det_train_label.txt']
[2022/11/22 13:06:21] ppocr INFO: name : SimpleDataSet
[2022/11/22 13:06:21] ppocr INFO: ratio_list : [1.0]
[2022/11/22 13:06:21] ppocr INFO: transforms :
[2022/11/22 13:06:21] ppocr INFO: DecodeImage :
[2022/11/22 13:06:21] ppocr INFO: channel_first : False
[2022/11/22 13:06:21] ppocr INFO: img_mode : BGR
[2022/11/22 13:06:21] ppocr INFO: DetLabelEncode : None
[2022/11/22 13:06:21] ppocr INFO: IaaAugment :
[2022/11/22 13:06:21] ppocr INFO: augmenter_args :
[2022/11/22 13:06:21] ppocr INFO: args :
[2022/11/22 13:06:21] ppocr INFO: p : 0.5
[2022/11/22 13:06:21] ppocr INFO: type : Fliplr
[2022/11/22 13:06:21] ppocr INFO: args :
[2022/11/22 13:06:21] ppocr INFO: rotate : [-10, 10]
[2022/11/22 13:06:21] ppocr INFO: type : Affine
[2022/11/22 13:06:21] ppocr INFO: args :
[2022/11/22 13:06:21] ppocr INFO: size : [0.5, 3]
[2022/11/22 13:06:21] ppocr INFO: type : Resize
[2022/11/22 13:06:21] ppocr INFO: EastRandomCropData :
[2022/11/22 13:06:21] ppocr INFO: keep_ratio : True
[2022/11/22 13:06:21] ppocr INFO: max_tries : 50
[2022/11/22 13:06:21] ppocr INFO: size : [960, 960]
[2022/11/22 13:06:21] ppocr INFO: MakeBorderMap :
[2022/11/22 13:06:21] ppocr INFO: shrink_ratio : 0.4
[2022/11/22 13:06:21] ppocr INFO: thresh_max : 0.7
[2022/11/22 13:06:21] ppocr INFO: thresh_min : 0.3
[2022/11/22 13:06:21] ppocr INFO: MakeShrinkMap :
[2022/11/22 13:06:21] ppocr INFO: min_text_size : 8
[2022/11/22 13:06:21] ppocr INFO: shrink_ratio : 0.4
[2022/11/22 13:06:21] ppocr INFO: NormalizeImage :
[2022/11/22 13:06:21] ppocr INFO: mean : [0.485, 0.456, 0.406]
[2022/11/22 13:06:21] ppocr INFO: order : hwc
[2022/11/22 13:06:21] ppocr INFO: scale : 1./255.
[2022/11/22 13:06:21] ppocr INFO: std : [0.229, 0.224, 0.225]
[2022/11/22 13:06:21] ppocr INFO: ToCHWImage : None
[2022/11/22 13:06:21] ppocr INFO: KeepKeys :
[2022/11/22 13:06:21] ppocr INFO: keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
[2022/11/22 13:06:21] ppocr INFO: loader :
[2022/11/22 13:06:21] ppocr INFO: batch_size_per_card : 8
[2022/11/22 13:06:21] ppocr INFO: drop_last : False
[2022/11/22 13:06:21] ppocr INFO: num_workers : 4
[2022/11/22 13:06:21] ppocr INFO: shuffle : True
[2022/11/22 13:06:21] ppocr INFO: profiler_options : None
[2022/11/22 13:06:21] ppocr INFO: train with paddle 2.3.2 and device Place(gpu:0)
[2022/11/22 13:06:21] ppocr INFO: Initialize indexs of datasets:['./det_train_label.txt']
[2022/11/22 13:06:21] ppocr INFO: Initialize indexs of datasets:['./det_eval_label.txt']
W1122 13:06:21.615907 1637263 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.8, Runtime API Version: 11.6
W1122 13:06:21.621130 1637263 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
[2022/11/22 13:06:25] ppocr INFO: train dataloader has 250 iters
[2022/11/22 13:06:25] ppocr INFO: valid dataloader has 1003 iters
[2022/11/22 13:06:28] ppocr INFO: load pretrain successful from ./PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train/student
[2022/11/22 13:06:28] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 200 iterations
[2022/11/22 13:06:41] ppocr INFO: epoch: [1/50], global_step: 10, lr: 0.000009, loss: 3.958436, loss_shrink_maps: 2.519069, loss_threshold_maps: 0.946237, loss_binary_maps: 0.504128, avg_reader_cost: 0.24535 s, avg_batch_cost: 1.18631 s, avg_samples: 8.0, ips: 6.74361 samples/s, eta: 4:06:56
[2022/11/22 13:06:45] ppocr INFO: epoch: [1/50], global_step: 20, lr: 0.000019, loss: 3.866610, loss_shrink_maps: 2.481548, loss_threshold_maps: 0.944892, loss_binary_maps: 0.496426, avg_reader_cost: 0.00237 s, avg_batch_cost: 0.36309 s, avg_samples: 8.0, ips: 22.03292 samples/s, eta: 2:41:08
[2022/11/22 13:06:49] ppocr INFO: epoch: [1/50], global_step: 30, lr: 0.000039, loss: 3.867103, loss_shrink_maps: 2.555314, loss_threshold_maps: 0.927763, loss_binary_maps: 0.511137, avg_reader_cost: 0.00391 s, avg_batch_cost: 0.36944 s, avg_samples: 8.0, ips: 21.65439 samples/s, eta: 2:12:55
[2022/11/22 13:06:53] ppocr INFO: epoch: [1/50], global_step: 40, lr: 0.000059, loss: 3.878752, loss_shrink_maps: 2.555314, loss_threshold_maps: 0.883135, loss_binary_maps: 0.511137, avg_reader_cost: 0.00372 s, avg_batch_cost: 0.36812 s, avg_samples: 8.0, ips: 21.73230 samples/s, eta: 1:58:43
[2022/11/22 13:06:58] ppocr INFO: epoch: [1/50], global_step: 50, lr: 0.000079, loss: 3.711426, loss_shrink_maps: 2.362094, loss_threshold_maps: 0.808442, loss_binary_maps: 0.472508, avg_reader_cost: 0.00139 s, avg_batch_cost: 0.36452 s, avg_samples: 8.0, ips: 21.94658 samples/s, eta: 1:50:02
[2022/11/22 13:07:02] ppocr INFO: epoch: [1/50], global_step: 60, lr: 0.000099, loss: 3.374191, loss_shrink_maps: 2.211356, loss_threshold_maps: 0.763958, loss_binary_maps: 0.442548, avg_reader_cost: 0.00234 s, avg_batch_cost: 0.36611 s, avg_samples: 8.0, ips: 21.85163 samples/s, eta: 1:44:16
[2022/11/22 13:07:06] ppocr INFO: epoch: [1/50], global_step: 70, lr: 0.000119, loss: 3.223969, loss_shrink_maps: 2.070736, loss_threshold_maps: 0.746714, loss_binary_maps: 0.414206, avg_reader_cost: 0.00026 s, avg_batch_cost: 0.36592 s, avg_samples: 8.0, ips: 21.86294 samples/s, eta: 1:40:08
[2022/11/22 13:07:10] ppocr INFO: epoch: [1/50], global_step: 80, lr: 0.000139, loss: 2.865101, loss_shrink_maps: 1.764307, loss_threshold_maps: 0.722126, loss_binary_maps: 0.353773, avg_reader_cost: 0.00557 s, avg_batch_cost: 0.37052 s, avg_samples: 8.0, ips: 21.59116 samples/s, eta: 1:37:08
[2022/11/22 13:07:15] ppocr INFO: epoch: [1/50], global_step: 90, lr: 0.000159, loss: 2.916367, loss_shrink_maps: 1.785145, loss_threshold_maps: 0.736615, loss_binary_maps: 0.357866, avg_reader_cost: 0.00193 s, avg_batch_cost: 0.36543 s, avg_samples: 8.0, ips: 21.89180 samples/s, eta: 1:34:40
[2022/11/22 13:07:19] ppocr INFO: epoch: [1/50], global_step: 100, lr: 0.000179, loss: 2.895378, loss_shrink_maps: 1.764575, loss_threshold_maps: 0.750305, loss_binary_maps: 0.352447, avg_reader_cost: 0.00201 s, avg_batch_cost: 0.36573 s, avg_samples: 8.0, ips: 21.87380 samples/s, eta: 1:32:41
[2022/11/22 13:07:23] ppocr INFO: epoch: [1/50], global_step: 110, lr: 0.000199, loss: 2.629834, loss_shrink_maps: 1.587836, loss_threshold_maps: 0.753817, loss_binary_maps: 0.316871, avg_reader_cost: 0.00125 s, avg_batch_cost: 0.36450 s, avg_samples: 8.0, ips: 21.94768 samples/s, eta: 1:31:02
[2022/11/22 13:07:27] ppocr INFO: epoch: [1/50], global_step: 120, lr: 0.000219, loss: 2.397547, loss_shrink_maps: 1.460247, loss_threshold_maps: 0.667723, loss_binary_maps: 0.291848, avg_reader_cost: 0.00466 s, avg_batch_cost: 0.37048 s, avg_samples: 8.0, ips: 21.59387 samples/s, eta: 1:29:45
[2022/11/22 13:07:32] ppocr INFO: epoch: [1/50], global_step: 130, lr: 0.000239, loss: 2.378679, loss_shrink_maps: 1.401813, loss_threshold_maps: 0.673884, loss_binary_maps: 0.280701, avg_reader_cost: 0.00088 s, avg_batch_cost: 0.36518 s, avg_samples: 8.0, ips: 21.90695 samples/s, eta: 1:28:34
[2022/11/22 13:07:36] ppocr INFO: epoch: [1/50], global_step: 140, lr: 0.000259, loss: 2.451726, loss_shrink_maps: 1.482388, loss_threshold_maps: 0.681260, loss_binary_maps: 0.296235, avg_reader_cost: 0.00300 s, avg_batch_cost: 0.38128 s, avg_samples: 8.0, ips: 20.98186 samples/s, eta: 1:27:47
[2022/11/22 13:07:41] ppocr INFO: epoch: [1/50], global_step: 150, lr: 0.000279, loss: 2.589176, loss_shrink_maps: 1.562890, loss_threshold_maps: 0.715202, loss_binary_maps: 0.312893, avg_reader_cost: 0.00400 s, avg_batch_cost: 0.37979 s, avg_samples: 8.0, ips: 21.06412 samples/s, eta: 1:27:05
[2022/11/22 13:07:45] ppocr INFO: epoch: [1/50], global_step: 160, lr: 0.000299, loss: 2.706166, loss_shrink_maps: 1.639270, loss_threshold_maps: 0.734711, loss_binary_maps: 0.328181, avg_reader_cost: 0.00109 s, avg_batch_cost: 0.37755 s, avg_samples: 8.0, ips: 21.18906 samples/s, eta: 1:26:25
[2022/11/22 13:07:49] ppocr INFO: epoch: [1/50], global_step: 170, lr: 0.000319, loss: 2.643976, loss_shrink_maps: 1.618946, loss_threshold_maps: 0.707388, loss_binary_maps: 0.324081, avg_reader_cost: 0.00578 s, avg_batch_cost: 0.38069 s, avg_samples: 8.0, ips: 21.01471 samples/s, eta: 1:25:52
[2022/11/22 13:07:54] ppocr INFO: epoch: [1/50], global_step: 180, lr: 0.000339, loss: 2.542865, loss_shrink_maps: 1.494720, loss_threshold_maps: 0.716266, loss_binary_maps: 0.299067, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.37948 s, avg_samples: 8.0, ips: 21.08149 samples/s, eta: 1:25:22
[2022/11/22 13:07:58] ppocr INFO: epoch: [1/50], global_step: 190, lr: 0.000359, loss: 2.484875, loss_shrink_maps: 1.468433, loss_threshold_maps: 0.721491, loss_binary_maps: 0.293729, avg_reader_cost: 0.00022 s, avg_batch_cost: 0.36306 s, avg_samples: 8.0, ips: 22.03508 samples/s, eta: 1:24:44
[2022/11/22 13:08:02] ppocr INFO: epoch: [1/50], global_step: 200, lr: 0.000379, loss: 2.391915, loss_shrink_maps: 1.404688, loss_threshold_maps: 0.692680, loss_binary_maps: 0.281057, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.37660 s, avg_samples: 8.0, ips: 21.24274 samples/s, eta: 1:24:17
eval model:: 100%|██████████████████████████| 1003/1003 [00:55<00:00, 18.21it/s]
[2022/11/22 13:08:57] ppocr INFO: cur metric, precision: 0.766156462585034, recall: 0.9745808545159546, hmean: 0.8578909783384908, fps: 24.47464613318005
[2022/11/22 13:08:57] ppocr INFO: save best model is to ./output/ch_PP-OCR_V3_det/best_accuracy
[2022/11/22 13:08:57] ppocr INFO: best metric, hmean: 0.8578909783384908, is_float16: False, precision: 0.766156462585034, recall: 0.9745808545159546, fps: 24.47464613318005, best_epoch: 1
......
eval model:: 100%|██████████████████████████| 1003/1003 [01:00<00:00, 16.46it/s]
[2022/11/22 15:41:39] ppocr INFO: cur metric, precision: 0.9622942113648434, recall: 0.9799891833423472, hmean: 0.9710610932475884, fps: 22.98522684952537
[2022/11/22 15:41:39] ppocr INFO: best metric, hmean: 0.9742212674543502, is_float16: False, precision: 0.9674666666666667, recall: 0.9810708491076258, fps: 23.192479978844272, best_epoch: 31
[2022/11/22 15:41:43] ppocr INFO: epoch: [50/50], global_step: 12410, lr: 0.000006, loss: 1.340954, loss_shrink_maps: 0.700807, loss_threshold_maps: 0.478234, loss_binary_maps: 0.139876, avg_reader_cost: 0.00200 s, avg_batch_cost: 0.37641 s, avg_samples: 8.0, ips: 21.25350 samples/s, eta: 0:00:34
[2022/11/22 15:41:48] ppocr INFO: epoch: [50/50], global_step: 12420, lr: 0.000005, loss: 1.443306, loss_shrink_maps: 0.745190, loss_threshold_maps: 0.499261, loss_binary_maps: 0.149030, avg_reader_cost: 0.00094 s, avg_batch_cost: 0.39211 s, avg_samples: 8.0, ips: 20.40230 samples/s, eta: 0:00:30
[2022/11/22 15:41:52] ppocr INFO: epoch: [50/50], global_step: 12430, lr: 0.000005, loss: 1.321360, loss_shrink_maps: 0.683633, loss_threshold_maps: 0.489775, loss_binary_maps: 0.136621, avg_reader_cost: 0.00189 s, avg_batch_cost: 0.38061 s, avg_samples: 8.0, ips: 21.01903 samples/s, eta: 0:00:27
[2022/11/22 15:41:56] ppocr INFO: epoch: [50/50], global_step: 12440, lr: 0.000005, loss: 1.238384, loss_shrink_maps: 0.635735, loss_threshold_maps: 0.448261, loss_binary_maps: 0.126847, avg_reader_cost: 0.00201 s, avg_batch_cost: 0.37915 s, avg_samples: 8.0, ips: 21.09974 samples/s, eta: 0:00:23
[2022/11/22 15:42:01] ppocr INFO: epoch: [50/50], global_step: 12450, lr: 0.000005, loss: 1.191645, loss_shrink_maps: 0.622861, loss_threshold_maps: 0.437820, loss_binary_maps: 0.124672, avg_reader_cost: 0.00017 s, avg_batch_cost: 0.37562 s, avg_samples: 8.0, ips: 21.29834 samples/s, eta: 0:00:19
[2022/11/22 15:42:05] ppocr INFO: epoch: [50/50], global_step: 12460, lr: 0.000005, loss: 1.180676, loss_shrink_maps: 0.629529, loss_threshold_maps: 0.427876, loss_binary_maps: 0.126142, avg_reader_cost: 0.00199 s, avg_batch_cost: 0.37954 s, avg_samples: 8.0, ips: 21.07820 samples/s, eta: 0:00:15
[2022/11/22 15:42:10] ppocr INFO: epoch: [50/50], global_step: 12470, lr: 0.000005, loss: 1.238333, loss_shrink_maps: 0.670731, loss_threshold_maps: 0.432300, loss_binary_maps: 0.134142, avg_reader_cost: 0.00199 s, avg_batch_cost: 0.37980 s, avg_samples: 8.0, ips: 21.06347 samples/s, eta: 0:00:11
[2022/11/22 15:42:14] ppocr INFO: epoch: [50/50], global_step: 12480, lr: 0.000004, loss: 1.254117, loss_shrink_maps: 0.662911, loss_threshold_maps: 0.454790, loss_binary_maps: 0.132727, avg_reader_cost: 0.00181 s, avg_batch_cost: 0.37586 s, avg_samples: 8.0, ips: 21.28430 samples/s, eta: 0:00:07
[2022/11/22 15:42:18] ppocr INFO: epoch: [50/50], global_step: 12490, lr: 0.000004, loss: 1.324386, loss_shrink_maps: 0.701260, loss_threshold_maps: 0.501584, loss_binary_maps: 0.140036, avg_reader_cost: 0.00019 s, avg_batch_cost: 0.37545 s, avg_samples: 8.0, ips: 21.30779 samples/s, eta: 0:00:03
[2022/11/22 15:42:23] ppocr INFO: epoch: [50/50], global_step: 12500, lr: 0.000004, loss: 1.407378, loss_shrink_maps: 0.767860, loss_threshold_maps: 0.488598, loss_binary_maps: 0.153395, avg_reader_cost: 0.00346 s, avg_batch_cost: 0.37688 s, avg_samples: 8.0, ips: 21.22694 samples/s, eta: 0:00:00
[2022/11/22 15:42:23] ppocr INFO: save model in ./output/ch_PP-OCR_V3_det/latest
[2022/11/22 15:42:23] ppocr INFO: best metric, hmean: 0.9742212674543502, is_float16: False, precision: 0.9674666666666667, recall: 0.9810708491076258, fps: 23.192479978844272, best_epoch: 31
With the default hyperparameters modified as above, the ch_PP-OCRv3_det_student model was trained for 50 epochs on the training set. Its hmean on the validation set reached 97.4%, with no obvious improvement in the later epochs.
If GPU memory runs out during training, reduce the batch size.
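To reproduce the validation metrics with the best checkpoint, PaddleOCR's evaluation script can be pointed at the saved weights (a sketch; the checkpoint path follows save_model_dir above):
!python PaddleOCR/tools/eval.py -c PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o Global.checkpoints=./output/ch_PP-OCR_V3_det/best_accuracy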
5.2 Recognition Model
5.2.1 Recognition model configuration
PaddleOCR also provides many recognition models; the models and their configuration files can be found under PaddleOCR/configs/rec. Here we choose the model ch_PP-OCRv3_rec_distillation, whose configuration file is at PaddleOCR/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. Before use it needs the necessary settings, such as training parameters and dataset paths, and the pretrained weights likewise need to be downloaded. The key parts of the configuration are shown below:
# Key training parameters
use_gpu: true # whether to use the GPU
epoch_num: 100 # number of training epochs
save_model_dir: ./output/rec_ppocr_v3_distillation # where checkpoints are saved
save_epoch_step: 100 # save a checkpoint every 100 epochs
eval_batch_step: [0, 100] # run an evaluation every 100 training iterations
pretrained_model: ./PaddleOCR/pretrain_modeled/ch_PP-OCRv3_rec_train/best_accuracy.pdparams # path to the pretrained weights
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./RecTrainData/ # image folder
    label_file_list:
    - ./rec_train_label.txt # label file
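The Eval section should point at the cropped validation data produced in section 4.2, analogous to the detection configuration (a sketch; adjust the paths to your setup):
# Validation set paths
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./RecEvalData/ # image folder
    label_file_list:
    - ./rec_eval_label.txt # label file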
5.2.2 Model fine-tuning
Run the following command in the notebook to fine-tune the model; -c passes the path of the configured model file:
!python PaddleOCR/tools/train.py \
    -c PaddleOCR/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml
[2022/11/22 15:42:30] ppocr INFO: Architecture :
[2022/11/22 15:42:30] ppocr INFO: Models :
[2022/11/22 15:42:30] ppocr INFO: Student :
[2022/11/22 15:42:30] ppocr INFO: Backbone :
[2022/11/22 15:42:30] ppocr INFO: last_conv_stride : [1, 2]
[2022/11/22 15:42:30] ppocr INFO: last_pool_type : avg
[2022/11/22 15:42:30] ppocr INFO: name : MobileNetV1Enhance
[2022/11/22 15:42:30] ppocr INFO: scale : 0.5
[2022/11/22 15:42:30] ppocr INFO: Head :
[2022/11/22 15:42:30] ppocr INFO: head_list :
[2022/11/22 15:42:30] ppocr INFO: CTCHead :
[2022/11/22 15:42:30] ppocr INFO: Head :
[2022/11/22 15:42:30] ppocr INFO: fc_decay : 1e-05
[2022/11/22 15:42:30] ppocr INFO: Neck :
[2022/11/22 15:42:30] ppocr INFO: depth : 2
[2022/11/22 15:42:30] ppocr INFO: dims : 64
[2022/11/22 15:42:30] ppocr INFO: hidden_dims : 120
[2022/11/22 15:42:30] ppocr INFO: name : svtr
[2022/11/22 15:42:30] ppocr INFO: use_guide : True
[2022/11/22 15:42:30] ppocr INFO: SARHead :
[2022/11/22 15:42:30] ppocr INFO: enc_dim : 512
[2022/11/22 15:42:30] ppocr INFO: max_text_length : 25
[2022/11/22 15:42:30] ppocr INFO: name : MultiHead
[2022/11/22 15:42:30] ppocr INFO: Transform : None
[2022/11/22 15:42:30] ppocr INFO: algorithm : SVTR
[2022/11/22 15:42:30] ppocr INFO: freeze_params : False
[2022/11/22 15:42:30] ppocr INFO: model_type : rec
[2022/11/22 15:42:30] ppocr INFO: pretrained : None
[2022/11/22 15:42:30] ppocr INFO: return_all_feats : True
[2022/11/22 15:42:30] ppocr INFO: Teacher :
[2022/11/22 15:42:30] ppocr INFO: Backbone :
[2022/11/22 15:42:30] ppocr INFO: last_conv_stride : [1, 2]
[2022/11/22 15:42:30] ppocr INFO: last_pool_type : avg
[2022/11/22 15:42:30] ppocr INFO: name : MobileNetV1Enhance
[2022/11/22 15:42:30] ppocr INFO: scale : 0.5
[2022/11/22 15:42:30] ppocr INFO: Head :
[2022/11/22 15:42:30] ppocr INFO: head_list :
[2022/11/22 15:42:30] ppocr INFO: CTCHead :
[2022/11/22 15:42:30] ppocr INFO: Head :
[2022/11/22 15:42:30] ppocr INFO: fc_decay : 1e-05
[2022/11/22 15:42:30] ppocr INFO: Neck :
[2022/11/22 15:42:30] ppocr INFO: depth : 2
[2022/11/22 15:42:30] ppocr INFO: dims : 64
[2022/11/22 15:42:30] ppocr INFO: hidden_dims : 120
[2022/11/22 15:42:30] ppocr INFO: name : svtr
[2022/11/22 15:42:30] ppocr INFO: use_guide : True
[2022/11/22 15:42:30] ppocr INFO: SARHead :
[2022/11/22 15:42:30] ppocr INFO: enc_dim : 512
[2022/11/22 15:42:30] ppocr INFO: max_text_length : 25
[2022/11/22 15:42:30] ppocr INFO: name : MultiHead
[2022/11/22 15:42:30] ppocr INFO: Transform : None
[2022/11/22 15:42:30] ppocr INFO: algorithm : SVTR
[2022/11/22 15:42:30] ppocr INFO: freeze_params : False
[2022/11/22 15:42:30] ppocr INFO: model_type : rec