Container Number Detection and Recognition Based on PaddleOCR
Posted by 风信子的猫Redamancy
Project Background
According to figures released this March by the international shipping consultancy Alphaliner, Shanghai Port topped the 2021 ranking of the 30 largest container ports with a throughput of 47.025 million TEU.
Compared with the same period a year earlier, Shanghai's container throughput grew by 8.1%.
It widened the gap over its closest competitor, Singapore, to nearly 10 million TEU.
The world's 100 largest container ports together handled 676 million TEU in 2021. Container volumes on this scale put heavy pressure on container number identification: the traditional approach of having people read and record box numbers is costly, inefficient, and operationally backward.
As the economy and society develop, bringing artificial intelligence into port operations has become key for traditional ports to transform and stay ahead in a competitive market.
This post therefore demonstrates, from environment setup to model training, how to use PaddleOCR for container number detection and recognition.
1. Project Overview: Container Number Detection and Recognition with a Small Amount of Data
A container number identifies the container in which export cargo is shipped and must be filled in on the shipping order. Standard container numbers follow ISO 6346 (1995) and consist of 11 characters. Taking the number CBHU 123456 7 as an example, it has three parts:
The first part consists of 4 letters: the first 3 identify the owner/operator, and the 4th indicates the equipment category. CBHU denotes a standard container owned and operated by COSCO Container Lines (中远集运).
The second part is a 6-digit serial number, the unique registration code of the container body.
The third part is a check digit, computed from the preceding 4 letters and 6 digits by a check rule and used to detect errors during validation; a minimal sketch of this calculation is shown below.
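For reference, the ISO 6346 check rule maps each letter to a numeric value (A=10, B=12, C=13, ..., skipping multiples of 11), weights the value of the i-th of the 10 leading characters by 2^i, and takes the weighted sum modulo 11 and then modulo 10. The following minimal Python sketch (not part of the original project code) illustrates the rule; CSQU305438 → 3 is a commonly cited valid example.
# Sketch of the ISO 6346 check-digit rule (illustrative, not from the original project)
def iso6346_check_digit(owner_serial: str) -> int:
    """Compute the check digit for the 4 letters + 6 digits, e.g. 'CSQU305438' -> 3."""
    letter_values = {}
    v = 10
    for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if v % 11 == 0:  # letter values skip multiples of 11 (11, 22, 33)
            v += 1
        letter_values[ch] = v
        v += 1
    total = 0
    for i, ch in enumerate(owner_serial.upper()):
        value = letter_values[ch] if ch.isalpha() else int(ch)
        total += value * (2 ** i)  # the i-th character is weighted by 2**i
    return total % 11 % 10

print(iso6346_check_digit("CSQU305438"))  # -> 3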
This is a container number detection and recognition project built on PaddleOCR: a small amount of data is used to train a detection model and a recognition model separately, and the two are then chained together to perform end-to-end container number detection and recognition.
2. Environment Setup
First of all, we need to install paddlepaddle. Installation is straightforward with the command below; if you want the GPU build, the official site also explains how to install it. I use version 2.3 here because it is fairly stable. See https://www.paddlepaddle.org.cn/install/quick for reference.
CPU version
python -m pip install paddlepaddle==2.3.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
GPU version
Install with conda
conda install paddlepaddle-gpu==2.3.2 cudatoolkit=11.6 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
Besides that, since the experiments are built on PaddleOCR, we also need to download the PaddleOCR source code. This is easy to get from GitHub; I use version 2.6 here. You can fetch the files with git clone:
git clone https://github.com/PaddlePaddle/PaddleOCR.git
Alternatively, download the full PaddleOCR source code from GitHub: https://github.com/PaddlePaddle/PaddleOCR
Finally, enter the folder and install all the dependencies.
- Enter the PaddleOCR folder
cd PaddleOCR
- Install the PaddleOCR dependencies
!pip install -r requirements.txt # install the dependencies required by PaddleOCR
- Return to the parent folder when installation is done
cd ..
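After installation, a quick sanity check (a small sketch, not from the original post) confirms that Paddle is importable and correctly set up:
import paddle
print(paddle.__version__)  # expected: 2.3.2
paddle.utils.run_check()   # prints a success message if PaddlePaddle is installed correctly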
3. Dataset
The container number dataset used in this tutorial contains 3,003 container images with a resolution of 1920×1080.
1. The annotation format for training the PaddleOCR detection model is as follows, with the two fields separated by "\t":
" image filename    json.dumps-encoded annotation "
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, ...]
The annotation before json.dumps encoding is a list of dictionaries. In each dictionary, points holds the (x, y) coordinates of the four corners of the text box, ordered clockwise starting from the top-left corner, and transcription is the text of that box; when it is "###", the box is invalid and is skipped during training.
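As an illustration (a sketch, not part of the original post), one line of the detection label file can be parsed like this:
import json

line = 'ch4_test_images/img_61.jpg\t[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]'
image_name, anno_str = line.strip().split("\t")
for anno in json.loads(anno_str):
    if anno["transcription"] == "###":
        continue  # invalid box, skipped during training
    print(image_name, anno["points"], anno["transcription"])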
2. The annotation format for training the PaddleOCR recognition model is as follows, with the two fields separated by "\t":
" image filename    text label "
train_data/rec/train/word_001.jpg 简单可依赖
train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
4. Data Preparation
4.1 Preparing the detection data
Split the 3,003 images in the dataset into a training set and a validation set at roughly a 2:1 ratio by running the following code:
from tqdm import tqdm
filename = "all_label.txt"
f = open(filename)
lines = f.readlines()
t = open('det_train_label.txt', 'w')
v = open('det_eval_label.txt', 'w')
count = 0
for line in tqdm(lines):
    if count < 2000:
        t.writelines(line)  # first 2000 lines go to the training set
        count += 1
    else:
        v.writelines(line)  # the remaining lines go to the validation set
f.close()
t.close()
v.close()
100%|██████████| 3003/3003 [00:00<00:00, 56103.65it/s]
4.2 Preparing the recognition data
Based on the detection annotations, we crop the images so that each crop contains as little as possible besides the text, and use the crops as the recognition data. Run the following code:
from PIL import Image, ImageDraw
import json
from tqdm import tqdm
import os
import numpy as np
import cv2
import math
class Rotate(object):
    def __init__(self, image: Image.Image, coordinate):
        self.image = image.convert('RGB')
        self.coordinate = coordinate
        self.xy = [tuple(self.coordinate[k]) for k in ['left_top', 'right_top', 'right_bottom', 'left_bottom']]
        self._mask = None
        self.image.putalpha(self.mask)

    @property
    def mask(self):
        if not self._mask:
            mask = Image.new('L', self.image.size, 0)
            draw = ImageDraw.Draw(mask, 'L')
            draw.polygon(self.xy, fill=255)
            self._mask = mask
        return self._mask

    def run(self):
        image = self.rotation_angle()
        box = image.getbbox()
        return image.crop(box)

    def rotation_angle(self):
        x1, y1 = self.xy[0]
        x2, y2 = self.xy[1]
        angle = self.angle([x1, y1, x2, y2], [0, 0, 10, 0]) * -1
        return self.image.rotate(angle, expand=True)

    def angle(self, v1, v2):
        dx1 = v1[2] - v1[0]
        dy1 = v1[3] - v1[1]
        dx2 = v2[2] - v2[0]
        dy2 = v2[3] - v2[1]
        angle1 = math.atan2(dy1, dx1)
        angle1 = int(angle1 * 180 / math.pi)
        angle2 = math.atan2(dy2, dx2)
        angle2 = int(angle2 * 180 / math.pi)
        if angle1 * angle2 >= 0:
            included_angle = abs(angle1 - angle2)
        else:
            included_angle = abs(angle1) + abs(angle2)
            if included_angle > 180:
                included_angle = 360 - included_angle
        return included_angle
def image_cut_save(path, bbox, save_path):
    """Crop the text region given by bbox out of the image and save it.

    :param path: path of the source image
    :param bbox: the four corner points of the text box, ordered
                 left_top, right_top, right_bottom, left_bottom
    :param save_path: where to save the cropped image
    """
    img = Image.open(path)
    coordinate = {'left_top': bbox[0], 'right_top': bbox[1], 'right_bottom': bbox[2], 'left_bottom': bbox[3]}
    rotate = Rotate(img, coordinate)
    left, upper = bbox[0]
    right, lower = bbox[2]
    if lower - upper > right - left:
        # taller than wide: treat as vertical text and rotate 90 degrees
        rotate.run().convert('RGB').transpose(Image.ROTATE_90).save(save_path)
    else:
        rotate.run().convert('RGB').save(save_path)
    return True
# Read the detection annotations and build the recognition dataset from them
files = ["det_train_label.txt", "det_eval_label.txt"]
filetypes = ["train", "eval"]
for index, filename in enumerate(files):
    f = open(filename)
    l = open('rec_' + filetypes[index] + '_label.txt', 'w')
    if index == 0:
        data_dir = "RecTrainData"
    else:
        data_dir = "RecEvalData"
    if not os.path.exists(data_dir):
        os.mkdir(data_dir)
    lines = f.readlines()
    for line in tqdm(lines):
        image_name = line.split("\t")[0].split("/")[-1]
        annos = json.loads(line.split("\t")[-1])
        img_path = os.path.join("./dataset/images", image_name)
        for i, anno in enumerate(annos):
            data_path = os.path.join(data_dir, str(i) + "_" + image_name)
            if image_cut_save(img_path, anno["points"], data_path):
                l.writelines(str(i) + "_" + image_name + "\t" + anno["transcription"] + "\n")
    l.close()
    f.close()
0%| | 2/2000 [00:00<02:13, 14.98it/s]/tmp/ipykernel_250961/282371847.py:76: DeprecationWarning: ROTATE_90 is deprecated and will be removed in Pillow 10 (2023-07-01). Use Transpose.ROTATE_90 instead.
rotate.run().convert('RGB').transpose(Image.ROTATE_90).save(save_path)
100%|██████████| 2000/2000 [01:02<00:00, 32.15it/s]
100%|██████████| 1003/1003 [00:29<00:00, 33.76it/s]
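Before training, it is worth spot-checking that the crops and their labels line up; a minimal sketch (not from the original post):
import os

with open("rec_train_label.txt") as f:
    name, text = f.readline().strip().split("\t")
print(name, text, os.path.exists(os.path.join("RecTrainData", name)))  # the cropped file should exist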
5. Experiments
Since the dataset is small, we use the PP-OCRv3 model from PaddleOCR for both detection and recognition so that the models converge better and faster. Building on PP-OCRv2, PP-OCRv3 improves the end-to-end Hmean for Chinese scenes by 5% and the end-to-end performance of the English/digit model by 11%. See the PP-OCRv3 technical report for the optimization details.
You can also browse the full model list at https://github.com/PaddlePaddle/PaddleOCR/blob/v2.6.0/doc/doc_ch/models_list.md; all of the models used below are downloaded from there.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # pick the GPU to run on, e.g. GPU 1 here
5.1 Detection Model
5.1.1 Detection model configuration
PaddleOCR provides many detection models; the models and their configuration files can be found under PaddleOCR/configs/det. Here we choose the model ch_PP-OCRv3_det_student.yml, whose configuration file is at PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml. Before use it needs the necessary settings, such as training parameters and dataset paths. The key parts of the configuration are shown below:
# Key training parameters
use_gpu: true # whether to use the GPU
epoch_num: 50 # number of training epochs
save_model_dir: ./output/ch_PP-OCR_V3_det/ # where checkpoints are saved
save_epoch_step: 100 # save a checkpoint every 100 epochs
eval_batch_step:
- 0
- 200 # run an evaluation every 200 training iterations
pretrained_model: ./PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train/student.pdparams # path to the pretrained weights
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/images # image folder
    label_file_list:
    - ./det_train_label.txt # label file
# The validation set needs to be configured as well
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./dataset/images
    label_file_list:
    - ./det_eval_label.txt
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 8 # batch size per card; reduce it if GPU memory runs out during training
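The pretrained weights referenced by pretrained_model above have to be downloaded and unpacked first. The download link should be taken from the model list mentioned earlier; assuming the usual hosting location (verify the URL and archive contents against the model list before use), a sketch:
!mkdir -p PaddleOCR/pretrained_model
!wget -P PaddleOCR/pretrained_model https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
!tar -xf PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train.tar -C PaddleOCR/pretrained_model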
5.1.2 Model fine-tuning
Run the following command in the notebook to fine-tune the model; -c passes the path of the configured model file:
!python PaddleOCR/tools/train.py -c PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml
[2022/11/22 13:06:21] ppocr INFO: Architecture :
[2022/11/22 13:06:21] ppocr INFO: Backbone :
[2022/11/22 13:06:21] ppocr INFO: disable_se : True
[2022/11/22 13:06:21] ppocr INFO: model_name : large
[2022/11/22 13:06:21] ppocr INFO: name : MobileNetV3
[2022/11/22 13:06:21] ppocr INFO: scale : 0.5
[2022/11/22 13:06:21] ppocr INFO: Head :
[2022/11/22 13:06:21] ppocr INFO: k : 50
[2022/11/22 13:06:21] ppocr INFO: name : DBHead
[2022/11/22 13:06:21] ppocr INFO: Neck :
[2022/11/22 13:06:21] ppocr INFO: name : RSEFPN
[2022/11/22 13:06:21] ppocr INFO: out_channels : 96
[2022/11/22 13:06:21] ppocr INFO: shortcut : True
[2022/11/22 13:06:21] ppocr INFO: Transform : None
[2022/11/22 13:06:21] ppocr INFO: algorithm : DB
[2022/11/22 13:06:21] ppocr INFO: model_type : det
[2022/11/22 13:06:21] ppocr INFO: Eval :
[2022/11/22 13:06:21] ppocr INFO: dataset :
[2022/11/22 13:06:21] ppocr INFO: data_dir : ./dataset/images
[2022/11/22 13:06:21] ppocr INFO: label_file_list : ['./det_eval_label.txt']
[2022/11/22 13:06:21] ppocr INFO: name : SimpleDataSet
[2022/11/22 13:06:21] ppocr INFO: transforms :
[2022/11/22 13:06:21] ppocr INFO: DecodeImage :
[2022/11/22 13:06:21] ppocr INFO: channel_first : False
[2022/11/22 13:06:21] ppocr INFO: img_mode : BGR
[2022/11/22 13:06:21] ppocr INFO: DetLabelEncode : None
[2022/11/22 13:06:21] ppocr INFO: DetResizeForTest : None
[2022/11/22 13:06:21] ppocr INFO: NormalizeImage :
[2022/11/22 13:06:21] ppocr INFO: mean : [0.485, 0.456, 0.406]
[2022/11/22 13:06:21] ppocr INFO: order : hwc
[2022/11/22 13:06:21] ppocr INFO: scale : 1./255.
[2022/11/22 13:06:21] ppocr INFO: std : [0.229, 0.224, 0.225]
[2022/11/22 13:06:21] ppocr INFO: ToCHWImage : None
[2022/11/22 13:06:21] ppocr INFO: KeepKeys :
[2022/11/22 13:06:21] ppocr INFO: keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
[2022/11/22 13:06:21] ppocr INFO: loader :
[2022/11/22 13:06:21] ppocr INFO: batch_size_per_card : 1
[2022/11/22 13:06:21] ppocr INFO: drop_last : False
[2022/11/22 13:06:21] ppocr INFO: num_workers : 2
[2022/11/22 13:06:21] ppocr INFO: shuffle : False
[2022/11/22 13:06:21] ppocr INFO: Global :
[2022/11/22 13:06:21] ppocr INFO: cal_metric_during_train : False
[2022/11/22 13:06:21] ppocr INFO: checkpoints : None
[2022/11/22 13:06:21] ppocr INFO: debug : False
[2022/11/22 13:06:21] ppocr INFO: distributed : False
[2022/11/22 13:06:21] ppocr INFO: epoch_num : 50
[2022/11/22 13:06:21] ppocr INFO: eval_batch_step : [0, 200]
[2022/11/22 13:06:21] ppocr INFO: infer_img : doc/imgs_en/img_10.jpg
[2022/11/22 13:06:21] ppocr INFO: log_smooth_window : 20
[2022/11/22 13:06:21] ppocr INFO: pretrained_model : ./PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train/student.pdparams
[2022/11/22 13:06:21] ppocr INFO: print_batch_step : 10
[2022/11/22 13:06:21] ppocr INFO: save_epoch_step : 100
[2022/11/22 13:06:21] ppocr INFO: save_inference_dir : None
[2022/11/22 13:06:21] ppocr INFO: save_model_dir : ./output/ch_PP-OCR_V3_det/
[2022/11/22 13:06:21] ppocr INFO: save_res_path : ./checkpoints/det_db/predicts_db.txt
[2022/11/22 13:06:21] ppocr INFO: use_gpu : True
[2022/11/22 13:06:21] ppocr INFO: use_visualdl : False
[2022/11/22 13:06:21] ppocr INFO: Loss :
[2022/11/22 13:06:21] ppocr INFO: alpha : 5
[2022/11/22 13:06:21] ppocr INFO: balance_loss : True
[2022/11/22 13:06:21] ppocr INFO: beta : 10
[2022/11/22 13:06:21] ppocr INFO: main_loss_type : DiceLoss
[2022/11/22 13:06:21] ppocr INFO: name : DBLoss
[2022/11/22 13:06:21] ppocr INFO: ohem_ratio : 3
[2022/11/22 13:06:21] ppocr INFO: Metric :
[2022/11/22 13:06:21] ppocr INFO: main_indicator : hmean
[2022/11/22 13:06:21] ppocr INFO: name : DetMetric
[2022/11/22 13:06:21] ppocr INFO: Optimizer :
[2022/11/22 13:06:21] ppocr INFO: beta1 : 0.9
[2022/11/22 13:06:21] ppocr INFO: beta2 : 0.999
[2022/11/22 13:06:21] ppocr INFO: lr :
[2022/11/22 13:06:21] ppocr INFO: learning_rate : 0.001
[2022/11/22 13:06:21] ppocr INFO: name : Cosine
[2022/11/22 13:06:21] ppocr INFO: warmup_epoch : 2
[2022/11/22 13:06:21] ppocr INFO: name : Adam
[2022/11/22 13:06:21] ppocr INFO: regularizer :
[2022/11/22 13:06:21] ppocr INFO: factor : 5e-05
[2022/11/22 13:06:21] ppocr INFO: name : L2
[2022/11/22 13:06:21] ppocr INFO: PostProcess :
[2022/11/22 13:06:21] ppocr INFO: box_thresh : 0.6
[2022/11/22 13:06:21] ppocr INFO: max_candidates : 1000
[2022/11/22 13:06:21] ppocr INFO: name : DBPostProcess
[2022/11/22 13:06:21] ppocr INFO: thresh : 0.3
[2022/11/22 13:06:21] ppocr INFO: unclip_ratio : 1.5
[2022/11/22 13:06:21] ppocr INFO: Train :
[2022/11/22 13:06:21] ppocr INFO: dataset :
[2022/11/22 13:06:21] ppocr INFO: data_dir : ./dataset/images
[2022/11/22 13:06:21] ppocr INFO: label_file_list : ['./det_train_label.txt']
[2022/11/22 13:06:21] ppocr INFO: name : SimpleDataSet
[2022/11/22 13:06:21] ppocr INFO: ratio_list : [1.0]
[2022/11/22 13:06:21] ppocr INFO: transforms :
[2022/11/22 13:06:21] ppocr INFO: DecodeImage :
[2022/11/22 13:06:21] ppocr INFO: channel_first : False
[2022/11/22 13:06:21] ppocr INFO: img_mode : BGR
[2022/11/22 13:06:21] ppocr INFO: DetLabelEncode : None
[2022/11/22 13:06:21] ppocr INFO: IaaAugment :
[2022/11/22 13:06:21] ppocr INFO: augmenter_args :
[2022/11/22 13:06:21] ppocr INFO: args :
[2022/11/22 13:06:21] ppocr INFO: p : 0.5
[2022/11/22 13:06:21] ppocr INFO: type : Fliplr
[2022/11/22 13:06:21] ppocr INFO: args :
[2022/11/22 13:06:21] ppocr INFO: rotate : [-10, 10]
[2022/11/22 13:06:21] ppocr INFO: type : Affine
[2022/11/22 13:06:21] ppocr INFO: args :
[2022/11/22 13:06:21] ppocr INFO: size : [0.5, 3]
[2022/11/22 13:06:21] ppocr INFO: type : Resize
[2022/11/22 13:06:21] ppocr INFO: EastRandomCropData :
[2022/11/22 13:06:21] ppocr INFO: keep_ratio : True
[2022/11/22 13:06:21] ppocr INFO: max_tries : 50
[2022/11/22 13:06:21] ppocr INFO: size : [960, 960]
[2022/11/22 13:06:21] ppocr INFO: MakeBorderMap :
[2022/11/22 13:06:21] ppocr INFO: shrink_ratio : 0.4
[2022/11/22 13:06:21] ppocr INFO: thresh_max : 0.7
[2022/11/22 13:06:21] ppocr INFO: thresh_min : 0.3
[2022/11/22 13:06:21] ppocr INFO: MakeShrinkMap :
[2022/11/22 13:06:21] ppocr INFO: min_text_size : 8
[2022/11/22 13:06:21] ppocr INFO: shrink_ratio : 0.4
[2022/11/22 13:06:21] ppocr INFO: NormalizeImage :
[2022/11/22 13:06:21] ppocr INFO: mean : [0.485, 0.456, 0.406]
[2022/11/22 13:06:21] ppocr INFO: order : hwc
[2022/11/22 13:06:21] ppocr INFO: scale : 1./255.
[2022/11/22 13:06:21] ppocr INFO: std : [0.229, 0.224, 0.225]
[2022/11/22 13:06:21] ppocr INFO: ToCHWImage : None
[2022/11/22 13:06:21] ppocr INFO: KeepKeys :
[2022/11/22 13:06:21] ppocr INFO: keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
[2022/11/22 13:06:21] ppocr INFO: loader :
[2022/11/22 13:06:21] ppocr INFO: batch_size_per_card : 8
[2022/11/22 13:06:21] ppocr INFO: drop_last : False
[2022/11/22 13:06:21] ppocr INFO: num_workers : 4
[2022/11/22 13:06:21] ppocr INFO: shuffle : True
[2022/11/22 13:06:21] ppocr INFO: profiler_options : None
[2022/11/22 13:06:21] ppocr INFO: train with paddle 2.3.2 and device Place(gpu:0)
[2022/11/22 13:06:21] ppocr INFO: Initialize indexs of datasets:['./det_train_label.txt']
[2022/11/22 13:06:21] ppocr INFO: Initialize indexs of datasets:['./det_eval_label.txt']
W1122 13:06:21.615907 1637263 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.8, Runtime API Version: 11.6
W1122 13:06:21.621130 1637263 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
[2022/11/22 13:06:25] ppocr INFO: train dataloader has 250 iters
[2022/11/22 13:06:25] ppocr INFO: valid dataloader has 1003 iters
[2022/11/22 13:06:28] ppocr INFO: load pretrain successful from ./PaddleOCR/pretrained_model/ch_PP-OCRv3_det_distill_train/student
[2022/11/22 13:06:28] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 200 iterations
[2022/11/22 13:06:41] ppocr INFO: epoch: [1/50], global_step: 10, lr: 0.000009, loss: 3.958436, loss_shrink_maps: 2.519069, loss_threshold_maps: 0.946237, loss_binary_maps: 0.504128, avg_reader_cost: 0.24535 s, avg_batch_cost: 1.18631 s, avg_samples: 8.0, ips: 6.74361 samples/s, eta: 4:06:56
[2022/11/22 13:06:45] ppocr INFO: epoch: [1/50], global_step: 20, lr: 0.000019, loss: 3.866610, loss_shrink_maps: 2.481548, loss_threshold_maps: 0.944892, loss_binary_maps: 0.496426, avg_reader_cost: 0.00237 s, avg_batch_cost: 0.36309 s, avg_samples: 8.0, ips: 22.03292 samples/s, eta: 2:41:08
[2022/11/22 13:06:49] ppocr INFO: epoch: [1/50], global_step: 30, lr: 0.000039, loss: 3.867103, loss_shrink_maps: 2.555314, loss_threshold_maps: 0.927763, loss_binary_maps: 0.511137, avg_reader_cost: 0.00391 s, avg_batch_cost: 0.36944 s, avg_samples: 8.0, ips: 21.65439 samples/s, eta: 2:12:55
[2022/11/22 13:06:53] ppocr INFO: epoch: [1/50], global_step: 40, lr: 0.000059, loss: 3.878752, loss_shrink_maps: 2.555314, loss_threshold_maps: 0.883135, loss_binary_maps: 0.511137, avg_reader_cost: 0.00372 s, avg_batch_cost: 0.36812 s, avg_samples: 8.0, ips: 21.73230 samples/s, eta: 1:58:43
[2022/11/22 13:06:58] ppocr INFO: epoch: [1/50], global_step: 50, lr: 0.000079, loss: 3.711426, loss_shrink_maps: 2.362094, loss_threshold_maps: 0.808442, loss_binary_maps: 0.472508, avg_reader_cost: 0.00139 s, avg_batch_cost: 0.36452 s, avg_samples: 8.0, ips: 21.94658 samples/s, eta: 1:50:02
[2022/11/22 13:07:02] ppocr INFO: epoch: [1/50], global_step: 60, lr: 0.000099, loss: 3.374191, loss_shrink_maps: 2.211356, loss_threshold_maps: 0.763958, loss_binary_maps: 0.442548, avg_reader_cost: 0.00234 s, avg_batch_cost: 0.36611 s, avg_samples: 8.0, ips: 21.85163 samples/s, eta: 1:44:16
[2022/11/22 13:07:06] ppocr INFO: epoch: [1/50], global_step: 70, lr: 0.000119, loss: 3.223969, loss_shrink_maps: 2.070736, loss_threshold_maps: 0.746714, loss_binary_maps: 0.414206, avg_reader_cost: 0.00026 s, avg_batch_cost: 0.36592 s, avg_samples: 8.0, ips: 21.86294 samples/s, eta: 1:40:08
[2022/11/22 13:07:10] ppocr INFO: epoch: [1/50], global_step: 80, lr: 0.000139, loss: 2.865101, loss_shrink_maps: 1.764307, loss_threshold_maps: 0.722126, loss_binary_maps: 0.353773, avg_reader_cost: 0.00557 s, avg_batch_cost: 0.37052 s, avg_samples: 8.0, ips: 21.59116 samples/s, eta: 1:37:08
[2022/11/22 13:07:15] ppocr INFO: epoch: [1/50], global_step: 90, lr: 0.000159, loss: 2.916367, loss_shrink_maps: 1.785145, loss_threshold_maps: 0.736615, loss_binary_maps: 0.357866, avg_reader_cost: 0.00193 s, avg_batch_cost: 0.36543 s, avg_samples: 8.0, ips: 21.89180 samples/s, eta: 1:34:40
[2022/11/22 13:07:19] ppocr INFO: epoch: [1/50], global_step: 100, lr: 0.000179, loss: 2.895378, loss_shrink_maps: 1.764575, loss_threshold_maps: 0.750305, loss_binary_maps: 0.352447, avg_reader_cost: 0.00201 s, avg_batch_cost: 0.36573 s, avg_samples: 8.0, ips: 21.87380 samples/s, eta: 1:32:41
[2022/11/22 13:07:23] ppocr INFO: epoch: [1/50], global_step: 110, lr: 0.000199, loss: 2.629834, loss_shrink_maps: 1.587836, loss_threshold_maps: 0.753817, loss_binary_maps: 0.316871, avg_reader_cost: 0.00125 s, avg_batch_cost: 0.36450 s, avg_samples: 8.0, ips: 21.94768 samples/s, eta: 1:31:02
[2022/11/22 13:07:27] ppocr INFO: epoch: [1/50], global_step: 120, lr: 0.000219, loss: 2.397547, loss_shrink_maps: 1.460247, loss_threshold_maps: 0.667723, loss_binary_maps: 0.291848, avg_reader_cost: 0.00466 s, avg_batch_cost: 0.37048 s, avg_samples: 8.0, ips: 21.59387 samples/s, eta: 1:29:45
[2022/11/22 13:07:32] ppocr INFO: epoch: [1/50], global_step: 130, lr: 0.000239, loss: 2.378679, loss_shrink_maps: 1.401813, loss_threshold_maps: 0.673884, loss_binary_maps: 0.280701, avg_reader_cost: 0.00088 s, avg_batch_cost: 0.36518 s, avg_samples: 8.0, ips: 21.90695 samples/s, eta: 1:28:34
[2022/11/22 13:07:36] ppocr INFO: epoch: [1/50], global_step: 140, lr: 0.000259, loss: 2.451726, loss_shrink_maps: 1.482388, loss_threshold_maps: 0.681260, loss_binary_maps: 0.296235, avg_reader_cost: 0.00300 s, avg_batch_cost: 0.38128 s, avg_samples: 8.0, ips: 20.98186 samples/s, eta: 1:27:47
[2022/11/22 13:07:41] ppocr INFO: epoch: [1/50], global_step: 150, lr: 0.000279, loss: 2.589176, loss_shrink_maps: 1.562890, loss_threshold_maps: 0.715202, loss_binary_maps: 0.312893, avg_reader_cost: 0.00400 s, avg_batch_cost: 0.37979 s, avg_samples: 8.0, ips: 21.06412 samples/s, eta: 1:27:05
[2022/11/22 13:07:45] ppocr INFO: epoch: [1/50], global_step: 160, lr: 0.000299, loss: 2.706166, loss_shrink_maps: 1.639270, loss_threshold_maps: 0.734711, loss_binary_maps: 0.328181, avg_reader_cost: 0.00109 s, avg_batch_cost: 0.37755 s, avg_samples: 8.0, ips: 21.18906 samples/s, eta: 1:26:25
[2022/11/22 13:07:49] ppocr INFO: epoch: [1/50], global_step: 170, lr: 0.000319, loss: 2.643976, loss_shrink_maps: 1.618946, loss_threshold_maps: 0.707388, loss_binary_maps: 0.324081, avg_reader_cost: 0.00578 s, avg_batch_cost: 0.38069 s, avg_samples: 8.0, ips: 21.01471 samples/s, eta: 1:25:52
[2022/11/22 13:07:54] ppocr INFO: epoch: [1/50], global_step: 180, lr: 0.000339, loss: 2.542865, loss_shrink_maps: 1.494720, loss_threshold_maps: 0.716266, loss_binary_maps: 0.299067, avg_reader_cost: 0.00024 s, avg_batch_cost: 0.37948 s, avg_samples: 8.0, ips: 21.08149 samples/s, eta: 1:25:22
[2022/11/22 13:07:58] ppocr INFO: epoch: [1/50], global_step: 190, lr: 0.000359, loss: 2.484875, loss_shrink_maps: 1.468433, loss_threshold_maps: 0.721491, loss_binary_maps: 0.293729, avg_reader_cost: 0.00022 s, avg_batch_cost: 0.36306 s, avg_samples: 8.0, ips: 22.03508 samples/s, eta: 1:24:44
[2022/11/22 13:08:02] ppocr INFO: epoch: [1/50], global_step: 200, lr: 0.000379, loss: 2.391915, loss_shrink_maps: 1.404688, loss_threshold_maps: 0.692680, loss_binary_maps: 0.281057, avg_reader_cost: 0.00021 s, avg_batch_cost: 0.37660 s, avg_samples: 8.0, ips: 21.24274 samples/s, eta: 1:24:17
eval model:: 100%|██████████████████████████| 1003/1003 [00:55<00:00, 18.21it/s]
[2022/11/22 13:08:57] ppocr INFO: cur metric, precision: 0.766156462585034, recall: 0.9745808545159546, hmean: 0.8578909783384908, fps: 24.47464613318005
[2022/11/22 13:08:57] ppocr INFO: save best model is to ./output/ch_PP-OCR_V3_det/best_accuracy
[2022/11/22 13:08:57] ppocr INFO: best metric, hmean: 0.8578909783384908, is_float16: False, precision: 0.766156462585034, recall: 0.9745808545159546, fps: 24.47464613318005, best_epoch: 1
......
eval model:: 100%|██████████████████████████| 1003/1003 [01:00<00:00, 16.46it/s]
[2022/11/22 15:41:39] ppocr INFO: cur metric, precision: 0.9622942113648434, recall: 0.9799891833423472, hmean: 0.9710610932475884, fps: 22.98522684952537
[2022/11/22 15:41:39] ppocr INFO: best metric, hmean: 0.9742212674543502, is_float16: False, precision: 0.9674666666666667, recall: 0.9810708491076258, fps: 23.192479978844272, best_epoch: 31
[2022/11/22 15:41:43] ppocr INFO: epoch: [50/50], global_step: 12410, lr: 0.000006, loss: 1.340954, loss_shrink_maps: 0.700807, loss_threshold_maps: 0.478234, loss_binary_maps: 0.139876, avg_reader_cost: 0.00200 s, avg_batch_cost: 0.37641 s, avg_samples: 8.0, ips: 21.25350 samples/s, eta: 0:00:34
[2022/11/22 15:41:48] ppocr INFO: epoch: [50/50], global_step: 12420, lr: 0.000005, loss: 1.443306, loss_shrink_maps: 0.745190, loss_threshold_maps: 0.499261, loss_binary_maps: 0.149030, avg_reader_cost: 0.00094 s, avg_batch_cost: 0.39211 s, avg_samples: 8.0, ips: 20.40230 samples/s, eta: 0:00:30
[2022/11/22 15:41:52] ppocr INFO: epoch: [50/50], global_step: 12430, lr: 0.000005, loss: 1.321360, loss_shrink_maps: 0.683633, loss_threshold_maps: 0.489775, loss_binary_maps: 0.136621, avg_reader_cost: 0.00189 s, avg_batch_cost: 0.38061 s, avg_samples: 8.0, ips: 21.01903 samples/s, eta: 0:00:27
[2022/11/22 15:41:56] ppocr INFO: epoch: [50/50], global_step: 12440, lr: 0.000005, loss: 1.238384, loss_shrink_maps: 0.635735, loss_threshold_maps: 0.448261, loss_binary_maps: 0.126847, avg_reader_cost: 0.00201 s, avg_batch_cost: 0.37915 s, avg_samples: 8.0, ips: 21.09974 samples/s, eta: 0:00:23
[2022/11/22 15:42:01] ppocr INFO: epoch: [50/50], global_step: 12450, lr: 0.000005, loss: 1.191645, loss_shrink_maps: 0.622861, loss_threshold_maps: 0.437820, loss_binary_maps: 0.124672, avg_reader_cost: 0.00017 s, avg_batch_cost: 0.37562 s, avg_samples: 8.0, ips: 21.29834 samples/s, eta: 0:00:19
[2022/11/22 15:42:05] ppocr INFO: epoch: [50/50], global_step: 12460, lr: 0.000005, loss: 1.180676, loss_shrink_maps: 0.629529, loss_threshold_maps: 0.427876, loss_binary_maps: 0.126142, avg_reader_cost: 0.00199 s, avg_batch_cost: 0.37954 s, avg_samples: 8.0, ips: 21.07820 samples/s, eta: 0:00:15
[2022/11/22 15:42:10] ppocr INFO: epoch: [50/50], global_step: 12470, lr: 0.000005, loss: 1.238333, loss_shrink_maps: 0.670731, loss_threshold_maps: 0.432300, loss_binary_maps: 0.134142, avg_reader_cost: 0.00199 s, avg_batch_cost: 0.37980 s, avg_samples: 8.0, ips: 21.06347 samples/s, eta: 0:00:11
[2022/11/22 15:42:14] ppocr INFO: epoch: [50/50], global_step: 12480, lr: 0.000004, loss: 1.254117, loss_shrink_maps: 0.662911, loss_threshold_maps: 0.454790, loss_binary_maps: 0.132727, avg_reader_cost: 0.00181 s, avg_batch_cost: 0.37586 s, avg_samples: 8.0, ips: 21.28430 samples/s, eta: 0:00:07
[2022/11/22 15:42:18] ppocr INFO: epoch: [50/50], global_step: 12490, lr: 0.000004, loss: 1.324386, loss_shrink_maps: 0.701260, loss_threshold_maps: 0.501584, loss_binary_maps: 0.140036, avg_reader_cost: 0.00019 s, avg_batch_cost: 0.37545 s, avg_samples: 8.0, ips: 21.30779 samples/s, eta: 0:00:03
[2022/11/22 15:42:23] ppocr INFO: epoch: [50/50], global_step: 12500, lr: 0.000004, loss: 1.407378, loss_shrink_maps: 0.767860, loss_threshold_maps: 0.488598, loss_binary_maps: 0.153395, avg_reader_cost: 0.00346 s, avg_batch_cost: 0.37688 s, avg_samples: 8.0, ips: 21.22694 samples/s, eta: 0:00:00
[2022/11/22 15:42:23] ppocr INFO: save model in ./output/ch_PP-OCR_V3_det/latest
[2022/11/22 15:42:23] ppocr INFO: best metric, hmean: 0.9742212674543502, is_float16: False, precision: 0.9674666666666667, recall: 0.9810708491076258, fps: 23.192479978844272, best_epoch: 31
With the default hyperparameters modified as above, the ch_PP-OCRv3_det_student model was trained for 50 epochs on the training set. Its hmean on the validation set reached 97.4%, with no obvious improvement in the later epochs.
If GPU memory runs out during training, reduce the batch size.
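To reproduce the validation metrics with the best checkpoint, PaddleOCR's evaluation script can be pointed at the saved weights (a sketch; the checkpoint path follows save_model_dir above):
!python PaddleOCR/tools/eval.py -c PaddleOCR/configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o Global.checkpoints=./output/ch_PP-OCR_V3_det/best_accuracy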
5.2 Recognition Model
5.2.1 Recognition model configuration
PaddleOCR also provides many recognition models; the models and their configuration files can be found under PaddleOCR/configs/rec. Here we choose the model ch_PP-OCRv3_rec_distillation, whose configuration file is at PaddleOCR/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. Before use it needs the necessary settings, such as training parameters and dataset paths, and the pretrained weights likewise need to be downloaded. The key parts of the configuration are shown below:
# Key training parameters
use_gpu: true # whether to use the GPU
epoch_num: 100 # number of training epochs
save_model_dir: ./output/rec_ppocr_v3_distillation # where checkpoints are saved
save_epoch_step: 100 # save a checkpoint every 100 epochs
eval_batch_step: [0, 100] # run an evaluation every 100 training iterations
pretrained_model: ./PaddleOCR/pretrain_modeled/ch_PP-OCRv3_rec_train/best_accuracy.pdparams # path to the pretrained weights
# Training set paths
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./RecTrainData/ # image folder
    label_file_list:
    - ./rec_train_label.txt # label file
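The Eval section should point at the cropped validation data produced in section 4.2, analogous to the detection configuration (a sketch; adjust the paths to your setup):
# Validation set paths
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./RecEvalData/ # image folder
    label_file_list:
    - ./rec_eval_label.txt # label file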
5.2.2 Model fine-tuning
Run the following command in the notebook to fine-tune the model; -c passes the path of the configured model file:
!python PaddleOCR/tools/train.py \
    -c PaddleOCR/configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml
[2022/11/22 15:42:30] ppocr INFO: Architecture :
[2022/11/22 15:42:30] ppocr INFO: Models :
[2022/11/22 15:42:30] ppocr INFO: Student :
[2022/11/22 15:42:30] ppocr INFO: Backbone :
[2022/11/22 15:42:30] ppocr INFO: last_conv_stride : [1, 2]
[2022/11/22 15:42:30] ppocr INFO: last_pool_type : avg
[2022/11/22 15:42:30] ppocr INFO: name : MobileNetV1Enhance
[2022/11/22 15:42:30] ppocr INFO: scale : 0.5
[2022/11/22 15:42:30] ppocr INFO: Head :
[2022/11/22 15:42:30] ppocr INFO: head_list :
[2022/11/22 15:42:30] ppocr INFO: CTCHead :
[2022/11/22 15:42:30] ppocr INFO: Head :
[2022/11/22 15:42:30] ppocr INFO: fc_decay : 1e-05
[2022/11/22 15:42:30] ppocr INFO: Neck :
[2022/11/22 15:42:30] ppocr INFO: depth : 2
[2022/11/22 15:42:30] ppocr INFO: dims : 64
[2022/11/22 15:42:30] ppocr INFO: hidden_dims : 120
[2022/11/22 15:42:30] ppocr INFO: name : svtr
[2022/11/22 15:42:30] ppocr INFO: use_guide : True
[2022/11/22 15:42:30] ppocr INFO: SARHead :
[2022/11/22 15:42:30] ppocr INFO: enc_dim : 512
[2022/11/22 15:42:30] ppocr INFO: max_text_length : 25
[2022/11/22 15:42:30] ppocr INFO: name : MultiHead
[2022/11/22 15:42:30] ppocr INFO: Transform : None
[2022/11/22 15:42:30] ppocr INFO: algorithm : SVTR
[2022/11/22 15:42:30] ppocr INFO: freeze_params : False
[2022/11/22 15:42:30] ppocr INFO: model_type : rec
[2022/11/22 15:42:30] ppocr INFO: pretrained : None
[2022/11/22 15:42:30] ppocr INFO: return_all_feats : True
[2022/11/22 15:42:30] ppocr INFO: Teacher :
[2022/11/22 15:42:30] ppocr INFO: Backbone :
[2022/11/22 15:42:30] ppocr INFO: last_conv_stride : [1, 2]
[2022/11/22 15:42:30] ppocr INFO: last_pool_type : avg
[2022/11/22 15:42:30] ppocr INFO: name : MobileNetV1Enhance
[2022/11/22 15:42:30] ppocr INFO: scale : 0.5
[2022/11/22 15:42:30] ppocr INFO: Head :
[2022/11/22 15:42:30] ppocr INFO: head_list :
[2022/11/22 15:42:30] ppocr INFO: CTCHead :
[2022/11/22 15:42:30] ppocr INFO: Head :
[2022/11/22 15:42:30] ppocr INFO: fc_decay : 1e-05
[2022/11/22 15:42:30] ppocr INFO: Neck :
[2022/11/22 15:42:30] ppocr INFO: depth : 2
[2022/11/22 15:42:30] ppocr INFO: dims : 64
[2022/11/22 15:42:30] ppocr INFO: hidden_dims : 120
[2022/11/22 15:42:30] ppocr INFO: name : svtr
[2022/11/22 15:42:30] ppocr INFO: use_guide : True
[2022/11/22 15:42:30] ppocr INFO: SARHead :
[2022/11/22 15:42:30] ppocr INFO: enc_dim : 512
[2022/11/22 15:42:30] ppocr INFO: max_text_length : 25
[2022/11/22 15:42:30] ppocr INFO: name : MultiHead
[2022/11/22 15:42:30] ppocr INFO: Transform : None
[2022/11/22 15:42:30] ppocr INFO: algorithm : SVTR
[2022/11/22 15:42:30] ppocr INFO: freeze_params : False
[2022/11/22 15:42:30] ppocr INFO: model_type : rec