Medical Examination Report Recognition Based on PaddleOCR

Posted 2022-08-29 GoAI


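I. Project Background and Significance

In the fast-moving era of internet healthcare, medical informatization has become a clear trend in the industry. According to surveys, about 80% of medical records are unstructured and hard to use directly, which wastes a large amount of medical resources. Medical data contains a great deal of semi-structured and unstructured text, and the specialized terminology and varied phrasing make structured information extraction difficult. Recognizing, extracting, and structuring the information in electronic medical records and reports is therefore valuable for clinical diagnosis, disease prevention, and medical research.

Medical examination report recognition lets service staff recognize and enter users' report information automatically, which saves labor and improves service efficiency. PaddleOCR has already achieved strong results in text recognition, so this project uses it to detect and recognize text in examination reports, structures the resulting data, and combines CV and NLP techniques to reach a usable level of accuracy. In the future, the recognized information could support personalized disease prediction and health recommendations.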

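II. Project Link

PaddleOCR Medical Examination Report Recognition - PaddlePaddle AI Studio

III. Project Workflow

PaddleOCR is an ultra-lightweight OCR model library open-sourced by Baidu. This article uses its framework to recognize medical examination reports. The project consists of the following steps:

PaddleOCR environment setup and quick inference
Training the report text detection model (det)
Training the report text recognition model (rec)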

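IV. Techniques

From the algorithms PaddleOCR provides, this project picks the basic models for report text detection and recognition:

1. Detection: the DB algorithm

For background on text detection, see this article:

OCR Text Recognition Techniques Summary (3): Text Detection Algorithms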
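
For orientation, the core idea of DB (Differentiable Binarization) is to replace a hard threshold with a steep sigmoid, B = 1 / (1 + exp(-k (P - T))), where P is the predicted text probability at a pixel and T is the learned threshold at that pixel. The sketch below only illustrates this formula on single values; the function name is hypothetical and it is not PaddleOCR's implementation. The default k = 50 is chosen to match the `k : 50` setting of DBHead shown in the training logs later in this article.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate step function used by DB: close to 1 when prob > thresh, close to 0 otherwise."""
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


# A pixel well above its threshold is pushed towards 1, one well below towards 0.
print(differentiable_binarization(0.5, 0.3))
print(differentiable_binarization(0.1, 0.3))
```

Because the function is smooth, the threshold map can be trained jointly with the probability map instead of being fixed by hand.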

2. Recognition: CRNN + CTC

For CRNN, see this article:

CRNN Text Recognition (GoAI's blog on CSDN)
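
As a rough illustration of the CTC part, the sketch below shows greedy CTC decoding: take the best class per time step, collapse consecutive repeats, then drop the blank symbol. The function name, the toy character set, and the convention that index 0 is the blank are assumptions made for this example, not PaddleOCR's code.

```python
def ctc_greedy_decode(frame_ids, charset, blank=0):
    """Collapse repeated per-frame predictions and remove blanks.

    frame_ids: best class index for each time step of the sequence model output.
    charset:   list mapping class index to character (index `blank` is unused).
    """
    chars = []
    prev = blank
    for idx in frame_ids:
        # Emit a character only when it is not blank and differs from the previous frame.
        if idx != blank and idx != prev:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


charset = ["", "a", "b"]
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0], charset))  # aab
```

The blank between the two runs of `a` is what allows a genuinely doubled character to survive the collapse step.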

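V. Dataset

Data location: data/data159696/report_ex.tar

Extraction command: !tar -xf /home/aistudio/data/data159696/report_ex.tar

Dataset structure:

/home/aistudio/report_ex
  └─ pngs: report images, in .png format
  └─ txts: annotated box coordinates and the text each box contains
  └─ json: the same content as txts, stored as JSON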


The txt annotation format is:

Rect (182.0, 1078.03125, 266.0, 1064.03125) 姓名：张某某

Rect (356.0, 1078.03125, 412.0, 1064.03125) 性别：男

Rect (516.0, 1078.03125, 572.0, 1064.03125) 年龄：40

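Note: the coordinates in this dataset use the bottom-left corner as the origin. For detection with PaddleOCR they must be converted to a top-left origin, and both the x and y coordinates must be multiplied by 4.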
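
A minimal sketch of the conversion this note describes, assuming every annotation line has the form `Rect (x0, y0, x1, y1) text` and that `image_height` is the pixel height of the corresponding PNG. It scales both axes by 4, flips the y axis (y = image height - 4·y), and returns the four corners clockwise from the top-left. The function name `rect_line_to_box` and the height of 5000 in the usage line are hypothetical; this is not the author's gen_data_det_reg.py.

```python
import re

# Matches lines such as: Rect (182.0, 1078.03125, 266.0, 1064.03125) some text
RECT_RE = re.compile(r"Rect \(([-\d.]+), ([-\d.]+), ([-\d.]+), ([-\d.]+)\)\s*(.*)")


def rect_line_to_box(line, image_height, scale=4.0):
    """Convert one bottom-left-origin Rect line into a PaddleOCR-style box dict."""
    match = RECT_RE.match(line.strip())
    if match is None:
        return None  # skip lines that do not follow the assumed format
    x0, y0, x1, y1 = (float(v) * scale for v in match.groups()[:4])
    text = match.group(5)
    left, right = min(x0, x1), max(x0, x1)
    # Flip the y axis: a larger y in the source means closer to the top of the image.
    top = image_height - max(y0, y1)
    bottom = image_height - min(y0, y1)
    points = [[left, top], [right, top], [right, bottom], [left, bottom]]
    return {"transcription": text, "points": points}


box = rect_line_to_box("Rect (182.0, 1078.03125, 266.0, 1064.03125) 姓名：张某某", image_height=5000)
print(box["points"])
```

Returning `None` for malformed lines, rather than indexing blindly, is one way to avoid stopping on unexpected input.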

Sample image:

1. Environment Setup and Testing

1.1 Installing the project environment

Install PaddleOCR and its dependencies:

%cd ~ 
!git clone -b release/2.1 https://github.com/PaddlePaddle/PaddleOCR.git

# Install the dependencies
%cd ~/PaddleOCR
!pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple

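1.2 Downloading the inference models and testing

Download the PaddleOCR lightweight Chinese OCR models to check recognition results on a few images. The models are stored under PaddleOCR/inference.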

In [ ]

! mkdir inference
# Download and extract the detection model of the ultra-lightweight Chinese OCR system
! cd inference && wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar && tar xf ch_ppocr_mobile_v2.0_det_infer.tar && rm ch_ppocr_mobile_v2.0_det_infer.tar
# Download and extract the recognition model of the ultra-lightweight Chinese OCR system
! cd inference && wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar && tar xf ch_ppocr_mobile_v2.0_rec_infer.tar && rm ch_ppocr_mobile_v2.0_rec_infer.tar
# Download and extract the text direction classifier of the ultra-lightweight Chinese OCR system
! cd inference && wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar && tar xf ch_ppocr_mobile_v2.0_cls_infer.tar && rm ch_ppocr_mobile_v2.0_cls_infer.tar

1.3 Testing on a single report image and visualizing the result

In [ ]

import matplotlib.pyplot as plt
from PIL import Image
%pylab inline

def show_img(img_path,figsize=(10,10)):
    # Open the image at img_path and display it
    img = Image.open(img_path)
    plt.figure("test_img", figsize=figsize)
    plt.imshow(img)
    plt.show()
show_img("../20220623110401-0.png")

Populating the interactive namespace from numpy and matplotlib

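Testing a single image

Run tools/infer/predict_system.py to recognize the report. It takes four arguments:

image_dir: the image to test
det_model_dir: the inference model of the lightweight detection model
rec_model_dir: the inference model of the lightweight recognition model
cls_model_dir: the inference model of the lightweight direction classifier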

In [ ]

# Quick run with the downloaded models
!python3 ./tools/infer/predict_system.py --image_dir="../20220623110401-0.png" \
--det_model_dir="./inference/ch_ppocr_mobile_v2.0_det_infer" \
--rec_model_dir="./inference/ch_ppocr_mobile_v2.0_rec_infer" \
--cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer"

In [27]

# Results with the models trained in this project
!python3 ./tools/infer/predict_system.py --image_dir="../20220623110401-0.png" \
--det_model_dir="./outputall/db_mv3/best_accuracy" \
--rec_model_dir="./output/rec/best_accuracy" \
--cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer"

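About the output: it has two columns. The first is the text PaddleOCR recognized, and the second is the confidence of that recognition. Confidence ranges over [0, 1]; the closer it is to 1, the more confident the model is that the text is correct. The result is also drawn onto the image and saved under ./inference_results. You can open the file from the directory tree on the left, or display the visualized image with the following code to inspect the OCR result.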

The detection result saved as ./inference_results/20220623110401-0.png is shown below:

In [ ]

show_img("./inference_results/20220623110401-0.png",figsize=(20,20))
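
The confidence column described above can be used to drop unreliable lines before any further structuring. The sketch below assumes the results have already been collected into a list of (text, confidence) pairs; the function name, the sample values, and the 0.8 threshold are all hypothetical and are not produced by predict_system.py as shown.

```python
def filter_by_confidence(results, min_score=0.8):
    """Keep only (text, score) pairs whose confidence is at least min_score."""
    return [(text, score) for text, score in results if score >= min_score]


# Hypothetical recognition results, for illustration only.
sample = [("姓名：张某某", 0.98), ("性别：男", 0.95), ("年齡:4O", 0.41)]
print(filter_by_confidence(sample))
```

A stricter threshold trades recall for precision, so it is worth checking a few reports by eye before fixing a value.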

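2. Training the Text Detection Model

The official PaddleOCR detection examples use the icdar15 dataset. This article follows its annotation format to train, evaluate, and test the detection model. MobileNetV3 is used as the example backbone; you can switch to another network.

Note: the official icdar15 dataset is stored at ~/data/data34815/icdar2015.tar and can serve as a reference if data format problems come up later. The official data directory ~/train_data/icdar2015/text_localization contains two folders and two files: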

~/train_data/icdar2015/text_localization 
  └─ icdar_c4_train_imgs/         training images of the icdar dataset
  └─ ch4_test_images/             test images of the icdar dataset
  └─ train_icdar2015_label.txt    training annotations of the icdar dataset
  └─ test_icdar2015_label.txt     test annotations of the icdar dataset

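The official annotation file format is:

" image file name                 image annotations encoded with json.dumps"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]

Before json.dumps encoding, the annotation is a list of dicts. In each dict, points holds the (x, y) coordinates of the four corners of the text box, listed clockwise starting from the top-left corner, and transcription holds the text inside the box. The text detection task does not need transcription.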
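
A short sketch of how one such label line could be produced, assuming the image path and the JSON payload are separated by a tab character. The helper name `make_det_label_line` is hypothetical.

```python
import json


def make_det_label_line(image_path, boxes):
    """Build one detection label line: image path, a tab, then the JSON-encoded box list.

    boxes: iterable of (text, points) pairs, points being four [x, y] corners
           listed clockwise from the top-left corner.
    """
    annotations = [{"transcription": text, "points": points} for text, points in boxes]
    # ensure_ascii=False keeps Chinese text readable in the label file.
    return image_path + "\t" + json.dumps(annotations, ensure_ascii=False)


line = make_det_label_line(
    "ch4_test_images/img_61.jpg",
    [("MASA", [[310, 104], [416, 141], [418, 216], [312, 179]])],
)
print(line)
```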

2.1 Data preparation

First extract the report dataset into the current directory:

!tar -xf /home/aistudio/data/data159696/report_ex.tar

# Count the images in the folder
%cd /home/aistudio/report_ex/pngs
!ls -l | grep "^-" | wc -l   # 20011 images in total

/home/aistudio/report_ex/pngs
20011

The txt annotation format of this report dataset is:

Rect (182.0, 1078.03125, 266.0, 1064.03125) 姓名：张某某

Rect (356.0, 1078.03125, 412.0, 1064.03125) 性别：男

Rect (516.0, 1078.03125, 572.0, 1064.03125) 年龄：40

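Because this format differs from PaddleOCR's, the project needs a conversion script that builds PaddleOCR-style annotation files. The conversion code was written under time pressure and is rough; readers can refine it as needed.

Files involved in training, taking option 1 (the partial dataset) as the example: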

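/home/aistudio/report_ex/
  └─ train_det_new1_hebing/        training images of the report_ex dataset
  └─ test_det_new1_hebing/         test images of the report_ex dataset
/home/aistudio/
  └─ train_det_new1_hebing.txt     training annotations of the report_ex dataset
  └─ test_det_new1_hebing.txt      test annotations of the report_ex dataset
  └─ gen_data_det_reg.py           format conversion script
  └─ hebing.py                     merges the annotations (hebing means "merge")
  └─ split_data.py                 splits the training and test sets
  └─ file.py                       copies the training and test images into their folders
/home/aistudio/PaddleOCR
  └─ tools/train.py                training script
  └─ tools/infer_det.py            inference script
  └─ configs/det/det_mv3_db_all.yml  configuration file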
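2.2 Quick start for training

Download pretrained weights for the two backbones most commonly used by PaddleOCR detection models, MobileNetV3 and ResNet50_vd. You can later swap in other backbones from PaddleClas as needed.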

In [ ]

# Download the MobileNetV3 pretrained model
!pwd
!wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
! cd pretrain_models/ && tar xf MobileNetV3_large_x0_5_pretrained.tar
# Download the ResNet50 pretrained model
!wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
! cd pretrain_models/ && tar xf ResNet50_vd_ssld_pretrained.tar

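2.3 Data conversion

Run the conversion script gen_data_det_reg.py to generate the txt files for detection (det) and recognition (reg). The samples below are illustrative; the actual output of the script takes precedence.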

det.txt

20220623110401-0.png [{"transcription":"姓名：张某某","points":[[182.0,4256.125],[266.0,4256.125],[182.0,4312.125],[266.0,4312.125]]}]

20220623110401-0.png [{"transcription":"性别：男","points":[[356.0,4256.125],[412.0,4256.125],[356.0,4312.125],[412.0,4312.125]]}]

20220623110401-0.png [{"transcription":"年龄：40","points":[[516.0,4256.125],[572.0,4256.125],[516.0,4312.125],[572.0,4312.125]]}]

reg.txt

20220623110401-0.png 姓名：张某某

20220623110401-0.png 性别：男

20220623110401-0.png 年龄：40

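Because the dataset is very large and training takes a long time, the data used below comes in two variants so that results are easier to inspect and debug: 1. a partial dataset and 2. the full dataset. The script names are given in the comments; uncomment whichever you need.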

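# 1. Conversion script for the partial dataset. Generates det1.txt (a little over one hundred images after merging). Coordinates: x = x × 4, y = image height - y × 4. reg.txt is not used yet.
# The run raises IndexError: list index out of range and generates only some 20,000 records, but training can still proceed.
%cd /home/aistudio/
# !python ./gen_data_det_reg.py
# 2. Conversion script for the full dataset. Generates det_all.txt. The script works, but the full dataset is so large that it has not been run in full yet.
# %cd /home/aistudio/
!python ./gen_data_all.py

# Merge the generated txt data so that each image has a single line holding all of its boxes, and write a new merged txt
# 1. Merge the partial data det1.txt into det_new_hebing.txt
# !python hebing.py
# 2. Merge the full data det_all.txt into det_new_hebing_all.txt
!python hebing_all.py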
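
The merge step described in the comments above (one line per image, holding all of its boxes) can be sketched as follows. This assumes each input line is an image name, a tab, and a JSON list of boxes; the function name `merge_by_image` is hypothetical and this is not the author's hebing.py.

```python
import json
from collections import OrderedDict


def merge_by_image(lines):
    """Group per-box label lines by image so that each image ends up on a single line."""
    merged = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, payload = line.split("\t", 1)
        # Append this line's boxes to whatever was already collected for the image.
        merged.setdefault(name, []).extend(json.loads(payload))
    return [
        name + "\t" + json.dumps(boxes, ensure_ascii=False)
        for name, boxes in merged.items()
    ]


rows = [
    'a.png\t[{"transcription": "x", "points": [[0, 0], [1, 0], [1, 1], [0, 1]]}]',
    'a.png\t[{"transcription": "y", "points": [[2, 2], [3, 2], [3, 3], [2, 3]]}]',
    'b.png\t[{"transcription": "z", "points": [[0, 0], [1, 0], [1, 1], [0, 1]]}]',
]
print(len(merge_by_image(rows)))  # 2
```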

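2.4 Splitting the dataset

Split the detection data det.txt and the recognition data reg.txt into training and validation sets, producing four files: train_det.txt, test_det.txt, train_reg.txt, and test_reg.txt.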

In [ ]

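# 1. Split the partial dataset for training: det.txt is split into train_det_new1_hebing.txt and test_det_new1_hebing.txt. The generated file names must match the ones used at training time.
# !python split_data.py
# 2. Split the full dataset: det_new_hebing_all.txt is split into train_det_hebing_all.txt and test_det_hebing_all.txt
!python split_data_all.py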
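
A minimal sketch of such a split, assuming the merged label file has one image per line. The function name, the 10% test ratio, and the fixed seed are assumptions for illustration; the ratio used by the author's split_data.py is not stated in the article.

```python
import random


def split_lines(lines, test_ratio=0.1, seed=0):
    """Shuffle label lines reproducibly and split them into (train, test) lists."""
    items = [ln for ln in lines if ln.strip()]
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(items)
    n_test = max(1, int(len(items) * test_ratio)) if items else 0
    return items[n_test:], items[:n_test]


labels = ["img_%d.png\t[]" % i for i in range(10)]
train, test = split_lines(labels, test_ratio=0.2)
print(len(train), len(test))  # 8 2
```

Splitting after the merge step keeps all boxes of one image in the same subset.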

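2.5 Copying the dataset images

# Note: copy the training and validation images listed above into their target folders.

# Edit file.py and uncomment the relevant lines, then run it twice, once for train and once for test, to create the image folders that match the txt files above.

# 1. Partial data: copy the images listed in train_det_new1.txt and test_det_new1.txt into the new folders ./report_ex/train_det_new1 and ./report_ex/test_det_new1, for testing
# !python file.py
# 2. Full data: copy the images listed in train_det_hebing_all.txt and test_det_hebing_all.txt into the new folders ./report_ex/train_det_hebing_all and ./report_ex/test_det_hebing_all
!python file_all.py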
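
The copy step can be sketched as below, assuming each label line starts with the image file name followed by a tab. The function name `copy_labelled_images` is hypothetical and this is not the author's file.py; taking the label file and target folder as parameters means one function serves both the train and the test pass.

```python
import os
import shutil


def copy_labelled_images(label_file, src_dir, dst_dir):
    """Copy every image named in label_file from src_dir to dst_dir; return how many were copied."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            name = line.split("\t", 1)[0].strip()
            if not name:
                continue
            src = os.path.join(src_dir, name)
            if os.path.isfile(src):  # silently skip images that are missing on disk
                shutil.copy(src, os.path.join(dst_dir, os.path.basename(name)))
                copied += 1
    return copied
```

A call such as `copy_labelled_images("./train_det_hebing_all.txt", "./report_ex/pngs", "./report_ex/train_det_hebing_all")` would then fill the training folder, under the stated assumptions.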

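2.6 Training the detection model

This project trains DB detection models with MobileNetV3 and ResNet50 backbones. The -c option selects the configuration file, for example configs/det/det_mv3_db.yml, and the -o option overrides training parameters without editing the yml file.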
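
To make the idea behind -o concrete, the sketch below applies dotted `key=value` overrides to a nested config dict. It is only an illustration of the concept: the function name is hypothetical, the sample config values are made up, and PaddleOCR's own option parser may interpret values differently (for instance `true`, which stays a plain string here).

```python
import ast


def apply_overrides(config, overrides):
    """Apply overrides such as 'Global.eval_batch_step=[0,500]' to a nested dict in place."""
    for item in overrides:
        key, raw = item.split("=", 1)
        try:
            value = ast.literal_eval(raw)  # numbers, lists, quoted strings
        except (ValueError, SyntaxError):
            value = raw  # anything else is kept as a plain string
        node = config
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config


cfg = {"Global": {"eval_batch_step": [0, 2000]}, "Train": {"loader": {"batch_size_per_card": 8}}}
apply_overrides(cfg, ["Global.eval_batch_step=[0,500]", "Train.loader.batch_size_per_card=32"])
print(cfg)
```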

In [ ]

# Official command for training a DB detection model with a MobileNetV3 backbone. For reference only; do not run it.
# !python3 PaddleOCR/tools/train.py -c PaddleOCR/configs/det/det_mv3_db.yml -o \
# Global.eval_batch_step="[0,500]" \
# Global.load_static_weights=true \
# Global.pretrained_model='PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained' \
# Train.dataset.data_dir='PaddleOCR/train_data/text_localization/' \
# Train.dataset.label_file_list=['PaddleOCR/train_data/text_localization/train_icdar2015_label.txt'] \
# Eval.dataset.data_dir='PaddleOCR/train_data/text_localization/' \
# Eval.dataset.label_file_list=['PaddleOCR/train_data/text_localization/test_icdar2015_label.txt']

# If a run reports missing packages after the environment restarts, run this cell to install them
!pip install lmdb
!pip install pyclipper
!pip install  Levenshtein
!pip install imgaug

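Training on the full dataset

Because the dataset is large and training takes a long time, the training commands for the full and the partial dataset are listed separately below; pick the one you need.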

In [ ]

# 1. Merged full dataset + MobileNetV3 detection model training
%cd /home/aistudio/
!python3 PaddleOCR/tools/train.py -c PaddleOCR/configs/det/det_mv3_db_all.yml -o \
Global.eval_batch_step="[0,300]" \
Global.load_static_weights=true \
Global.checkpoints='./outputall/db_mv3/best_accuracy' \
Global.pretrained_model='PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained' \
Train.loader.batch_size_per_card=32 \
Train.dataset.data_dir='./report_ex/train_det_hebing_all' \
Train.dataset.label_file_list=['./train_det_hebing_all.txt'] \
Eval.dataset.data_dir='./report_ex/test_det_hebing_all' \
Eval.dataset.label_file_list=['./test_det_hebing_all.txt']

/home/aistudio
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def convert_to_list(value, n, name, dtype=np.int):
[2022/08/10 15:52:35] root INFO: Architecture : 
[2022/08/10 15:52:35] root INFO:     Backbone : 
[2022/08/10 15:52:35] root INFO:         model_name : large
[2022/08/10 15:52:35] root INFO:         name : MobileNetV3
[2022/08/10 15:52:35] root INFO:         scale : 0.5
[2022/08/10 15:52:35] root INFO:     Head : 
[2022/08/10 15:52:35] root INFO:         k : 50
[2022/08/10 15:52:35] root INFO:         name : DBHead
[2022/08/10 15:52:35] root INFO:     Neck : 
[2022/08/10 15:52:35] root INFO:         name : DBFPN
[2022/08/10 15:52:35] root INFO:         out_channels : 256
[2022/08/10 15:52:35] root INFO:     Transform : None
[2022/08/10 15:52:35] root INFO:     algorithm : DB
[2022/08/10 15:52:35] root INFO:     model_type : det
[2022/08/10 15:52:35] root INFO: Eval : 
[2022/08/10 15:52:35] root INFO:     dataset : 
[2022/08/10 15:52:35] root INFO:         data_dir : ./report_ex/test_det_hebing_all
[2022/08/10 15:52:35] root INFO:         label_file_list : ['./test_det_hebing_all.txt']
[2022/08/10 15:52:35] root INFO:         name : SimpleDataSet
[2022/08/10 15:52:35] root INFO:         transforms : 
[2022/08/10 15:52:35] root INFO:             DecodeImage : 
[2022/08/10 15:52:35] root INFO:                 channel_first : False
[2022/08/10 15:52:35] root INFO:                 img_mode : BGR
[2022/08/10 15:52:35] root INFO:             DetLabelEncode : None
[2022/08/10 15:52:35] root INFO:             DetResizeForTest : 
[2022/08/10 15:52:35] root INFO:                 image_shape : [736, 1280]
[2022/08/10 15:52:35] root INFO:             NormalizeImage : 
[2022/08/10 15:52:35] root INFO:                 mean : [0.485, 0.456, 0.406]
[2022/08/10 15:52:35] root INFO:                 order : hwc
[2022/08/10 15:52:35] root INFO:                 scale : 1./255.
[2022/08/10 15:52:35] root INFO:                 std : [0.229, 0.224, 0.225]
[2022/08/10 15:52:35] root INFO:             ToCHWImage : None
[2022/08/10 15:52:35] root INFO:             KeepKeys : 
[2022/08/10 15:52:35] root INFO:                 keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
[2022/08/10 15:52:35] root INFO:     loader : 
[2022/08/10 15:52:35] root INFO:         batch_size_per_card : 1
[2022/08/10 15:52:35] root INFO:         drop_last : False
[2022/08/10 15:52:35] root INFO:         num_workers : 8
[2022/08/10 15:52:35] root INFO:         shuffle : False
[2022/08/10 15:52:35] root INFO:         use_shared_memory : False
[2022/08/10 15:52:35] root INFO: Global : 
[2022/08/10 15:52:35] root INFO:     cal_metric_during_train : False
[2022/08/10 15:52:35] root INFO:     checkpoints : ./outputall/db_mv3/best_accuracy
[2022/08/10 15:52:35] root INFO:     debug : False
[2022/08/10 15:52:35] root INFO:     distributed : False
[2022/08/10 15:52:35] root INFO:     epoch_num : 1200
[2022/08/10 15:52:35] root INFO:     eval_batch_step : [0, 500]
[2022/08/10 15:52:35] root INFO:     infer_img : ./20220623110401-0.png
[2022/08/10 15:52:35] root INFO:     load_static_weights : True
[2022/08/10 15:52:35] root INFO:     log_smooth_window : 20
[2022/08/10 15:52:35] root INFO:     pretrained_model : PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained
[2022/08/10 15:52:35] root INFO:     print_batch_step : 10
[2022/08/10 15:52:35] root INFO:     save_epoch_step : 1200
[2022/08/10 15:52:35] root INFO:     save_inference_dir : None
[2022/08/10 15:52:35] root INFO:     save_model_dir : /home/aistudio/outputall/db_mv3/
[2022/08/10 15:52:35] root INFO:     save_res_path : ./outputall/det_db/predicts_db.txt
[2022/08/10 15:52:35] root INFO:     use_gpu : True
[2022/08/10 15:52:35] root INFO:     use_visualdl : False
[2022/08/10 15:52:35] root INFO: Loss : 
[2022/08/10 15:52:35] root INFO:     alpha : 5
[2022/08/10 15:52:35] root INFO:     balance_loss : True
[2022/08/10 15:52:35] root INFO:     beta : 10
[2022/08/10 15:52:35] root INFO:     main_loss_type : DiceLoss
[2022/08/10 15:52:35] root INFO:     name : DBLoss
[2022/08/10 15:52:35] root INFO:     ohem_ratio : 3
[2022/08/10 15:52:35] root INFO: Metric : 
[2022/08/10 15:52:35] root INFO:     main_indicator : hmean
[2022/08/10 15:52:35] root INFO:     name : DetMetric
[2022/08/10 15:52:35] root INFO: Optimizer : 
[2022/08/10 15:52:35] root INFO:     beta1 : 0.9
[2022/08/10 15:52:35] root INFO:     beta2 : 0.999
[2022/08/10 15:52:35] root INFO:     lr : 
[2022/08/10 15:52:35] root INFO:         learning_rate : 0.001
[2022/08/10 15:52:35] root INFO:     name : Adam
[2022/08/10 15:52:35] root INFO:     regularizer : 
[2022/08/10 15:52:35] root INFO:         factor : 0
[2022/08/10 15:52:35] root INFO:         name : L2
[2022/08/10 15:52:35] root INFO: PostProcess : 
[2022/08/10 15:52:35] root INFO:     box_thresh : 0.6
[2022/08/10 15:52:35] root INFO:     max_candidates : 1000
[2022/08/10 15:52:35] root INFO:     name : DBPostProcess
[2022/08/10 15:52:35] root INFO:     thresh : 0.3
[2022/08/10 15:52:35] root INFO:     unclip_ratio : 1.5
[2022/08/10 15:52:35] root INFO: Train : 
[2022/08/10 15:52:35] root INFO:     dataset : 
[2022/08/10 15:52:35] root INFO:         data_dir : ./report_ex/train_det_hebing_all
[2022/08/10 15:52:35] root INFO:         label_file_list : ['./train_det_hebing_all.txt']
[2022/08/10 15:52:35] root INFO:         name : SimpleDataSet
[2022/08/10 15:52:35] root INFO:         ratio_list : [1.0]
[2022/08/10 15:52:35] root INFO:         transforms : 
[2022/08/10 15:52:35] root INFO:             DecodeImage : 
[2022/08/10 15:52:35] root INFO:                 channel_first : False
[2022/08/10 15:52:35] root INFO:                 img_mode : BGR
[2022/08/10 15:52:35] root INFO:             DetLabelEncode : None
[2022/08/10 15:52:35] root INFO:             IaaAugment : 
[2022/08/10 15:52:35] root INFO:                 augmenter_args : 
[2022/08/10 15:52:35] root INFO:                     args : 
[2022/08/10 15:52:35] root INFO:                         p : 0.5
[2022/08/10 15:52:35] root INFO:                     type : Fliplr
[2022/08/10 15:52:35] root INFO:                     args : 
[2022/08/10 15:52:35] root INFO:                         rotate : [-10, 10]
[2022/08/10 15:52:35] root INFO:                     type : Affine
[2022/08/10 15:52:35] root INFO:                     args : 
[2022/08/10 15:52:35] root INFO:                         size : [0.5, 3]
[2022/08/10 15:52:35] root INFO:                     type : Resize
[2022/08/10 15:52:35] root INFO:             EastRandomCropData : 
[2022/08/10 15:52:35] root INFO:                 keep_ratio : True
[2022/08/10 15:52:35] root INFO:                 max_tries : 50
[2022/08/10 15:52:35] root INFO:                 size : [640, 640]
[2022/08/10 15:52:35] root INFO:             MakeBorderMap : 
[2022/08/10 15:52:35] root INFO:                 shrink_ratio : 0.4
[2022/08/10 15:52:35] root INFO:                 thresh_max : 0.7
[2022/08/10 15:52:35] root INFO:                 thresh_min : 0.3
[2022/08/10 15:52:35] root INFO:             MakeShrinkMap : 
[2022/08/10 15:52:35] root INFO:                 min_text_size : 8
[2022/08/10 15:52:35] root INFO:                 shrink_ratio : 0.4
[2022/08/10 15:52:35] root INFO:             NormalizeImage : 
[2022/08/10 15:52:35] root INFO:                 mean : [0.485, 0.456, 0.406]
[2022/08/10 15:52:35] root INFO:                 order : hwc
[2022/08/10 15:52:35] root INFO:                 scale : 1./255.
[2022/08/10 15:52:35] root INFO:                 std : [0.229, 0.224, 0.225]
[2022/08/10 15:52:35] root INFO:             ToCHWImage : None
[2022/08/10 15:52:35] root INFO:             KeepKeys : 
[2022/08/10 15:52:35] root INFO:                 keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
[2022/08/10 15:52:35] root INFO:     loader : 
[2022/08/10 15:52:35] root INFO:         batch_size_per_card : 32
[2022/08/10 15:52:35] root INFO:         drop_last : False
[2022/08/10 15:52:35] root INFO:         num_workers : 8
[2022/08/10 15:52:35] root INFO:         shuffle : True
[2022/08/10 15:52:35] root INFO:         use_shared_memory : False
[2022/08/10 15:52:35] root INFO: train with paddle 2.0.2 and device CUDAPlace(0)
[2022/08/10 15:52:35] root INFO: Initialize indexs of datasets:['./train_det_hebing_all.txt']
[2022/08/10 15:52:35] root INFO: Initialize indexs of datasets:['./test_det_hebing_all.txt']

In [ ]

# 2. Merged full dataset + ResNet detection model training
%cd /home/aistudio/
!python3 PaddleOCR/tools/train.py -c PaddleOCR/configs/det/det_mv3_db_all_resnet.yml -o \
Global.eval_batch_step="[0,500]" \
Global.load_static_weights=true \
Global.checkpoints='/home/aistudio/outputall/db_resnet/best_accuracy' \
Global.pretrained_model='PaddleOCR/pretrain_models/ResNet50_vd_ssld_pretrained' \
Train.loader.batch_size_per_card=16 \
Train.dataset.data_dir='./report_ex/train_det_hebing_all' \
Train.dataset.label_file_list=['./train_det_hebing_all.txt'] \
Eval.dataset.data_dir='./report_ex/test_det_hebing_all' \
Eval.dataset.label_file_list=['./test_det_hebing_all.txt']

/home/aistudio
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def convert_to_list(value, n, name, dtype=np.int):
[2022/08/11 13:35:54] root INFO: Architecture : 
[2022/08/11 13:35:54] root INFO:     Backbone : 
[2022/08/11 13:35:54] root INFO:         model_name : large
[2022/08/11 13:35:54] root INFO:         name : ResNet
[2022/08/11 13:35:54] root INFO:         scale : 0.5
[2022/08/11 13:35:54] root INFO:     Head : 
[2022/08/11 13:35:54] root INFO:         k : 50
[2022/08/11 13:35:54] root INFO:         name : DBHead
[2022/08/11 13:35:54] root INFO:     Neck : 
[2022/08/11 13:35:54] root INFO:         name : DBFPN
[2022/08/11 13:35:54] root INFO:         out_channels : 256
[2022/08/11 13:35:54] root INFO:     Transform : None
[2022/08/11 13:35:54] root INFO:     algorithm : DB
[2022/08/11 13:35:54] root INFO:     model_type : det
[2022/08/11 13:35:54] root INFO: Eval : 
[2022/08/11 13:35:54] root INFO:     dataset : 
[2022/08/11 13:35:54] root INFO:         data_dir : ./report_ex/test_det_hebing_all
[2022/08/11 13:35:54] root INFO:         label_file_list : ['./test_det_hebing_all.txt']
[2022/08/11 13:35:54] root INFO:         name : SimpleDataSet
[2022/08/11 13:35:54] root INFO:         transforms : 
[2022/08/11 13:35:54] root INFO:             DecodeImage : 
[2022/08/11 13:35:54] root INFO:                 channel_first : False
[2022/08/11 13:35:54] root INFO:                 img_mode : BGR
[2022/08/11 13:35:54] root INFO:             DetLabelEncode : None
[2022/08/11 13:35:54] root INFO:             DetResizeForTest : 
[2022/08/11 13:35:54] root INFO:                 image_shape : [736, 1280]
[2022/08/11 13:35:54] root INFO:             NormalizeImage : 
[2022/08/11 13:35:54] root INFO:                 mean : [0.485, 0.456, 0.406]
[2022/08/11 13:35:54] root INFO:                 order : hwc
[2022/08/11 13:35:54] root INFO:                 scale : 1./255.
[2022/08/11 13:35:54] root INFO:                 std : [0.229, 0.224, 0.225]
[2022/08/11 13:35:54] root INFO:             ToCHWImage : None
[2022/08/11 13:35:54] root INFO:             KeepKeys : 
[2022/08/11 13:35:54] root INFO:                 keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
[2022/08/11 13:35:54] root INFO:     loader : 
[2022/08/11 13:35:54] root INFO:         batch_size_per_card : 1
[2022/08/11 13:35:54] root INFO:         drop_last : False
[2022/08/11 13:35:54] root INFO:         num_workers : 8
[2022/08/11 13:35:54] root INFO:         shuffle : False
[2022/08/11 13:35:54] root INFO:         use_shared_memory : False
[2022/08/11 13:35:54] root INFO: Global : 
[2022/08/11 13:35:54] root INFO:     cal_metric_during_train : False
[2022/08/11 13:35:54] root INFO:     checkpoints : /home/aistudio/outputall/db_resnet//best_accuracy
[2022/08/11 13:35:54] root INFO:     debug : False
[2022/08/11 13:35:54] root INFO:     distributed : False
[2022/08/11 13:35:54] root INFO:     epoch_num : 1200
[2022/08/11 13:35:54] root INFO:     eval_batch_step : [0, 500]
[2022/08/11 13:35:54] root INFO:     infer_img : ./20220623110401-0.png
[2022/08/11 13:35:54] root INFO:     load_static_weights : True
[2022/08/11 13:35:54] root INFO:     log_smooth_window : 20
[2022/08/11 13:35:54] root INFO:     pretrained_model : PaddleOCR/pretrain_models/ResNet50_vd_ssld_pretrained
[2022/08/11 13:35:54] root INFO:     print_batch_step : 10
[2022/08/11 13:35:54] root INFO:     save_epoch_step : 1200
[2022/08/11 13:35:54] root INFO:     save_inference_dir : None
[2022/08/11 13:35:54] root INFO:     save_model_dir : /home/aistudio/outputall/db_resnet/
[2022/08/11 13:35:54] root INFO:     save_res_path : ./outputall_resnet/det_db/predicts_db.txt
[2022/08/11 13:35:54] root INFO:     use_gpu : True
[2022/08/11 13:35:54] root INFO:     use_visualdl : False
[2022/08/11 13:35:54] root INFO: Loss : 
[2022/08/11 13:35:54] root INFO:     alpha : 5
[2022/08/11 13:35:54] root INFO:     balance_loss : True
[2022/08/11 13:35:54] root INFO:     beta : 10
[2022/08/11 13:35:54] root INFO:     main_loss_type : DiceLoss
[2022/08/11 13:35:54] root INFO:     name : DBLoss
[2022/08/11 13:35:54] root INFO:     ohem_ratio : 3
[2022/08/11 13:35:54] root INFO: Metric : 
[2022/08/11 13:35:54] root INFO:     main_indicator : hmean
[2022/08/11 13:35:54] root INFO:     name : DetMetric
[2022/08/11 13:35:54] root INFO: Optimizer : 
[2022/08/11 13:35:54] root INFO:     beta1 : 0.9
[2022/08/11 13:35:54] root INFO:     beta2 : 0.999
[2022/08/11 13:35:54] root INFO:     lr : 
[2022/08/11 13:35:54] root INFO:         learning_rate : 0.001
[2022/08/11 13:35:54] root INFO:     name : Adam
[2022/08/11 13:35:54] root INFO:     regularizer : 
[2022/08/11 13:35:54] root INFO:         factor : 0
[2022/08/11 13:35:54] root INFO:         name : L2
[2022/08/11 13:35:54] root INFO: PostProcess : 
[2022/08/11 13:35:54] root INFO:     box_thresh : 0.6
[2022/08/11 13:35:54] root INFO:     max_candidates : 1000
[2022/08/11 13:35:54] root INFO:     name : DBPostProcess
[2022/08/11 13:35:54] root INFO:     thresh : 0.3
[2022/08/11 13:35:54] root INFO:     unclip_ratio : 1.5
[2022/08/11 13:35:54] root INFO: Train : 
[2022/08/11 13:35:54] root INFO:     dataset : 
[2022/08/11 13:35:54] root INFO:         data_dir : ./report_ex/train_det_hebing_all
[2022/08/11 13:35:54] root INFO:         label_file_list : ['./train_det_hebing_all.txt']
[2022/08/11 13:35:54] root INFO:         name : SimpleDataSet
[2022/08/11 13:35:54] root INFO:         ratio_list : [1.0]
[2022/08/11 13:35:54] root INFO:         transforms : 
[2022/08/11 13:35:54] root INFO:             DecodeImage : 
[2022/08/11 13:35:54] root INFO:                 channel_first : False
[2022/08/11 13:35:54] root INFO:                 img_mode : BGR
[2022/08/11 13:35:54] root INFO:             DetLabelEncode : None
[2022/08/11 13:35:54] root INFO:             IaaAugment : 
[2022/08/11 13:35:54] root INFO:                 augmenter_args : 
[2022/08/11 13:35:54] root INFO:                     args : 
[2022/08/11 13:35:54] root INFO:                         p : 0.5
[2022/08/11 13:35:54] root INFO:                     type : Fliplr
[2022/08/11 13:35:54] root INFO:                     args : 
[2022/08/11 13:35:54] root INFO:                         rotate : [-10, 10]
[2022/08/11 13:35:54] root INFO:                     type : Affine
[2022/08/11 13:35:54] root INFO:                     args : 
[2022/08/11 13:35:54] root INFO:                         size : [0.5, 3]
[2022/08/11 13:35:54] root INFO:                     type : Resize
[2022/08/11 13:35:54] root INFO:             EastRandomCropData : 
[2022/08/11 13:35:54] root INFO:                 keep_ratio : True
[2022/08/11 13:35:54] root INFO:                 max_tries : 50
[2022/08/11 13:35:54] root INFO:                 size : [640, 640]
[2022/08/11 13:35:54] root INFO:             MakeBorderMap : 
[2022/08/11 13:35:54] root INFO:                 shrink_ratio : 0.4
[2022/08/11 13:35:54] root INFO:                 thresh_max : 0.7
[2022/08/11 13:35:54] root INFO:                 thresh_min : 0.3
[2022/08/11 13:35:54] root INFO:             MakeShrinkMap : 
[2022/08/11 13:35:54] root INFO:                 min_text_size : 8
[2022/08/11 13:35:54] root INFO:                 shrink_ratio : 0.4
[2022/08/11 13:35:54] root INFO:             NormalizeImage : 
[2022/08/11 13:35:54] root INFO:                 mean : [0.485, 0.456, 0.406]
[2022/08/11 13:35:54] root INFO:                 order : hwc
[2022/08/11 13:35:54] root INFO:                 scale : 1./255.
[2022/08/11 13:35:54] root INFO:                 std : [0.229, 0.224, 0.225]
[2022/08/11 13:35:54] root INFO:             ToCHWImage : None
[2022/08/11 13:35:54] root INFO:             KeepKeys : 
[2022/08/11 13:35:54] root INFO:                 keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
[2022/08/11 13:35:54] root INFO:     loader : 
[2022/08/11 13:35:54] root INFO:         batch_size_per_card : 16
[2022/08/11 13:35:54] root INFO:         drop_last : False
[2022/08/11 13:35:54] root INFO:         num_workers : 8
[2022/08/11 13:35:54] root INFO:         shuffle : True
[2022/08/11 13:35:54] root INFO:         use_shared_memory : False
[2022/08/11 13:35:54] root INFO: train with paddle 2.0.2 and device CUDAPlace(0)
[2022/08/11 13:35:54] root INFO: Initialize indexs of datasets:['./train_det_hebing_all.txt']
[2022/08/11 13:35:54] root INFO: Initialize indexs of datasets:['./test_det_hebing_all.txt']
W0811 13:35:54.603739  1610 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0811 13:35:54.608341  1610 device_context.cc:372] device: 0, cuDNN Version: 7.6.

Training on the partial dataset

# 3. Training template for the merged partial dataset; the output shown below comes from this run
%cd /home/aistudio/
!python3 PaddleOCR/tools/train.py -c PaddleOCR/configs/det/det_mv3_db.yml -o \
Global.eval_batch_step="[0,50]" \
Global.load_static_weights=true \
Global.pretrained_model='PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained' \
Train.loader.batch_size_per_card=16 \
Train.dataset.data_dir='./report_ex/train_det_new1_hebing' \
Train.dataset.label_file_list=['./train_det_new1_hebing.txt'] \
Eval.dataset.data_dir='./report_ex/test_det_new1_hebing' \
Eval.dataset.label_file_list=['./test_det_new1_hebing.txt']
# 3. Training template for the merged full dataset; batch_size_per_card may need adjusting. To run it, uncomment it and comment out the command above.
# %cd /home/aistudio/
# !python3 PaddleOCR/tools/train.py -c PaddleOCR/configs/det/det_mv3_db.yml -o \
# Global.eval_batch_step="[0,10]" \
# Global.load_static_weights=true \
# Global.pretrained_model='PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained' \
# Train.loader.batch_size_per_card=32 \
# Train.dataset.data_dir='./report_ex/train_det_hebing_all' \
# Train.dataset.label_file_list=['./train_det_hebing_all.txt'] \
# Eval.dataset.data_dir='./report_ex/test_det_hebing_all' \
# Eval.dataset.label_file_list=['./test_det_hebing_all.txt']

/home/aistudio
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def convert_to_list(value, n, name, dtype=np.int):
[2022/08/09 09:38:47] root INFO: Architecture : 
[2022/08/09 09:38:47] root INFO:     Backbone : 
[2022/08/09 09:38:47] root INFO:         model_name : large
[2022/08/09 09:38:47] root INFO:         name : MobileNetV3
[2022/08/09 09:38:47] root INFO:         scale : 0.5
[2022/08/09 09:38:47] root INFO:     Head : 
[2022/08/09 09:38:47] root INFO:         k : 50
[2022/08/09 09:38:47] root INFO:         name : DBHead
[2022/08/09 09:38:47] root INFO:     Neck : 
[2022/08/09 09:38:47] root INFO:         name : DBFPN
[2022/08/09 09:38:47] root INFO:         out_channels : 256
[2022/08/09 09:38:47] root INFO:     Transform : None
[2022/08/09 09:38:47] root INFO:     algorithm : DB
[2022/08/09 09:38:47] root INFO:     model_type : det
[2022/08/09 09:38:47] root INFO: Eval : 
[2022/08/09 09:38:47] root INFO:     dataset : 
[2022/08/09 09:38:47] root INFO:         data_dir : ./report_ex/test_det_new1_hebing
[2022/08/09 09:38:47] root INFO:         label_file_list : ['./test_det_new1_hebing.txt']
[2022/08/09 09:38:47] root INFO:         name : SimpleDataSet
[2022/08/09 09:38:47] root INFO:         transforms : 
[2022/08/09 09:38:47] root INFO:             DecodeImage : 
[2022/08/09 09:38:47] root INFO:                 channel_first : False
[2022/08/09 09:38:47] root INFO:                 img_mode : BGR
[2022/08/09 09:38:47] root INFO:             DetLabelEncode : None
[2022/08/09 09:38:47] root INFO:             DetResizeForTest : 
[2022/08/09 09:38:47] root INFO:                 image_shape : [736, 1280]
[2022/08/09 09:38:47] root INFO:             NormalizeImage : 
[2022/08/09 09:38:47] root INFO:                 mean : [0.485, 0.456, 0.406]
[2022/08/09 09:38:47] root INFO:                 order : hwc
[2022/08/09 09:38:47] root INFO:                 scale : 1./255.
[2022/08/09 09:38:47] root INFO:                 std : [0.229, 0.224, 0.225]
[2022/08/09 09:38:47] root INFO:             ToCHWImage : None
[2022/08/09 09:38:47] root INFO:             KeepKeys : 
[2022/08/09 09:38:47] root INFO:                 keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
[2022/08/09 09:38:47] root INFO:     loader : 
[2022/08/09 09:38:47] root INFO:         batch_size_per_card : 1
[2022/08/09 09:38:47] root INFO:         drop_last : False
[2022/08/09 09:38:47] root INFO:         num_workers : 8
[2022/08/09 09:38:47] root INFO:         shuffle : False
[2022/08/09 09:38:47] root INFO:         use_shared_memory : False
[2022/08/09 09:38:47] root INFO: Global : 
[2022/08/09 09:38:47] root INFO:     cal_metric_during_train : False
[2022/08/09 09:38:47] root INFO:     checkpoints : None
[2022/08/09 09:38:47] root INFO:     debug : False
[2022/08/09 09:38:47] root INFO:     distributed : False
[2022/08/09 09:38:47] root INFO:     epoch_num : 1200
[2022/08/09 09:38:47] root INFO:     eval_batch_step : [0, 50]
[2022/08/09 09:38:47] root INFO:     infer_img : ./20220623110401-0.png
[2022/08/09 09:38:47] root INFO:     load_static_weights : True
[2022/08/09 09:38:47] root INFO:     log_smooth_window : 20
[2022/08/09 09:38:47] root INFO:     pretrained_model : PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained
[2022/08/09 09:38:47] root INFO:     print_batch_step : 10
[2022/08/09 09:38:47] root INFO:     save_epoch_step : 1200
[2022/08/09 09:38:47] root INFO:     save_inference_dir : None
[2022/08/09 09:38:47] root INFO:     save_model_dir : ./output1/db_mv3/
[2022/08/09 09:38:47] root INFO:     save_res_path : ./output1/det_db/predicts_db.txt
[2022/08/09 09:38:47] root INFO:     use_gpu : True
[2022/08/09 09:38:47] root INFO:     use_visualdl : False
[2022/08/09 09:38:47] root INFO: Loss : 
[2022/08/09 09:38:47] root INFO:     alpha : 5
[2022/08/09 09:38:47] root INFO:     balance_loss : True
[2022/08/09 09:38:47] root INFO:     beta : 10
[2022/08/09 09:38:47] root INFO:     main_loss_type : DiceLoss
[2022/08/09 09:38:47] root INFO:     name : DBLoss
[2022/08/09 09:38:47] root INFO:     ohem_ratio : 3
[2022/08/09 09:38:47] root INFO: Metric : 
[2022/08/09 09:38:47] root INFO:     main_indicator : hmean
[2022/08/09 09:38:47] root INFO:     name : DetMetric
[2022/08/09 09:38:47] root INFO: Optimizer : 
[2022/08/09 09:38:47] root INFO:     beta1 : 0.9
[2022/08/09 09:38:47] root INFO:     beta2 : 0.999
[2022/08/09 09:38:47] root INFO:     lr : 
[2022/08/09 09:38:47] root INFO:         learning_rate : 0.001
[2022/08/09 09:38:47] root INFO:     name : Adam
[2022/08/09 09:38:47] root INFO:     regularizer : 
[2022/08/09 09:38:47] root INFO:         factor : 0
[2022/08/09 09:38:47] root INFO:         name : L2
[2022/08/09 09:38:47] root INFO: PostProcess : 
[2022/08/09 09:38:47] root INFO:     box_thresh : 0.6
[2022/08/09 09:38:47] root INFO:     max_candidates : 1000
[2022/08/09 09:38:47] root INFO:     name : DBPostProcess
[2022/08/09 09:38:47] root INFO:     thresh : 0.3
[2022/08/09 09:38:47] root INFO:     unclip_ratio : 1.5
[2022/08/09 09:38:47] root INFO: Train : 
[2022/08/09 09:38:47] root INFO:     dataset : 
[2022/08/09 09:38:47] root INFO:         data_dir : ./report_ex/train_det_new1_hebing
[2022/08/09 09:38:47] root INFO:         label_file_list : ['./train_det_new1_hebing.txt']
[2022/08/09 09:38:47] root INFO:         name : SimpleDataSet
[2022/08/09 09:38:47] root INFO:         ratio_list : [1.0]
[2022/08/09 09:38:47] root INFO:         transforms : 
[2022/08/09 09:38:47] root INFO:             DecodeImage : 
[2022/08/09 09:38:47] root INFO:                 channel_first : False
[2022/08/09 09:38:47] root INFO:                 img_mode : BGR
[2022/08/09 09:38:47] root INFO:             DetLabelEncode : None
[2022/08/09 09:38:47] root INFO:             IaaAugment : 
[2022/08/09 09:38:47] root INFO:                 augmenter_args : 
[2022/08/09 09:38:47] root INFO:                     args : 
[2022/08/09 09:38:47] root INFO:                         p : 0.5
[2022/08/09 09:38:47] root INFO:                     type : Fliplr
[2022/08/09 09:38:47] root INFO:                     args : 
[2022/08/09 09:38:47] root INFO:                         rotate : [-10, 10]
[2022/08/09 09:38:47] root INFO:                     type : Affine
[2022/08/09 09:38:47] root INFO:                     args : 
[2022/08/09 09:38:47] root INFO:                         size : [0.5, 3]
[2022/08/09 09:38:47] root INFO:                     type : Resize
[2022/08/09 09:38:47] root INFO:             EastRandomCropData : 
[2022/08/09 09:38:47] root INFO:                 keep_ratio : True
[2022/08/09 09:38:47] root INFO:                 max_tries : 50
[2022/08/09 09:38:47] root INFO:                 size : [640, 640]
[2022/08/09 09:38:47] root INFO:             MakeBorderMap : 
[2022/08/09 09:38:47] root INFO:                 shrink_ratio : 0.4
[2022/08/09 09:38:47] root INFO:                 thresh_max : 0.7
[2022/08/09 09:38:47] root INFO:                 thresh_min : 0.3
[2022/08/09 09:38:47] root INFO:             MakeShrinkMap : 
[2022/08/09 09:38:47] root INFO:                 min_text_size : 8
[2022/08/09 09:38:47] root INFO:                 shrink_ratio : 0.4
[2022/08/09 09:38:47] root INFO:             NormalizeImage : 
[2022/08/09 09:38:47] root INFO:                 mean : [0.485, 0.456, 0.406]
[2022/08/09 09:38:47] root INFO:                 order : hwc
[2022/08/09 09:38:47] root INFO:                 scale : 1./255.
[2022/08/09 09:38:47] root INFO:                 std : [0.229, 0.224, 0.225]
[2022/08/09 09:38:47] root INFO:             ToCHWImage : None
[2022/08/09 09:38:47] root INFO:             KeepKeys : 
[2022/08/09 09:38:47] root INFO:                 keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
[2022/08/09 09:38:47] root INFO:     loader : 
[2022/08/09 09:38:47] root INFO:         batch_size_per_card : 16
[2022/08/09 09:38:47] root INFO:         drop_last : False
[2022/08/09 09:38:47] root INFO:         num_workers : 8
[2022/08/09 09:38:47] root INFO:         shuffle : True
[2022/08/09 09:38:47] root INFO:         use_shared_memory : False
[2022/08/09 09:38:47] root INFO: train with paddle 2.0.2 and device CUDAPlace(0)
[2022/08/09 09:38:47] root INFO: Initialize indexs of datasets:['./train_det_new1_hebing.txt']
[2022/08/09 09:38:47] root INFO: Initialize indexs of datasets:['./test_det_new1_hebing.txt']
W0809 09:38:47.257441 10327 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0809 09:38:47.261169 10327 device_context.cc:372] device: 0, cuDNN Version: 7.6.
[2022/08/09 09:38:50] root INFO: load pretrained model from ['PaddleOCR/pretrain_models/MobileNetV3_large_x0_5_pretrained']
[2022/08/09 09:38:50] root INFO: train dataloader has 9 iters
[2022/08/09 09:38:50] root INFO: valid dataloader has 16 iters
[2022/08/09 09:38:50] root INFO: During the training process, after the 0th iteration, an evaluation is run every 50 iterations
[2022/08/09 09:38:50] root INFO: Initialize indexs of datasets:['./train_det_new1_hebing.txt']
[2022/08/09 09:39:51] root INFO: epoch: [1/1200], iter: 8, lr: 0.001000, loss: 7.751804, loss_shrink_maps: 4.606901, loss_threshold_maps: 2.225027, loss_binary_maps: 0.919876, reader_cost: 5.55701 s, batch_cost: 6.10987 s, samples: 140, ips: 2.29137
[2022/08/09 09:39:52] root INFO: save model in ./output1/db_mv3/latest
[2022/08/09 09:39:52] root INFO: Initialize indexs of datasets:['./train_det_new1_hebing.txt']
[2022/08/09 09:40:55] root INFO: epoch: [2/1200], iter: 10, lr: 0.001000, loss: 7.377272, loss_shrink_maps: 4.554792, loss_threshold_maps: 1.911745, loss_binary_maps: 0.910735, reader_cost: 6.11302 s, batch_cost: 6.33029 s, samples: 32, ips: 0.50551
[2022/08/09 09:40:58] root INFO: epoch: [2/1200], iter: 17, lr: 0.001000, loss: 6.610305, loss_shrink_maps: 4.466334, loss_threshold_maps: 1.255741, loss_binary_maps: 0.897122, reader_cost: 0.07724 s, batch_cost: 0.30940 s, samples: 108, ips: 34.90611
[2022/08/09 09:40:59] root INFO: save model in ./output1/db_mv3/latest
[2022/08/09 09:40:59] root INFO: Initialize indexs of datasets:['./train_det_new1_hebing.txt']
^C
main proc 11724 exit, kill process group 10327
main proc 11723 exit, kill process group 10327
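
To follow how the loss develops across a run like the one logged above, the per-iteration lines can be pulled out of the log with a regular expression. The following is a minimal sketch, assuming the line layout shown in this log (`epoch: [i/N], iter: k, ..., loss: x`); the function name `parse_train_log` is hypothetical and not part of PaddleOCR.

```python
import re

# Matches training lines such as:
# "... root INFO: epoch: [1/1200], iter: 8, lr: 0.001000, loss: 7.751804, ..."
# "\bloss: " only matches the overall loss, not "loss_shrink_maps: " and friends.
LOG_PATTERN = re.compile(r"epoch: \[(\d+)/(\d+)\], iter: (\d+), .*?\bloss: ([0-9.]+)")


def parse_train_log(lines):
    """Return a list of (epoch, iteration, loss) tuples found in the log lines."""
    records = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            epoch, _total_epochs, iteration, loss = match.groups()
            records.append((int(epoch), int(iteration), float(loss)))
    return records


sample = [
    "[2022/08/09 09:39:51] root INFO: epoch: [1/1200], iter: 8, lr: 0.001000, "
    "loss: 7.751804, loss_shrink_maps: 4.606901",
    "[2022/08/09 09:39:52] root INFO: save model in ./output1/db_mv3/latest",
    "[2022/08/09 09:40:58] root INFO: epoch: [2/1200], iter: 17, lr: 0.001000, "
    "loss: 6.610305, loss_shrink_maps: 4.466334",
]

for epoch, iteration, loss in parse_train_log(sample):
    print(f"epoch {epoch:>4}  iter {iteration:>4}  loss {loss:.6f}")
```

Lines that carry no loss value, such as the "save model" messages, are simply skipped.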

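2.7 Testing detection results

Detection checkpoints saved during training go to the directory set by the Global.save_model_dir parameter in the yml config file, for example './output/det_db/' (the run logged above uses ./output1/db_mv3/).

Use the trained model to test detection on a single image.

In [ ]

# %cd PaddleOCR
# Results from the model trained on a subset of the data
# !python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="../20220623110401-0.png" Global.pretrained_model="/home/aistudio/output1/db_mv3/best_accuracy"
# Results from the model trained on the full dataset (trained for only one epoch)
!python3 tools/infer_det.py -c configs/det/det_mv3_db_all.yml -o Global.infer_img="../20220623110401-0.png" Global.pretrained_model="/home/aistudio/outputall/db_mv3/best_accuracy"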

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def convert_to_list(value, n, name, dtype=np.int):
[2022/08/17 22:49:37] root INFO: Architecture : 
[2022/08/17 22:49:37] root INFO:     Backbone : 
[2022/08/17 22:49:37] root INFO:         model_name : large
[2022/08/17 22:49:37] root INFO:         name : MobileNetV3
[2022/08/17 22:49:37] root INFO:         scale : 0.5
[2022/08/17 22:49:37] root INFO:     Head : 
[2022/08/17 22:49:37] root INFO:         k : 50
[2022/08/17 22:49:37] root INFO:         name : DBHead
[2022/08/17 22:49:37] root INFO:     Neck : 
[2022/08/17 22:49:37] root INFO:         name : DBFPN
[2022/08/17 22:49:37] root INFO:         out_channels : 256
[2022/08/17 22:49:37] root INFO:     Transform : None
[2022/08/17 22:49:37] root INFO:     algorithm : DB
[2022/08/17 22:49:37] root INFO:     model_type : det
[2022/08/17 22:49:37] root INFO: Eval : 
[2022/08/17 22:49:37] root INFO:     dataset : 
[2022/08/17 22:49:37] root INFO:         data_dir : ./train_data/icdar2015/text_localization/
[2022/08/17 22:49:37] root INFO:         label_file_list : ['./train_data/icdar2015/text_localization/test_icdar2015_label.txt']
[2022/08/17 22:49:37] root INFO:         name : SimpleDataSet
[2022/08/17 22:49:37] root INFO:         transforms : 
[2022/08/17 22:49:37] root INFO:             DecodeImage : 
[2022/08/17 22:49:37] root INFO:                 channel_first : False
[2022/08/17 22:49:37] root INFO:                 img_mode : BGR
[2022/08/17 22:49:37] root INFO:             DetLabelEncode : None
[2022/08/17 22:49:37] root INFO:             DetResizeForTest : 
[2022/08/17 22:49:37] root INFO:                 image_shape : [736, 1280]
[2022/08/17 22:49:37] root INFO:             NormalizeImage : 
[2022/08/17 22:49:37] root INFO:                 mean : [0.485, 0.456, 0.406]
[2022/08/17 22:49:37] root INFO:                 order : hwc
[2022/08/17 22:49:37] root INFO:                 scale : 1./255.
[2022/08/17 22:49:37] root INFO:                 std : [0.229, 0.224, 0.225]
[2022/08/17 22:49:37] root INFO:             ToCHWImage : None
[2022/08/17 22:49:37] root INFO:             KeepKeys : 
[2022/08/17 22:49:37] root INFO:                 keep_keys : ['image', 'shape', 'polys', 'ignore_tags']
[2022/08/17 22:49:37] root INFO:     loader : 
[2022/08/17 22:49:37] root INFO:         batch_size_per_card : 1
[2022/08/17 22:49:37] root INFO:         drop_last : False
[2022/08/17 22:49:37] root INFO:         num_workers : 8
[2022/08/17 22:49:37] root INFO:         shuffle : False
[2022/08/17 22:49:37] root INFO:         use_shared_memory : False
[2022/08/17 22:49:37] root INFO: Global : 
[2022/08/17 22:49:37] root INFO:     cal_metric_during_train : False
[2022/08/17 22:49:37] root INFO:     checkpoints : None
[2022/08/17 22:49:37] root INFO:     debug : False
[2022/08/17 22:49:37] root INFO:     distributed : False
[2022/08/17 22:49:37] root INFO:     epoch_num : 1200
[2022/08/17 22:49:37] root INFO:     eval_batch_step : [0, 2000]
[2022/08/17 22:49:37] root INFO:     infer_img : ../20220623110401-0.png
[2022/08/17 22:49:37] root INFO:     log_smooth_window : 20
[2022/08/17 22:49:37] root INFO:     pretrained_model : /home/aistudio/outputall/db_mv3/best_accuracy
[2022/08/17 22:49:37] root INFO:     print_batch_step : 10
[2022/08/17 22:49:37] root INFO:     save_epoch_step : 1200
[2022/08/17 22:49:37] root INFO:     save_inference_dir : None
[2022/08/17 22:49:37] root INFO:     save_model_dir : /home/aistudio/outputall/db_mv3/
[2022/08/17 22:49:37] root INFO:     save_res_path : ./outputall/det_db/predicts_db.txt
[2022/08/17 22:49:37] root INFO:     use_gpu : True
[2022/08/17 22:49:37] root INFO:     use_visualdl : False
[2022/08/17 22:49:37] root INFO: Loss : 
[2022/08/17 22:49:37] root INFO:     alpha : 5
[2022/08/17 22:49:37] root INFO:     balance_loss : True
[2022/08/17 22:49:37] root INFO:     beta : 10
[2022/08/17 22:49:37] root INFO:     main_loss_type : DiceLoss
[2022/08/17 22:49:37] root INFO:     name : DBLoss
[2022/08/17 22:49:37] root INFO:     ohem_ratio : 3
[2022/08/17 22:49:37] root INFO: Metric : 
[2022/08/17 22:49:37] root INFO:     main_indicator : hmean
[2022/08/17 22:49:37] root INFO:     name : DetMetric
[2022/08/17 22:49:37] root INFO: Optimizer : 
[2022/08/17 22:49:37] root INFO:     beta1 : 0.9
[2022/08/17 22:49:37] root INFO:     beta2 : 0.999
[2022/08/17 22:49:37] root INFO:     lr : 
[2022/08/17 22:49:37] root INFO:         learning_rate : 0.001
[2022/08/17 22:49:37] root INFO:     name : Adam
[2022/08/17 22:49:37] root INFO:     regularizer : 
[2022/08/17 22:49:37] root INFO:         factor : 0
[2022/08/17 22:49:37] root INFO:         name : L2
[2022/08/17 22:49:37] root INFO: PostProcess : 
[2022/08/17 22:49:37] root INFO:     box_thresh : 0.6
[2022/08/17 22:49:37] root INFO:     max_candidates : 1000
[2022/08/17 22:49:37] root INFO:     name : DBPostProcess
[2022/08/17 22:49:37] root INFO:     thresh : 0.3
[2022/08/17 22:49:37] root INFO:     unclip_ratio : 1.5
[2022/08/17 22:49:37] root INFO: Train : 
[2022/08/17 22:49:37] root INFO:     dataset : 
[2022/08/17 22:49:37] root INFO:         data_dir : ./train_data/icdar2015/text_localization/
[2022/08/17 22:49:37] root INFO:         label_file_list : ['./train_data/icdar2015/text_localization/train_icdar2015_label.txt']
[2022/08/17 22:49:37] root INFO:         name : SimpleDataSet
[2022/08/17 22:49:37] root INFO:         ratio_list : [1.0]
[2022/08/17 22:49:37] root INFO:         transforms : 
[2022/08/17 22:49:37] root INFO:             DecodeImage : 
[2022/08/17 22:49:37] root INFO:                 channel_first : False
[2022/08/17 22:49:37] root INFO:                 img_mode : BGR
[2022/08/17 22:49:37] root INFO:             DetLabelEncode : None
[2022/08/17 22:49:37] root INFO:             IaaAugment : 
[2022/08/17 22:49:37] root INFO:                 augmenter_args : 
[2022/08/17 22:49:37] root INFO:                     args : 
[2022/08/17 22:49:37] root INFO:                         p : 0.5
[2022/08/17 22:49:37] root INFO:                     type : Fliplr
[2022/08/17 22:49:37] root INFO:                     args : 
[2022/08/17 22:49:37] root INFO:                         rotate : [-10, 10]
[2022/08/17 22:49:37] root INFO:                     type : Affine
[2022/08/17 22:49:37] root INFO:                     args : 
[2022/08/17 22:49:37] root INFO:                         size : [0.5, 3]
[2022/08/17 22:49:37] root INFO:                     type : Resize
[2022/08/17 22:49:37] root INFO:             EastRandomCropData : 
[2022/08/17 22:49:37] root INFO:                 keep_ratio : True
[2022/08/17 22:49:37] root INFO:                 max_tries : 50
[2022/08/17 22:49:37] root INFO:                 size : [640, 640]
[2022/08/17 22:49:37] root INFO:             MakeBorderMap : 
[2022/08/17 22:49:37] root INFO:                 shrink_ratio : 0.4
[2022/08/17 22:49:37] root INFO:                 thresh_max : 0.7
[2022/08/17 22:49:37] root INFO:                 thresh_min : 0.3
[2022/08/17 22:49:37] root INFO:             MakeShrinkMap : 
[2022/08/17 22:49:37] root INFO:                 min_text_size : 8
[2022/08/17 22:49:37] root INFO:                 shrink_ratio : 0.4
[2022/08/17 22:49:37] root INFO:             NormalizeImage : 
[2022/08/17 22:49:37] root INFO:                 mean : [0.485, 0.456, 0.406]
[2022/08/17 22:49:37] root INFO:                 order : hwc
[2022/08/17 22:49:37] root INFO:                 scale : 1./255.
[2022/08/17 22:49:37] root INFO:                 std : [0.229, 0.224, 0.225]
[2022/08/17 22:49:37] root INFO:             ToCHWImage : None
[2022/08/17 22:49:37] root INFO:             KeepKeys : 
[2022/08/17 22:49:37] root INFO:                 keep_keys : ['image', 'threshold_map', 'threshold_mask', 'shrink_map', 'shrink_mask']
[2022/08/17 22:49:37] root INFO:     loader : 
[2022/08/17 22:49:37] root INFO:         batch_size_per_card : 64
[2022/08/17 22:49:37] root INFO:         drop_last : False
[2022/08/17 22:49:37] root INFO:         num_workers : 8
[2022/08/17 22:49:37] root INFO:         shuffle : True
[2022/08/17 22:49:37] root INFO:         use_shared_memory : False
[2022/08/17 22:49:37] root INFO: train with paddle 2.0.2 and device CUDAPlace(0)
W0817 22:49:37.830164  5900 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0817 22:49:37.835045  5900 device_context.cc:372] device: 0, cuDNN Version: 7.6.
[2022/08/17 22:49:40] root INFO: load pretrained model from ['/home/aistudio/outputall/db_mv3/best_accuracy']
[2022/08/17 22:49:40] root INFO: infer_img: ../20220623110401-0.png
[2022/08/17 22:49:41] root INFO: The detected Image saved in ./outputall/det_db/det_results/20220623110401-0.png
[2022/08/17 22:49:41] root INFO: success!
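
Besides the visualised image, the run above also writes its detections to the file named by save_res_path (./outputall/det_db/predicts_db.txt in this log). The sketch below reads such a file under an assumption that should be checked against a real output file: each line is taken to hold the image path, a tab, and a JSON list of objects with a "points" field containing the polygon corners. The sample line and the helper names are hypothetical.

```python
import json


def parse_det_result_line(line):
    """Split one result line into (image_path, list_of_polygons).

    Assumed layout: "<image path>\t<JSON list of {'points': [[x, y], ...]}>".
    """
    image_path, payload = line.rstrip("\n").split("\t", 1)
    boxes = json.loads(payload)
    polygons = [box["points"] for box in boxes]
    return image_path, polygons


def polygon_to_bbox(points):
    """Axis-aligned bounding box (left, top, right, bottom) of a polygon."""
    xs = [point[0] for point in points]
    ys = [point[1] for point in points]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical line, invented for illustration only.
sample_line = (
    "../20220623110401-0.png\t"
    '[{"transcription": "", "points": [[728, 88], [1064, 88], [1064, 144], [728, 144]]}]\n'
)

path, polygons = parse_det_result_line(sample_line)
for polygon in polygons:
    print(path, polygon_to_bbox(polygon))
```

Reducing each polygon to an axis-aligned box is a convenience for cropping; rotated text would need the full polygon.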

In [ ]

# %cd PaddleOCR/
# !python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="../20220623110401-0.png"  Global.checkpoints="./output/db_mv3/best_accuracy"

Use the trained model to test detection on every image in a folder. Adjust the paths as needed; the same applies below.

In [ ]

#!python3 tools/infer_det.py -c configs/det/det_mv3_db.yml -o Global.infer_img="./doc/imgs_en/"  Global.checkpoints="./output/db_mv3/best_accuracy"

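3. Training the text recognition model

3.1. Data preparation

First, put the training images in a single folder (train_images) and record the image paths and labels in a txt file (rec_gt_train.txt).

The recognition datasets used in this project are train_reg.txt and test_reg.txt. Note: by default, the image path and the image label must be separated by a tab (\t).

Training set txt format:

" image file name            image label "

  20220623110401-0.png   姓名：张某某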
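
Since the label file relies on a tab between the image path and the label, it helps to build and read lines programmatically rather than by hand. This is a minimal sketch of that format; `format_label_line` and `parse_label_line` are hypothetical helper names, not PaddleOCR functions.

```python
def format_label_line(image_name, text):
    """Build one label line: image path, a tab, then the transcription."""
    if "\t" in image_name or "\t" in text:
        # A tab inside either field would make the line ambiguous.
        raise ValueError("fields must not contain tab characters")
    return f"{image_name}\t{text}"


def parse_label_line(line):
    """Split a label line at the first tab into (image_name, text)."""
    image_name, text = line.rstrip("\n").split("\t", 1)
    return image_name, text


line = format_label_line("20220623110401-0.png", "姓名：张某某")
print(repr(line))
print(parse_label_line(line))
```

Splitting only at the first tab keeps any spaces inside the label text intact.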

The training and test sets are laid out as follows:

    |- train_reg.txt
    |- report_ex/
        |- train_reg
            |- word_001.png
            |- word_002.jpg
            | ...
        |- test_reg
            |- word_001.png
            |- word_002.jpg
            | ...

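3.2. Quick-start training

This section uses the CRNN recognition model as the example. PaddleOCR's two mainstream recognition backbones are MobileNetV3 and ResNet50_vd; the pretrained weights downloaded in the next cell are the MobileNetV3 variant:

In [ ]

# Download the MobileNetV3 (rec_mv3_none_bilstm_ctc) pretrained model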
%cd PaddleOCR/
!wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar
! cd pretrain_models/ && tar xf rec_mv3_none_bilstm_ctc_v2.0_train.tar

/home/aistudio/PaddleOCR
--2022-08-05 14:11:03--  https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mv3_none_bilstm_ctc_v2.0_train.tar
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 100.67.200.6
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|100.67.200.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 51200000 (49M) [application/x-tar]
Saving to: ‘./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train.tar.1’

rec_mv3_none_bilstm 100%[===================>]  48.83M   119MB/s    in 0.4s    

2022-08-05 14:11:03 (119 MB/s) - ‘./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train.tar.1’ saved [51200000/51200000]
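
The "ctc" in the pretrained model's name refers to the CTC decoding step that the CRNN recognizer relies on. As a rough illustration of the idea, not PaddleOCR's own code, greedy CTC decoding takes the most likely class index at each time step, merges consecutive repeats, and drops the blank symbol. The vocabulary below is made up, and placing the blank at index 0 is an assumption of this sketch.

```python
def ctc_greedy_decode(step_indices, vocabulary, blank=0):
    """Collapse a per-time-step index sequence into text.

    step_indices: the argmax class index at each time step.
    vocabulary:   index -> character, with the blank symbol at position `blank`.
    """
    decoded = []
    previous = blank
    for index in step_indices:
        # Emit a character only when it is not blank and not a repeat of the
        # immediately preceding time step.
        if index != blank and index != previous:
            decoded.append(vocabulary[index])
        previous = index
    return "".join(decoded)


vocabulary = ["<blank>", "a", "b", "c"]  # hypothetical character set
steps = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 0, 3]
print(ctc_greedy_decode(steps, vocabulary))  # prints "abcc"
```

The blank between the two final 3s is what allows a genuinely doubled character to survive the merge.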

In [12]

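# rec.py crops images by annotation coordinates: every annotation on a page image is cut out as its own image, producing the new_pngs image folder and a matching rec.txt for the recognition model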
%cd /home/aistudio/
!python ./rec.py
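
The rec.py script itself is not shown, so the following is only a sketch of the coordinate handling such a cropping step needs, based on the dataset notes earlier in the article: the annotation coordinates have a bottom-left origin and must be multiplied by 4 before use. The order of operations (scale first, then flip the y axis), the `image_height` value and the function names are assumptions made for illustration.

```python
import re

# Annotation lines look like: "Rect (182.0, 1078.03125, 266.0, 1064.03125) 姓名：张某某"
RECT_PATTERN = re.compile(r"Rect \(([^)]*)\)\s*(.*)")


def parse_annotation(line):
    """Return ((x0, y0, x1, y1), text) from one annotation line."""
    match = RECT_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not an annotation line: {line!r}")
    x0, y0, x1, y1 = (float(value) for value in match.group(1).split(","))
    return (x0, y0, x1, y1), match.group(2)


def to_crop_box(rect, image_height, scale=4.0):
    """Convert a bottom-left-origin rect to a top-left-origin pixel box.

    Assumption: coordinates are scaled first, then the y axis is flipped
    against the pixel height of the image.
    """
    x0, y0, x1, y1 = (value * scale for value in rect)
    left, right = sorted((x0, x1))
    top, bottom = sorted((image_height - y0, image_height - y1))
    return round(left), round(top), round(right), round(bottom)


rect, text = parse_annotation("Rect (182.0, 1078.03125, 266.0, 1064.03125) 姓名：张某某")
# image_height=4400 is a hypothetical page height in pixels.
print(text, to_crop_box(rect, image_height=4400))
```

The resulting (left, top, right, bottom) tuple has the layout that common image libraries expect for a crop box.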

In [ ]

# Count the images in the current folder
%cd ./new_pngs
!ls -l | grep "^-" | wc -l   # 1490577 images in total

/home/aistudio/new_pngs
1492727

In [ ]

# Split the data into training and test sets
%cd /home/aistudio/
!python ./rec_split_data.py

/home/aistudio
2150
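
The contents of rec_split_data.py are not shown. As a hedged illustration of what such a split step can look like, the sketch below shuffles the label lines with a fixed seed and holds out a fixed number for testing; `split_labels`, `test_size` and `seed` are hypothetical names and values.

```python
import random


def split_labels(lines, test_size, seed=0):
    """Shuffle label lines reproducibly and return (train_lines, test_lines)."""
    if not 0 <= test_size <= len(lines):
        raise ValueError("test_size must be between 0 and the number of lines")
    shuffled = list(lines)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


labels = [f"word_{i:03d}.png\tlabel {i}" for i in range(10)]
train_lines, test_lines = split_labels(labels, test_size=2)
print(len(train_lines), len(test_lines))  # prints "8 2"
```

A fixed seed keeps the split identical between runs, so the training and test label files stay consistent with the copied images.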

In [ ]

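# Copy the images listed in the training and test sets into their folders for recognition training. Run it the same way as above, twice: once for train and once for test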
!python rec_file.py
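
rec_file.py is likewise not shown. A copy step of this kind can be sketched as follows, assuming a tab-separated label file whose first field is the image file name; the function name and the directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def copy_listed_images(label_file, source_dir, target_dir):
    """Copy every image named in a tab-separated label file into target_dir.

    Returns the number of files copied.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    with open(label_file, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            image_name = line.rstrip("\n").split("\t", 1)[0]
            shutil.copy2(source_dir / image_name, target_dir / Path(image_name).name)
            copied += 1
    return copied


# Example call (paths are placeholders; run once per split):
# copy_listed_images("train_rec.txt", "new_pngs", "report_ex/train_rec")
```

Calling it once with the training label file and once with the test label file matches the "run twice" note in the cell above.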

3.3. Text recognition training

Because of the server's memory limits, this run trains on only a small part of the dataset, so the model serves as a basic demonstration. The parameters can be tuned or the network swapped out in later training runs.

In [ ]

%cd PaddleOCR/
!python3 ./tools/train.py -c ./configs/rec/rec_icdar15_train.yml -o \
Global.eval_batch_step="[0,100]" \
Global.save_epoch_step=500 \
Global.pretrained_model='./pretrain_models/rec_mv3_none_bilstm_ctc_v2.0_train/best_accuracy' \
Train.dataset.data_dir='../report_ex/train_rec' \
Train.dataset.label_file_list=['../train_rec.txt'] \
Eval.dataset.data_dir='../report_ex/test_rec' \
Eval.dataset.label_file_list=['../test_rec.txt'] \
Optimizer.lr.learning_rate=0.001

[Errno 2] No such file or directory: 'PaddleOCR//'
/home/aistudio/PaddleOCR
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  def convert_to_list