Google Vision OCR: rotate word coordinates from 90-, 180- and 270-degree documents back to 0 degrees

Posted 2023-04-17


Originally asked: 2021-02-24 09:32:22

Problem

Given the following guidance, taken from the Google Vision OCR documentation https://developers.google.com/resources/api-libraries/documentation/vision/v1p1beta1/python/latest/vision_v1p1beta1.files.html:

                  "boundingBox":  # A bounding polygon for the detected image annotation. # The bounding box for the paragraph.
                      # The vertices are in the order of top-left, top-right, bottom-right,
                      # bottom-left. When a rotation of the bounding box is detected the rotation
                      # is represented as around the top-left corner as defined when the text is
                      # read in the 'natural' orientation.
                      # For example:
                      #   * when the text is horizontal it might look like:
                      #      0----1
                      #      |    |
                      #      3----2
                      #   * when it's rotated 180 degrees around the top-left corner it becomes:
                      #      2----3
                      #      |    |
                      #      1----0
                      #   and the vertex order will still be (0, 1, 2, 3).

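So, as an experiment, I scanned the same document in four different orientations, namely 0, 90, 180 and 270 degrees, and ran each scan through Google's Vision OCR (DOCUMENT_TEXT_DETECTION). This gave the following results from Google's OCR output.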

Document in 0-degree orientation. This is the default horizontal text, with 0 degrees of text rotation. Its four corners are:

0----1
|    |
3----2
Document height 3508
Document width 2479

Sample output:

LEGO - 'vertices': [{'x': 755, 'y': 172}, {'x': 877, 'y': 173}, {'x': 876, 'y': 237}, {'x': 754, 'y': 236}]
LEGOLAND - 'vertices': [{'x': 1994, 'y': 189}, {'x': 2269, 'y': 192}, {'x': 2268, 'y': 244}, {'x': 1993, 'y': 241}]

Document in 90-degree orientation.

1----2
|    |
0----3
*vertex order will still be (0, 1, 2, 3)
Document height 2479
Document width 3508

Sample output:

LEGO - 'vertices': [{'x': 170, 'y': 1730}, {'x': 171, 'y': 1604}, {'x': 241, 'y': 1604}, {'x': 240, 'y': 1730}]
LEGOLAND - 'vertices': [{'x': 188, 'y': 486}, {'x': 192, 'y': 213}, {'x': 245, 'y': 214}, {'x': 241, 'y': 487}]

Document in 180-degree orientation.

2----3
|    |
1----0
*vertex order will still be (0, 1, 2, 3)
Document height 3508
Document width 2479

Sample output:

LEGO - 'vertices': [{'x': 1740, 'y': 3337}, {'x': 1584, 'y': 3336}, {'x': 1585, 'y': 3259}, {'x': 1741, 'y': 3260}]
LEGOLAND - 'vertices': [{'x': 485, 'y': 3315}, {'x': 212, 'y': 3311}, {'x': 213, 'y': 3261}, {'x': 486, 'y': 3265}]

Document in 270-degree orientation.

3----0
|    |
2----1
*vertex order will still be (0, 1, 2, 3)
Document height 2479
Document width 3508

Sample output:

LEGO - 'vertices': [{'x': 3335, 'y': 738}, {'x': 3333, 'y': 893}, {'x': 3269, 'y': 892}, {'x': 3271, 'y': 737}]
LEGOLAND - 'vertices': [{'x': 3318, 'y': 1994}, {'x': 3313, 'y': 2266}, {'x': 3261, 'y': 2265}, {'x': 3266, 'y': 1993}]
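One consequence of the documented vertex order, visible in the four samples above, is that the direction of the edge from vertex 0 to vertex 1 indicates how the text is rotated on the page. The helper below is a hypothetical sketch of that idea (`guess_rotation` is not part of the Vision API); it assumes image coordinates with y growing downward and uses the same angle labels as the samples. It may help if the scan orientation is not known in advance.

```python
def guess_rotation(vertices):
    """Guess the page rotation (0, 90, 180 or 270) of one bounding box.

    Hypothetical helper: `vertices` is a list of {'x': ..., 'y': ...} dicts in
    the documented order (0, 1, 2, 3). Only the edge 0 -> 1 is inspected.
    """
    dx = vertices[1]["x"] - vertices[0]["x"]
    dy = vertices[1]["y"] - vertices[0]["y"]
    if abs(dx) >= abs(dy):
        # Mostly horizontal edge: text runs left-to-right (0) or right-to-left (180).
        return 0 if dx > 0 else 180
    # Mostly vertical edge: text runs bottom-to-top (90) or top-to-bottom (270).
    return 90 if dy < 0 else 270


lego_90 = [{"x": 170, "y": 1730}, {"x": 171, "y": 1604},
           {"x": 241, "y": 1604}, {"x": 240, "y": 1730}]
print(guess_rotation(lego_90))  # 90
```

This only looks at one box; on a real page you would probably take the most common answer over many words, since a single skewed or mis-detected box could mislead it.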

The question

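Assuming we have a document scanned at 90, 180 or 270 degrees, how can the coordinates be rotated mathematically so that, whichever orientation the page was scanned in, they give the same result as the default 0-degree document? In other words, how can the 90-, 180- and 270-degree coordinates be corrected as if the page had been scanned at 0 degrees?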

This question may look simple to some, but I have spent the past few days trying various approaches and I can't seem to figure it out.

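So the input parameters are the scanned page orientation in degrees (0, 90, 180, 270), the text vertices from Google's OCR output, and the page size (height and width) from Google's OCR.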

The output must be the corrected text vertices for a 0-degree page orientation.


Answer 1

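I'll give you the mathematical answer. Keep in mind that mathematics is an exact science, whereas Vision OCR is an empirical technique, i.e. not an exact science.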

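Let me give a simple example so you can see the behaviour. Imagine a document of height 10 and width 4, with a point at coordinates (1,9). When you rotate the document by 90º, the point's coordinates become (9,3); after further 90º rotations they become (3,1) and then (1,1).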

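The reason is that, for a general rectangle of height H and width W, rotating a point (a,b) by 90º gives (a,b) -> (b, W-a), with the new width W' = H and the new height H' = W.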
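The walk-through with the 4 x 10 page can be reproduced in a few lines of Python. This is a minimal sketch: `rotate90` is a hypothetical helper that applies the mapping (a,b) -> (b, W-a) and swaps width and height after each quarter turn.

```python
def rotate90(a, b, width, height):
    """Rotate point (a, b) on a width x height page by one quarter turn.

    Uses the mapping (a, b) -> (b, W - a). The rotated page is `height` wide
    and `width` tall, so the new dimensions are returned as well.
    """
    return (b, width - a), height, width


# Toy page from the example: width 4, height 10, point (1, 9).
point, width, height = (1, 9), 4, 10
sequence = [point]
for _ in range(4):
    point, width, height = rotate90(point[0], point[1], width, height)
    sequence.append(point)

print(sequence)  # [(1, 9), (9, 3), (3, 1), (1, 1), (1, 9)]
```

After four quarter turns both the point and the page size are back where they started, which is a quick sanity check on the mapping.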

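Repeating this transformation gives the 180º and 270º transformations, i.e. the sequence (a,b) -> (b,W-a) -> (W-a,H-b) -> (H-b,a) -> (a,b), where W and H are the width and height of the original 0º page.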
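Written out in closed form, the sequence gives one forward mapping per angle. The sketch below (the name `rotate` is hypothetical) applies it to the top-left corner of LEGO on the 0-degree page, with W = 2479 and H = 3508 taken from the question.

```python
def rotate(a, b, width, height, angle):
    """Forward mapping from the 0-degree page to a page rotated by `angle`.

    `width` and `height` are the 0-degree page's dimensions (W and H).
    """
    if angle == 0:
        return a, b
    if angle == 90:
        return b, width - a
    if angle == 180:
        return width - a, height - b
    if angle == 270:
        return height - b, a
    raise ValueError("angle must be 0, 90, 180 or 270")


W, H = 2479, 3508
for angle in (90, 180, 270):
    print(angle, rotate(755, 172, W, H, angle))
# 90 (172, 1724)
# 180 (1724, 3336)
# 270 (3336, 755)
```

The observed vertex 0 of LEGO in the three scans is (170, 1730), (1740, 3337) and (3335, 738): close to these predictions, but not identical.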

So, if you know all the parameters, turning any point in the sequence back into (a,b) is just a simple equation.
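Inverting each step of the sequence gives the correction the question asks for. A minimal sketch follows; `to_zero_degrees` and `correct_vertices` are hypothetical names, and the sketch assumes, as in the question, that the page width and height passed in are those reported for the rotated scan itself (for 90 and 270 degrees these are the 0-degree page's dimensions swapped). The checks use the toy 4 x 10 page, where all three rotated points map back to (1, 9).

```python
def to_zero_degrees(x, y, angle, page_width, page_height):
    """Map (x, y) from a page scanned at `angle` back to 0-degree coordinates.

    page_width and page_height are the scanned page's own dimensions.
    """
    if angle == 0:
        return x, y
    if angle == 90:   # forward was (a, b) -> (b, W - a), and W == page_height
        return page_height - y, x
    if angle == 180:  # forward was (a, b) -> (W - a, H - b)
        return page_width - x, page_height - y
    if angle == 270:  # forward was (a, b) -> (H - b, a), and H == page_width
        return y, page_width - x
    raise ValueError("angle must be 0, 90, 180 or 270")


def correct_vertices(vertices, angle, page_width, page_height):
    """Apply to_zero_degrees to every {'x': ..., 'y': ...} vertex of a box."""
    corrected = []
    for v in vertices:
        x0, y0 = to_zero_degrees(v["x"], v["y"], angle, page_width, page_height)
        corrected.append({"x": x0, "y": y0})
    return corrected


# Toy page: 90 and 270 scans are 10 wide x 4 tall, the 180 scan is 4 x 10.
print(to_zero_degrees(9, 3, 90, 10, 4))   # (1, 9)
print(to_zero_degrees(3, 1, 180, 4, 10))  # (1, 9)
print(to_zero_degrees(1, 1, 270, 10, 4))  # (1, 9)
```

Because the vertex order is tied to the text's natural orientation, the corrected list keeps the order top-left, top-right, bottom-right, bottom-left without any reshuffling.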

For example, for the 180-degree bounding box:

LEGO - 'vertices': [{'x': 1740, 'y': 3337}, {'x': 1584, 'y': 3336}, {'x': 1585, 'y': 3259}, {'x': 1741, 'y': 3260}]

each x value satisfies x = width - x0, so x0 = width - x; each y value satisfies y = height - y0, so y0 = height - y (here width = 2479 and height = 3508).

This gives:

LEGO - 'vertices': [{'x': 739, 'y': 171}, {'x': 895, 'y': 172}, {'x': 894, 'y': 249}, {'x': 738, 'y': 248}]
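The same 180-degree correction as a short Python sketch, assuming the vertices are held as a list of `{'x': ..., 'y': ...}` dicts like the output shown above and using the question's page size (width 2479, height 3508):

```python
width, height = 2479, 3508  # the 180-degree scan has the same size as the 0-degree page

lego_180 = [
    {"x": 1740, "y": 3337},
    {"x": 1584, "y": 3336},
    {"x": 1585, "y": 3259},
    {"x": 1741, "y": 3260},
]

# x0 = width - x and y0 = height - y for every vertex; the vertex order is unchanged.
lego_0 = [{"x": width - v["x"], "y": height - v["y"]} for v in lego_180]
print(lego_0)
# [{'x': 739, 'y': 171}, {'x': 895, 'y': 172}, {'x': 894, 'y': 249}, {'x': 738, 'y': 248}]
```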

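Of course, this is slightly different from your original values. If you apply the simple transformation to every rotation, you'll find the results differ slightly in each case. Remember that these are empirical bounding boxes with an associated error, and it's impossible to make them match exactly, as you could in a purely mathematical problem.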
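To put a number on "slightly different", the sketch below undoes all three rotations for both sample words and reports the largest per-coordinate difference, in pixels, from the 0-degree boxes. The vertex values are copied from the question (stored as `(x, y)` tuples for brevity), and `max_deviation` is a hypothetical helper.

```python
# 0-degree reference boxes, copied from the question.
REFERENCE = {
    "LEGO": [(755, 172), (877, 173), (876, 237), (754, 236)],
    "LEGOLAND": [(1994, 189), (2269, 192), (2268, 244), (1993, 241)],
}
# angle -> (scanned page width, scanned page height, boxes from that scan)
SCANS = {
    90: (3508, 2479, {
        "LEGO": [(170, 1730), (171, 1604), (241, 1604), (240, 1730)],
        "LEGOLAND": [(188, 486), (192, 213), (245, 214), (241, 487)],
    }),
    180: (2479, 3508, {
        "LEGO": [(1740, 3337), (1584, 3336), (1585, 3259), (1741, 3260)],
        "LEGOLAND": [(485, 3315), (212, 3311), (213, 3261), (486, 3265)],
    }),
    270: (3508, 2479, {
        "LEGO": [(3335, 738), (3333, 893), (3269, 892), (3271, 737)],
        "LEGOLAND": [(3318, 1994), (3313, 2266), (3261, 2265), (3266, 1993)],
    }),
}
# Inverse mappings back to 0 degrees; w and h are the scanned page's size.
UNDO = {
    90: lambda x, y, w, h: (h - y, x),
    180: lambda x, y, w, h: (w - x, h - y),
    270: lambda x, y, w, h: (y, w - x),
}


def max_deviation(angle):
    """Largest per-coordinate pixel difference from the 0-degree boxes."""
    w, h, boxes = SCANS[angle]
    worst = 0
    for word, box in boxes.items():
        for (x, y), (rx, ry) in zip(box, REFERENCE[word]):
            x0, y0 = UNDO[angle](x, y, w, h)
            worst = max(worst, abs(x0 - rx), abs(y0 - ry))
    return worst


print({angle: max_deviation(angle) for angle in SCANS})
# {90: 6, 180: 18, 270: 17}
```

On these two words the corrected boxes land within a few pixels for the 90-degree scan and within about 18 pixels for the 180- and 270-degree scans, mostly because the LEGO box comes out wider in those two scans.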
