Google OCR working on a specific area


【Posted】: 2017-12-08 16:16:43

【Question】:

I am currently using a CameraSource from com.google.android.gms.vision together with a SurfaceView to capture the text detected in the image, but since it captures everything shown in the SurfaceView area, I need to discard part of what it returns.

The goal is for the SurfaceView to behave like the next picture: ignore all text detected in the areas crossed out in red and only give me what is inside the blue square.

Is this possible?

This is the layout (nothing special):

<?xml version="1.0" encoding="utf-8"?>
<android.support.constraint.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <SurfaceView
        android:id="@+id/fragment_surface"
        android:layout_width="match_parent"
        android:layout_height="match_parent"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintRight_toRightOf="parent"
        app:layout_constraintTop_toTopOf="parent" />
</android.support.constraint.ConstraintLayout>

And here is the OCR-related code of the activity:

public class CameraActivity extends AppCompatActivity {

    private SurfaceView surfaceView;
    private CameraSource cameraSource;
    private StringBuilder builder;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_camera);

        surfaceView = (SurfaceView) findViewById(R.id.fragment_surface);

        TextRecognizer recognizer = new TextRecognizer.Builder(getApplicationContext()).build();
        if (recognizer.isOperational()) {

            cameraSource = new CameraSource.Builder(getApplicationContext(), recognizer)
                    .setFacing(CameraSource.CAMERA_FACING_BACK)
                    .setRequestedPreviewSize(1280, 1024)
                    .setRequestedFps(15.0f)
                    .setAutoFocusEnabled(true)
                    .build();

            surfaceView.getHolder().addCallback(new SurfaceHolder.Callback() {
                @Override
                public void surfaceCreated(SurfaceHolder holder) {
                    if (ActivityCompat.checkSelfPermission(getApplicationContext(), Manifest.permission.CAMERA) != PackageManager.PERMISSION_GRANTED) {
                        ActivityCompat.requestPermissions(CameraActivity.this, new String[]{Manifest.permission.CAMERA}, 100);
                        return;
                    }
                    try {
                        cameraSource.start(surfaceView.getHolder());
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }

                @Override
                public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {
                    //
                }

                @Override
                public void surfaceDestroyed(SurfaceHolder holder) {
                    cameraSource.stop();
                }
            });

            recognizer.setProcessor(new Detector.Processor<TextBlock>() {
                @Override
                public void release() {
                    //
                }

                @Override
                public void receiveDetections(Detector.Detections<TextBlock> detections) {
                    final SparseArray<TextBlock> items = detections.getDetectedItems();
                    if (items.size() != 0) {
                        builder = new StringBuilder();
                        for (int i = 0; i < items.size(); i++) {
                            TextBlock it = items.valueAt(i);
                            builder.append(it.getValue());
                        }
                        String read = builder.toString().trim().replace(" ", "").replace("\n", "");

                        //It continues doing other things here
                    }
                }
            });
        }
    }

    @Override
    public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions, @NonNull int[] grantResults) {
        switch (requestCode) {
            case 100:
                if (grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                    try {
                        if (ActivityCompat.checkSelfPermission(getApplicationContext(), Manifest.permission.CAMERA) != PackageManager.PERMISSION_GRANTED) {
                            return;
                        }
                        cameraSource.start(surfaceView.getHolder());
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                break;
        }
    }
}

【Comments】:

Did you ever figure this out? I'm facing the same problem.
Sorry, but not with this approach. We ended up mixing it with OpenCV; with that tool you can select an exact region and run OCR only on it.
Ah, that's too bad.

【Answer 1】:

When the device is in portrait mode (as in your picture), the non-red area should be a cropped portion of the camera preview (which fills the whole screen), so:

- If you want to display the whole camera preview and run OCR only on the cropped area: then you have to take a "screenshot" of the SurfaceView (the full area) and crop that region to get exactly the pixels you need (see the sketch below).
- Instead, if you only want to display the cropped area (because the red areas are filled with other UI, buttons, TextViews, etc.): then you have to make the SurfaceView render only the desired portion of the camera preview, at that exact position, by using a Matrix parameter.
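A rough sketch of that first option (my assumption, not code from the answer): snapshot the SurfaceView with PixelCopy (API 24+), crop the resulting Bitmap to the blue square, and run the same TextRecognizer on the still frame. ocrOnCrop and cropRect are made-up names, and recognizer is assumed to have been promoted to a field:

// Sketch only: assumes `surfaceView` and `recognizer` from the question,
// and a hypothetical `cropRect` in view coordinates (the blue square).
private void ocrOnCrop(final Rect cropRect) {
    final Bitmap full = Bitmap.createBitmap(
            surfaceView.getWidth(), surfaceView.getHeight(), Bitmap.Config.ARGB_8888);

    PixelCopy.request(surfaceView, full, new PixelCopy.OnPixelCopyFinishedListener() {
        @Override
        public void onPixelCopyFinished(int copyResult) {
            if (copyResult != PixelCopy.SUCCESS) return;

            // Keep only the pixels inside the blue square.
            Bitmap cropped = Bitmap.createBitmap(full,
                    cropRect.left, cropRect.top, cropRect.width(), cropRect.height());

            // Feed the cropped still image to the same TextRecognizer.
            Frame frame = new Frame.Builder().setBitmap(cropped).build();
            SparseArray<TextBlock> items = recognizer.detect(frame);
            // ... handle `items` the same way as in receiveDetections()
        }
    }, new Handler(Looper.getMainLooper()));
}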

I suggest you "upgrade" to a TextureView. It is a bit harder to manage/use, but it lets you crop, scale and zoom the preview as needed by applying a Matrix to its internal texture.
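If you do move to a TextureView, the cropping/zooming mentioned above would typically go through its setTransform() method. The following is only a sketch under that assumption (showOnlyRegion and visibleRect are invented names), not code from the answer:

// Sketch only: `textureView` would replace the SurfaceView, and `visibleRect`
// is the part of the preview (in view coordinates) that should fill the view.
private void showOnlyRegion(TextureView textureView, RectF visibleRect) {
    RectF viewRect = new RectF(0, 0, textureView.getWidth(), textureView.getHeight());

    Matrix matrix = new Matrix();
    // Map the desired sub-rectangle onto the whole view: everything outside it
    // is pushed off-screen, so only that region of the preview stays visible.
    matrix.setRectToRect(visibleRect, viewRect, Matrix.ScaleToFit.FILL);

    textureView.setTransform(matrix);
}

Note that the stock CameraSource only exposes start() and start(SurfaceHolder), so switching to a TextureView usually also means driving the camera preview yourself (or using a modified copy of CameraSource).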

【Discussion】:

【Answer 2】:

Introduction

I'm trying to make minimal changes to your existing code, so just keep scanning the whole image as you do now, and filter out the words (or blocks) whose results fall outside the target area.

Code

Use rect.intersect to take the word's bounding box and check whether it lies within your rectangle (yourRect):

Rect yourRect = new Rect(10, 20, 30, 40);
rect.intersect(yourRect); // also see Drawable#getBounds()
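A small aside, not part of the original answer: Rect.intersect() modifies the rectangle it is called on and returns a boolean, so for a pure pass/fail filter the static Rect.intersects() helper (or Rect.contains(), if a word must lie entirely inside the target square) may be more convenient. A tiny sketch with invented coordinates:

Rect yourRect = new Rect(10, 20, 300, 400);   // target area (example values)
Rect wordBox  = new Rect(15, 25, 60, 45);     // a word's bounding box (example values)

boolean touches = Rect.intersects(yourRect, wordBox); // true if they overlap at all
boolean inside  = yourRect.contains(wordBox);         // true only if fully inside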

Try adding this code to your @Override public void receiveDetections() method:

        // Loop through each Block
        for (TextBlock textBlock : blocks) {

            // Loop through each Line of the block
            for (Text currentLine : textBlock.getComponents()) {

                // Loop through each Word of the line
                for (Text currentWord : currentLine.getComponents()) {

                    // Get the Rectangle/BoundingBox of the word
                    RectF rect = new RectF(currentWord.getBoundingBox());

                    // Check if the word boundingBox is inside the area required
                    // using: rect.intersect(yourRect);
                    // ...
                }
            }
        }
So it would look something like this:

        recognizer.setProcessor(new Detector.Processor<TextBlock>() {
            @Override
            public void release() {
                //
            }

            @Override
            public void receiveDetections(Detector.Detections<TextBlock> detections) {
                final SparseArray<TextBlock> items = detections.getDetectedItems();
                if (items.size() != 0) {
                    builder = new StringBuilder();
                    for (int i = 0; i < items.size(); i++) {
                        TextBlock it = items.valueAt(i);
                        builder.append(it.getValue());
                    }
                    String read = builder.toString().trim().replace(" ", "").replace("\n", "");

                    // Add all TextBlocks to the `blocks` list
                    List<TextBlock> blocks = new ArrayList<>();
                    for (int i = 0; i < items.size(); i++) {
                        blocks.add(items.valueAt(i));
                    }

                    // Loop through each Block
                    for (TextBlock textBlock : blocks) {

                        // Loop through each Line of the block
                        for (Text currentLine : textBlock.getComponents()) {

                            // Loop through each Word of the line
                            for (Text currentWord : currentLine.getComponents()) {

                                // Get the Rectangle/BoundingBox of the word
                                RectF rect = new RectF(currentWord.getBoundingBox());

                                // Check if the word boundingBox is inside the area required
                                // using: rect.intersect(yourRect);
                                // put the word in a filtered list...
                            }
                        }
                    }

                    //It continues doing other things here
                }
            }
        });

Only six lines of code!

【Discussion】:
