Tesseract couldn't find trained data file

【Posted】: 2014-07-20 11:01:03

【Question】:

I have included the eng.traineddata file in my project correctly, and it was working fine. Suddenly it started giving me the following error and crashing.

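Error opening data file /var/mobile/Applications/B36E2682-933F-4B12-9B32-4C3F640BE19E/Documents/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!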

The code I'm using:

- (NSString*) pathToLangugeFIle
{
    // Set up the tessdata path. This is included in the application bundle
    // but is copied to the Documents directory on the first run.
    NSArray *documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
    NSString *documentPath = ([documentPaths count] > 0) ? [documentPaths objectAtIndex:0] : nil;

    NSString *dataPath = [documentPath stringByAppendingPathComponent:@"tessdata"];
    NSFileManager *fileManager = [NSFileManager defaultManager];
    // If the expected store doesn't exist, copy the default store.
    if (![fileManager fileExistsAtPath:dataPath]) {
        // get the path to the app bundle (with the tessdata dir)
        NSString *bundlePath = [[NSBundle mainBundle] bundlePath];
        NSString *tessdataPath = [bundlePath stringByAppendingPathComponent:@"tessdata"];
        if (tessdataPath) {
            [fileManager copyItemAtPath:tessdataPath toPath:dataPath error:NULL];
        }
    }

    setenv("TESSDATA_PREFIX", [[documentPath stringByAppendingString:@"/"] UTF8String], 1);

    return dataPath;
}

- (NSString*) OCRImage:(UIImage*)src
{
    // init the tesseract engine.
    tesseract::TessBaseAPI *tesseract = new tesseract::TessBaseAPI();

    tesseract->Init([[self pathToLangugeFIle] cStringUsingEncoding:NSUTF8StringEncoding], "eng");
    tesseract->SetVariable("tessedit_char_whitelist", ":-0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");

    // Convert the UIImage to a cv::Mat and pass its pixel data to tesseract

    cv::Mat toOCR = [src CVGrayscaleMat];

    NSLog(@"%d", toOCR.channels());

    tesseract->SetImage((uchar*)toOCR.data, toOCR.size().width, toOCR.size().height,
                        toOCR.channels(), toOCR.step1());

    tesseract->Recognize(NULL);

    char* utf8Text = tesseract->GetUTF8Text();
    NSString *result = [NSString stringWithUTF8String:utf8Text];

    // GetUTF8Text() returns a buffer that the caller must free with delete [].
    delete [] utf8Text;
    // Release the engine so that it is not leaked on every call.
    tesseract->End();
    delete tesseract;

    return result;
}

【Comments】:

【Answer 1】:
// Copy the training data into the documents directory
NSArray* trainingDataSuffix = [NSArray arrayWithObjects:@"DangAmbigs",@"freq-dawg",@"inttemp",@"normproto",@"pffmtable",@"traineddata",@"unicharset",@"user-words",@"word-dawg",nil];

// Get the path to the resource files
NSString* bundlePath = [[NSBundle mainBundle] bundlePath];
// Hold a potential error
NSError* error = nil;
// Get the contents of the resource directory
NSArray* dirListing = [[NSFileManager defaultManager] contentsOfDirectoryAtPath:bundlePath error:&error];
// Boolean to determine whether we have already created a directory or not
BOOL createdDirectory = NO;
// The path to the documents directory when appended with the tessdata folder
NSString* documentsDirectory = [[App getHiddenDocumentPath:@""] stringByAppendingPathComponent:@"tessdata"];
// Loop the resource files
for(NSString* file in dirListing)
{
    // Loop the possible extensions we are looking for
    for(NSString* extension in trainingDataSuffix)
    {
        // Check if the extension is one of these extensions we have been looking for
        if([[file pathExtension] isEqualToString:extension])
        {
            // Check if we have created the directory
            if(!createdDirectory)
            {
                // Create the directory
                [[NSFileManager defaultManager] createDirectoryAtPath:documentsDirectory withIntermediateDirectories:YES attributes:nil error:&error];
                // If we have an error tell us what it is
                if(error != nil)
                {
                    NSLog(@"Error: %@",error);
                    error = nil;
                }
                // If not, tell the loop we have created the directory so we don't have to do it again
                else createdDirectory = YES;
            }
            // Get the path of the file in the tessdata directory
            NSString* fileInDocumentsDir = [documentsDirectory stringByAppendingPathComponent:[file lastPathComponent]];
            // Check if the file already exists
            if(![[NSFileManager defaultManager] fileExistsAtPath:fileInDocumentsDir])
            {
                // If not, copy the file to the tessdata directory
                [[NSFileManager defaultManager] copyItemAtPath:[bundlePath stringByAppendingPathComponent:file] toPath:fileInDocumentsDir error:&error];
                // If we have an error tell us what it is
                if(error != nil)
                {
                    NSLog(@"Error: %@",error);
                    error = nil;
                }
            }
            // We have found a valid extension, it's unlikely we'll find another so break the loop
            break;
        }
    }
}

// set the environment variable TESSDATA_PREFIX to the path before the tessdata folder, in this case it's the documents directory
setenv("TESSDATA_PREFIX",[[App getHiddenDocumentPath:@""] UTF8String],1);

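I used this code to copy the trained data files from the app bundle into the tessdata folder under the documents directory. It works fine now.

Source - http://b2cloud.com.au/tutorial/tesseract-ocr-and-cross-compiling-on-ios/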

【Comments】:
