How to convert a PDF file to text in Android Studio

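I want to pick a PDF file from the file manager on Android and convert it to text so that text-to-speech can read it aloud. I am following this documentation from the Android developer site; however, its example opens a text file. I am using the PdfReader class/library to open the file and convert it to text, but I don't know how to integrate it with a Uri. This is the code I need in order to convert a PDF to text with PdfReader: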

PdfReader pdfReader = new PdfReader(file.getPath());
stringParser = PdfTextExtractor.getTextFromPage(pdfReader, 1).trim();
pdfReader.close();

I launch the file manager with an intent so that the user can pick a PDF file:

fab.setOnClickListener(new View.OnClickListener() {
@Override
   public void onClick(View view) {
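      intent = new Intent(Intent.ACTION_OPEN_DOCUMENT);
      // Only list documents that can be opened as a stream, and only PDFs.
      intent.addCategory(Intent.CATEGORY_OPENABLE);
      intent.setType("application/pdf");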
      startActivityForResult(intent, READ_REQUEST_CODE);
   }
});

Then I get the Uri and open the file:

@Override
    protected void onActivityResult(int requestCode, int resultCode, Intent resultData) {
        if (requestCode == READ_REQUEST_CODE && resultCode == Activity.RESULT_OK) {
            if(resultData != null) {
                Uri uri = resultData.getData();
                // Show the selected Uri; no filePath variable is defined in this snippet.
                Toast.makeText(MainActivity.this, uri.toString(), Toast.LENGTH_LONG).show();
                readPdfFile(uri);
            }
        }
    }

    private String readTextFromUri(Uri uri) throws IOException {
        StringBuilder stringBuilder = new StringBuilder();
        try (InputStream inputStream =
                     getContentResolver().openInputStream(uri);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(Objects.requireNonNull(inputStream)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                stringBuilder.append(line);
            }
        }
        return stringBuilder.toString();
    }
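The readTextFromUri method above reads the document line by line as characters. That suits plain text, but a PDF is a binary format, so the result is not readable text. As a minimal sketch, the stream can be checked for the `%PDF-` signature before it is handed to a PDF library. The class name `PdfSniffer` and the method `looksLikePdf` are hypothetical helpers, not part of the original post or of any library, and the check assumes the signature sits at the very start of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the original post. */
final class PdfSniffer {
    // PDF files normally begin with the ASCII bytes "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns true if the stream starts with the "%PDF-" signature.
     * Consumes up to five bytes from the stream.
     */
    static boolean looksLikePdf(InputStream in) throws IOException {
        for (byte expected : PDF_MAGIC) {
            int actual = in.read(); // -1 at end of stream, which never matches
            if (actual != (expected & 0xFF)) {
                return false;
            }
        }
        return true;
    }
}
```

Because the check consumes bytes, the stream would need to be reopened (for example with another call to `getContentResolver().openInputStream(uri)`) before parsing.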
Answer
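import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SyncPdfTextExtractor {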
  // TODO: When you have your own Premium account credentials, put them down here:
  private static final String CLIENT_ID = "FREE_TRIAL_ACCOUNT";
  private static final String CLIENT_SECRET = "PUBLIC_SECRET";
  private static final String ENDPOINT = "https://api.whatsmate.net/v1/pdf/extract?url=";

  /**
   * Entry Point
   */
  public static void main(String[] args) throws Exception {
    // TODO: Specify the URL of your small PDF document (less than 1MB and 10 pages)
    // To extract text from a bigger PDF document, you need to use the async method.
    String url = "https://www.harvesthousepublishers.com/data/files/excerpts/9780736948487_exc.pdf";
    SyncPdfTextExtractor.extractText(url);
  }

  /**
   * Extracts the text from an online PDF document.
   */
  public static void extractText(String pdfUrl) throws Exception {
    URL url = new URL(ENDPOINT + pdfUrl);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // A GET request sends no body, so setDoOutput(true) is not needed.
    conn.setRequestMethod("GET");
    conn.setRequestProperty("X-WM-CLIENT-ID", CLIENT_ID);
    conn.setRequestProperty("X-WM-CLIENT-SECRET", CLIENT_SECRET);

    int statusCode = conn.getResponseCode();
    System.out.println("Status Code: " + statusCode);
    InputStream is = null;
    if (statusCode == 200) {
        is = conn.getInputStream();
        System.out.println("PDF text is shown below");
        System.out.println("=======================");
    } else {
        is = conn.getErrorStream();
        System.err.println("Something is wrong:");
    }

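    // getErrorStream() may return null when the server sent no error body.
    if (is != null) {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
            String output;
            while ((output = br.readLine()) != null) {
                System.out.println(output);
            }
        }
    }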
    conn.disconnect();
  }

}
------------------------------------

After copying the code above, follow these steps:

Set the url variable in main to the URL of your online PDF document.
Replace CLIENT_ID and CLIENT_SECRET if you have your own credentials.
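The code above appends the PDF's URL to the endpoint's query string unchanged. If that URL contains characters such as `&`, `?`, or spaces, the request may be misread. A minimal sketch of encoding the parameter first follows. The class `ExtractUrlBuilder` and the method `buildRequestUrl` are hypothetical names, and the sketch assumes the service decodes the `url` parameter in the usual form-encoded way, which the answer does not confirm.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper for illustration; not part of the original answer. */
final class ExtractUrlBuilder {
    // Endpoint copied from the answer above.
    private static final String ENDPOINT = "https://api.whatsmate.net/v1/pdf/extract?url=";

    /** Returns the endpoint with the PDF's URL appended as an encoded query value. */
    static String buildRequestUrl(String pdfUrl) throws UnsupportedEncodingException {
        // URLEncoder applies form encoding: reserved characters become %XX, spaces become '+'.
        return ENDPOINT + URLEncoder.encode(pdfUrl, "UTF-8");
    }
}
```

In extractText, `new URL(ExtractUrlBuilder.buildRequestUrl(pdfUrl))` would then replace `new URL(ENDPOINT + pdfUrl)`.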
Another answer

Use this Gradle dependency:

implementation 'com.itextpdf:itextg:5.5.10'

Then extract the text page by page:

try {
    StringBuilder parsedText = new StringBuilder();
    PdfReader reader = new PdfReader(yourPdfPath);
    int n = reader.getNumberOfPages();
    // iText page numbers are 1-based, so iterate from 1 to n inclusive.
    for (int i = 1; i <= n; i++) {
        // Extract the content of each page and separate pages with a newline.
        parsedText.append(PdfTextExtractor.getTextFromPage(reader, i).trim()).append("\n");
    }
    System.out.println(parsedText);
    reader.close();
} catch (Exception e) {
    System.out.println(e);
}
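The snippet above takes a file path, while the document picker in the question returns a content Uri, which does not necessarily map to a path on disk. One possible approach is to read the Uri's stream into memory and hand the bytes to PdfReader, which in iText 5 also has a constructor taking a byte array. The sketch below covers only the plain-Java part. `PdfBytes` and `readFully` are hypothetical names, and the Android wiring is shown as a comment because it needs an Activity context.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration; not part of the original answer. */
final class PdfBytes {
    /** Reads the stream to its end and returns everything as a byte array. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Assumed Android usage inside an Activity (not compiled here):
    //   try (InputStream in = getContentResolver().openInputStream(uri)) {
    //       if (in != null) {
    //           PdfReader reader = new PdfReader(PdfBytes.readFully(in));
    //           // ...extract text page by page as in the loop above...
    //           reader.close();
    //       }
    //   }
}
```

This holds the whole file in memory, which should be acceptable for small documents but is a trade-off to keep in mind for large ones.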
