How to convert a PDF file to text in Android Studio
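I want to pick a PDF file from the file manager on Android and convert it to text so that text-to-speech can read it aloud. I am following this documentation from the Android developer site; however, its example opens a text file. I am using the PdfReader class/library to open the file and convert it to text, but I don't know how to integrate it with a Uri. This is the code I need in order to convert a PDF to text with PdfReader: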
PdfReader pdfReader = new PdfReader(file.getPath());
stringParser = PdfTextExtractor.getTextFromPage(pdfReader, 1).trim();
pdfReader.close();
I launch the file manager with an intent so that the user can pick a PDF file:
fab.setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View view) {
        intent = new Intent(Intent.ACTION_OPEN_DOCUMENT);
        intent.setType("*/*");
        startActivityForResult(intent, READ_REQUEST_CODE);
    }
});
Then I want to get the Uri and open the file:
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent resultData) {
    if (requestCode == READ_REQUEST_CODE && resultCode == Activity.RESULT_OK) {
        if (resultData != null) {
            // The picker returns a content Uri, not a file system path
            Uri uri = resultData.getData();
            Toast.makeText(MainActivity.this, uri.toString(), Toast.LENGTH_LONG).show();
            readPdfFile(uri);
        }
    }
}
private String readTextFromUri(Uri uri) throws IOException {
    StringBuilder stringBuilder = new StringBuilder();
    try (InputStream inputStream =
            getContentResolver().openInputStream(uri);
         BufferedReader reader = new BufferedReader(
            new InputStreamReader(Objects.requireNonNull(inputStream)))) {
        String line;
        while ((line = reader.readLine()) != null) {
            stringBuilder.append(line);
        }
    }
    return stringBuilder.toString();
}
Answer
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SyncPdfTextExtractor {
    // TODO: When you have your own Premium account credentials, put them down here:
    private static final String CLIENT_ID = "FREE_TRIAL_ACCOUNT";
    private static final String CLIENT_SECRET = "PUBLIC_SECRET";
    private static final String ENDPOINT = "https://api.whatsmate.net/v1/pdf/extract?url=";

    /**
     * Entry Point
     */
    public static void main(String[] args) throws Exception {
        // TODO: Specify the URL of your small PDF document (less than 1MB and 10 pages)
        // To extract text from a bigger PDF document, you need to use the async method.
        String url = "https://www.harvesthousepublishers.com/data/files/excerpts/9780736948487_exc.pdf";
        SyncPdfTextExtractor.extractText(url);
    }

    /**
     * Extracts the text from an online PDF document.
     */
    public static void extractText(String pdfUrl) throws Exception {
        URL url = new URL(ENDPOINT + pdfUrl);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("X-WM-CLIENT-ID", CLIENT_ID);
        conn.setRequestProperty("X-WM-CLIENT-SECRET", CLIENT_SECRET);

        int statusCode = conn.getResponseCode();
        System.out.println("Status Code: " + statusCode);

        InputStream is;
        if (statusCode == 200) {
            is = conn.getInputStream();
            System.out.println("PDF text is shown below");
            System.out.println("=======================");
        } else {
            is = conn.getErrorStream();
            System.err.println("Something is wrong:");
        }

        // getErrorStream() may return null when the server sent no error body
        if (is == null) {
            conn.disconnect();
            return;
        }

        // Print the response (or the error body) line by line, closing the reader afterwards
        try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
            String output;
            while ((output = br.readLine()) != null) {
                System.out.println(output);
            }
        }
        conn.disconnect();
    }
}
------------------------------------
After copying the code above, follow these steps:
Specify the URL of your online PDF document in the url variable in main.
Replace CLIENT_ID and CLIENT_SECRET if you have your own credentials.
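One detail the code above leaves open: the PDF address is appended to the endpoint's url= query parameter as-is, so a PDF URL that itself contains ? or & could be misread as extra parameters. A minimal sketch of percent-encoding it first follows. ExtractUrlBuilder and build are hypothetical names, not part of the answer, and it is an assumption that the service decodes the parameter in the usual way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper, not part of the original answer. */
final class ExtractUrlBuilder {

    private ExtractUrlBuilder() {}

    /** Appends pdfUrl to endpoint as a percent-encoded query value. */
    static String build(String endpoint, String pdfUrl) {
        try {
            // URLEncoder escapes ':', '/', '?', '&', '=' and turns spaces into '+'
            return endpoint + URLEncoder.encode(pdfUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM and on Android
            throw new IllegalStateException(e);
        }
    }
}
```

In extractText this would replace the plain concatenation, for example new URL(ExtractUrlBuilder.build(ENDPOINT, pdfUrl)).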
Another answer
Use this Gradle dependency:
implementation 'com.itextpdf:itextg:5.5.10'
try {
    StringBuilder parsedText = new StringBuilder();
    PdfReader reader = new PdfReader(yourPdfPath);
    int n = reader.getNumberOfPages();
    // iText page numbers start at 1
    for (int i = 1; i <= n; i++) {
        // Extract the content of each page, followed by a line break
        parsedText.append(PdfTextExtractor.getTextFromPage(reader, i).trim()).append("\n");
    }
    System.out.println(parsedText);
    reader.close();
} catch (Exception e) {
    System.out.println(e);
}
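The snippet above needs a real file path in yourPdfPath, but the picker in the question returns a content Uri, which usually has no usable path. One possible bridge, sketched here under assumptions, is to copy the document's stream into a local file and hand that file's path to PdfReader. PdfStreamCopier and copyToFile are hypothetical names introduced for illustration; the helper uses only java.io so that it can be tried outside Android.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper, not part of the original answers. */
final class PdfStreamCopier {

    private PdfStreamCopier() {}

    /**
     * Copies everything from in into a file called name inside dir
     * and returns that file. The caller remains responsible for closing in.
     */
    static File copyToFile(InputStream in, File dir, String name) throws IOException {
        File target = new File(dir, name);
        try (OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            // read() returns -1 once the stream is exhausted
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return target;
    }
}
```

Inside the activity, the stream would presumably come from getContentResolver().openInputStream(uri) and the directory from getCacheDir(), after which new PdfReader(file.getPath()) works as in the answer. iText 5's PdfReader also has a constructor that takes an InputStream, so passing the stream directly may make the copy unnecessary. Either way, the extraction is best run off the main thread.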