PDF: Error reading a content stream
I am trying to capture the PostScript calls to show and to store the currentfont and font size in the text objects of the PDF I output.
PDF file Input Postscript Program
But identify gives me an error:
$ identify pd0.pdf
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** This file had errors that were repaired or ignored.
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
pd0.pdf[0] PBM 612x792 612x792+0+0 16-bit Bilevel Gray 61KB 0.000u 0:00.000
pd0.pdf[1] PBM 612x792 612x792+0+0 16-bit Bilevel Gray 61KB 0.000u 0:00.000
pd0.pdf[2] PBM 612x792 612x792+0+0 16-bit Bilevel Gray 61KB 0.000u 0:00.000
And the Ghostscript output does not give me the details I need to understand the problem:
$ gsnd -dPDFDEBUG pd0.pdf
GPL Ghostscript 9.18 (2015-10-05)
Copyright (C) 2015 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
<<
/Root 1 0 R
/Size 12 >>
%Resolving: [1 0]
<<
/Type /Catalog /Pages 2 0 R
>>
endobj
%Resolving: [2 0]
<<
/Kids [
3 0 R
6 0 R
9 0 R
]
/Type /Pages /Count 3 >>
endobj
%Resolving: [3 0]
<<
/Parent 2 0 R
/Contents [
5 0 R
]
/MediaBox [
0.0 0.0 612.0 792.0 ]
/Resources <<
/Font <<
/F1 4 0 R
>>
/ProcSet [
/PDF /Text ]
>>
/Type /Page >>
endobj
%Resolving: [6 0]
<<
/Parent 2 0 R
/Contents [
8 0 R
]
/MediaBox [
0.0 0.0 612.0 792.0 ]
/Resources <<
/Font <<
/F2 7 0 R
>>
/ProcSet [
/PDF /Text ]
>>
/Type /Page >>
endobj
%Resolving: [9 0]
<<
/Parent 2 0 R
/Contents [
11 0 R
]
/MediaBox [
0.0 0.0 612.0 792.0 ]
/Resources <<
/Font <<
/F3 10 0 R
>>
/ProcSet [
/PDF /Text ]
>>
/Type /Page >>
endobj
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [2 0]
Processing pages 1 through 3.
Page 1
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [5 0]
<<
/Length 15660 >>
stream
%FilePosition: 471
endobj
BT
F1
10.0 Tf
%Resolving: [4 0]
<<
/Type /Font /SubType /Type1 /BaseFont /Palatino-Roman >>
endobj
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
Page 2
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [8 0]
<<
/Length 31667 >>
stream
%FilePosition: 16474
endobj
BT
F2
10.0 Tf
%Resolving: [7 0]
<<
/Type /Font /SubType /Type1 /BaseFont /Palatino-Roman >>
endobj
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
Page 3
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [6 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [11 0]
<<
/Length 8335 >>
stream
%FilePosition: 48487
endobj
BT
F3
10.0 Tf
%Resolving: [10 0]
<<
/Type /Font /SubType /Type1 /BaseFont /Palatino-Roman >>
endobj
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** This file had errors that were repaired or ignored.
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
GS>
Can anyone help me understand what is wrong with the PDF file I output?
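There are a number of errors in the PDF. Depending on the PDF viewer in question, a smaller or larger subset of them has to be fixed before the PDF displays as expected.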
page content streams
The page content streams look like this:
BT F1 10.0 Tf 30.0 750.0 Td (<< ) Tj ET BT F1 10.0 Tf 50.0 738.0 Td (/) Tj ET [...]
The error here is in the font selection instruction:
F1 10.0 Tf
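The font name operand F1 is not given as a PDF name object (recognizable by its leading slash) but as a bare keyword (a general literal), a form normally reserved for operators. The instruction should read /F1 10.0 Tf.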
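As a stopgap for already generated files, the missing slash could be patched into a decoded content stream. The following is only a minimal sketch: the function name `fix_font_operands` is hypothetical, and the regular expression assumes that the pattern "keyword number Tf" never appears inside a string literal such as `(F1 10 Tf) Tj`. Fixing the PostScript program that writes the stream is the more robust route.

```python
import re

# A bare keyword (no leading slash) followed by a number and the Tf operator.
_BARE_FONT = re.compile(
    rb"(?<![/\w])([A-Za-z][\w.+-]*)(\s+[-+]?\d*\.?\d+\s+Tf\b)"
)


def fix_font_operands(content: bytes) -> bytes:
    """Rewrite e.g. b'F1 10.0 Tf' as b'/F1 10.0 Tf'.

    Assumption: the content stream is uncompressed and no string
    literal contains text that looks like a Tf instruction.
    """
    return _BARE_FONT.sub(rb"/\1\2", content)


if __name__ == "__main__":
    sample = b"BT F1 10.0 Tf 30.0 750.0 Td (<< ) Tj ET"
    print(fix_font_operands(sample).decode("latin-1"))
```

Operands that already carry the slash are left unchanged, so the function can be applied repeatedly without harm.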
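(Also, these content streams are unnecessarily bloated: most individual text objects draw only one to three glyphs, and each has its own, always identical, font selection instruction. This is not an error in itself, but it is completely unnecessary.)
Furthermore, as @usr2564301 indicated, the stream lengths appear to be off by 1.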
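One way to look for such a length mismatch is to compare each declared /Length with the number of bytes actually found between the stream and endstream keywords. This is a rough sketch with a hypothetical helper name, `check_stream_lengths`. It assumes /Length is a direct integer immediately before the closing `>>`, that the stream data does not itself contain the bytes `endstream`, and that one end-of-line marker before `endstream` is not part of the data.

```python
import re

_STREAM_START = re.compile(rb"/Length\s+(\d+)\s*>>\s*stream(\r\n|\n)")


def check_stream_lengths(pdf: bytes) -> list:
    """Return (declared, measured) length pairs for each stream found."""
    results = []
    for match in _STREAM_START.finditer(pdf):
        declared = int(match.group(1))
        start = match.end()
        end = pdf.find(b"endstream", start)
        if end < 0:
            continue  # truncated file, nothing to measure
        data = pdf[start:end]
        # One EOL before 'endstream' is not counted in /Length.
        if data.endswith(b"\r\n"):
            data = data[:-2]
        elif data.endswith(b"\n") or data.endswith(b"\r"):
            data = data[:-1]
        results.append((declared, len(data)))
    return results


if __name__ == "__main__":
    with open("pd0.pdf", "rb") as handle:
        for declared, measured in check_stream_lengths(handle.read()):
            flag = "" if declared == measured else "  <-- mismatch"
            print(declared, measured, flag)
```

Because the final end-of-line is ambiguous when the data itself ends in a newline, a difference of exactly one byte deserves a manual look rather than an automatic correction.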
font resources
Each font resource looks like this:
<<
/Type /Font
/SubType /Type1
/BaseFont /Palatino-Roman
>>
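First of all, as @KenS already pointed out, the correct spelling is Subtype, not SubType.
There is another issue, though: up to PDF 1.7, such short font resource dictionaries were allowed only for the standard 14 fonts, and PDF 2.0 no longer allows them at all. Since Palatino-Roman is clearly not one of the standard 14 fonts, the resource is incomplete in any case.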
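According to Table 109 (Entries in a Type 1 font dictionary) in ISO 32000-2,
- Type, Subtype, and BaseFont are universally required,
- FirstChar, LastChar, Widths, and FontDescriptor are required, but were optional for the standard 14 fonts in PDF 1.0 to 1.7,
- Name was required in PDF 1.0, optional in PDF 1.1 through 1.7, and is deprecated in PDF 2.0, and
- Encoding and ToUnicode are universally optional.
Depending on the PDF viewer you try, the requirements may appear more lenient, but if you do not meet the requirements of the specification, any PDF processor may reject your PDF without further ado.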
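The rules listed above can be turned into a small check that a generator could run before writing a font dictionary. This is only an illustrative sketch: `missing_font_entries` is a hypothetical helper, the font dictionary is modelled as a plain Python dict with key names written without the slash, and only the required entries from the list are checked.

```python
REQUIRED_ALWAYS = ("Type", "Subtype", "BaseFont")
REQUIRED_UNLESS_STANDARD_14 = ("FirstChar", "LastChar", "Widths", "FontDescriptor")

STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def missing_font_entries(font: dict, pdf_version: str = "1.7") -> list:
    """List required Type 1 font dictionary keys that are absent.

    Keys are case-sensitive, so 'SubType' does not satisfy 'Subtype'.
    """
    missing = [key for key in REQUIRED_ALWAYS if key not in font]
    # The standard 14 exemption exists only before PDF 2.0.
    exempt = pdf_version != "2.0" and font.get("BaseFont") in STANDARD_14
    if not exempt:
        missing += [key for key in REQUIRED_UNLESS_STANDARD_14 if key not in font]
    return missing


if __name__ == "__main__":
    font = {"Type": "Font", "SubType": "Type1", "BaseFont": "Palatino-Roman"}
    print(missing_font_entries(font))
```

For the dictionary from the question, the check reports the misspelled Subtype as missing along with the four entries that are required for any font outside the standard 14.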
cross references
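@usr2564301 also mentioned that many cross reference table entries (and the reference to the start of the cross reference table itself) are off by 1.
Indeed, they do not point to the object number or the xref keyword but to the white space before it. Since only white space has to be skipped before the number or keyword, many PDF processors will not notice.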
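Exact offsets can be obtained by recording the byte position of the first digit of each "N G obj" header. The sketch below uses hypothetical helper names (`object_offsets`, `xref_entry`) and scans a finished file with a regular expression, which assumes that no stream data or comment contains text resembling an object header. A generator would normally just record the position while writing each object.

```python
import re

_OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def object_offsets(pdf: bytes) -> dict:
    """Map object number to the byte offset of the digit starting its header."""
    offsets = {}
    for match in _OBJ_HEADER.finditer(pdf):
        # match.start(1) is the first digit, not any white space before it.
        offsets[int(match.group(1))] = match.start(1)
    return offsets


def xref_entry(offset: int, generation: int = 0) -> bytes:
    """Format one 20-byte in-use cross reference table entry."""
    return b"%010d %05d n \n" % (offset, generation)


if __name__ == "__main__":
    with open("pd0.pdf", "rb") as handle:
        data = handle.read()
    for number, offset in sorted(object_offsets(data).items()):
        print(number, xref_entry(offset))
```

The same idea applies to the startxref value, which should be the offset of the `x` in the xref keyword rather than of the white space preceding it.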