Read PDF document properties from Node.js

Posted 2023-02-23


【Title】: Read PDF Document properties from nodeJS 【Posted】: 2019-06-04 04:47:54 【Question】:

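I am trying to read the document properties of a PDF file from Node.js. I could not find any Node module for reading document properties. I can read file metadata with file-metadata, but it only returns basic properties. I want to read properties such as the document restrictions summary (see the attached image for reference).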


【Answer 1】:

Inspired by @DietrichvonSeggern's suggestion, I wrote a small Node script.

const { spawnSync } = require('child_process');

// Ask exiftool for the raw (-b) value of the UserAccess tag.
const { stdout } = spawnSync('exiftool',
  ['-b', '-UserAccess', 'test.pdf'],
  { encoding: 'ascii' });
// If nothing parseable comes back, fall back to "everything allowed".
const bits = (parseInt(stdout, 10) || 0b111111111110);

// Bit masks for the individual permission flags.
const perms = {
  'Print': 1 << 2,
  'Modify': 1 << 3,
  'Copy': 1 << 4,
  'Annotate': 1 << 5,
  'Fill forms': 1 << 8,
  'Extract': 1 << 9,
  'Assemble': 1 << 10,
  'Print high-res': 1 << 11
};

Object.keys(perms).forEach((title) => {
  const bit = perms[title];
  const yesno = (bits & bit) ? 'YES' : 'NO';
  console.log(`${title} => ${yesno}`);
});

It will print something like this:

Print => YES
Modify => NO
Copy => NO
Annotate => NO
Fill forms => NO
Extract => NO
Assemble => NO
Print high-res => YES

You need exiftool installed on your system, and you should add the necessary error checking to this script.
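One way the error checking mentioned above might look is sketched below. The helper name `readUserAccess` and its `exiftoolBin` parameter are hypothetical, not part of the original answer. The sketch also assumes that exiftool exits with a non-zero status on failure and prints nothing when the file has no UserAccess tag; both are worth verifying against your exiftool version.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper (not from the original answer): returns the raw
// UserAccess integer, or null when exiftool prints nothing for the tag.
function readUserAccess(pdfPath, exiftoolBin = 'exiftool') {
  const result = spawnSync(exiftoolBin, ['-b', '-UserAccess', pdfPath], {
    encoding: 'utf8',
  });

  // result.error is set when the process could not be started at all,
  // for example ENOENT when the binary is not on PATH.
  if (result.error) {
    throw new Error(`could not run ${exiftoolBin}: ${result.error.message}`);
  }

  // Assumption: a non-zero exit status means exiftool could not read the file.
  if (result.status !== 0) {
    const detail = (result.stderr || '').trim();
    throw new Error(`${exiftoolBin} exited with status ${result.status}: ${detail}`);
  }

  const text = (result.stdout || '').trim();
  if (text === '') {
    return null; // no UserAccess tag reported
  }

  const value = Number.parseInt(text, 10);
  if (Number.isNaN(value)) {
    throw new Error(`unexpected UserAccess output: ${text}`);
  }
  return value;
}
```

Calling `readUserAccess('test.pdf')` then either returns a number, returns `null`, or throws with a message that says which step failed, so the caller can decide whether a missing tag should be treated as "everything allowed".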

ExifTool UserAccess tag reference.

With a slight modification:

const perms = {
  'Print': 1 << 2,
  'Modify': 1 << 3,
  'Copy': 1 << 4,
  'Annotate': 1 << 5,
  'FillForms': 1 << 8,
  'Extract': 1 << 9,
  'Assemble': 1 << 10,
  'PrintHighRes': 1 << 11
};

// Collect the flags into an object of booleans instead of printing them.
const access = {};
Object.keys(perms).forEach((perm) => {
  const bit = perms[perm];
  access[perm] = !!(bits & bit);
});

console.log(access);

this will produce:


{
  Print: true,
  Modify: false,
  Copy: false,
  Annotate: false,
  FillForms: false,
  Extract: false,
  Assemble: false,
  PrintHighRes: true
}

【Answer 2】:

Have you considered using exiftool? You would have to integrate it into Node.js, but it provides more or less all the data you are looking for.

【Comments】:

This is a very good suggestion. exiftool can extract the UserAccess tag, which contains this information.
