Recursing a directory tree and counting the pages in all PDF files with pdf-reader


I had a directory tree with around 4,000 PDF files and needed a total page count, so I half-rolled my own. I swiped the page-counting code from the pdf-reader gem's README:
http://github.com/yob/pdf-reader/tree/master

It could be more self-contained; as it is, I save it as `total_pages.rb` and run it from irb:

`>> require './total_pages'`
`>> pagetotal = TotalPages.new`
`>> pagetotal.count('/my/pdf/directory')`

I added a `rescue` that prints the path of any file that fails to open, or that doesn't conform to the PDF specification and makes pdf-reader raise an error. Without it the script quits on the first bad file, which is painful when you're counting pages across thousands of files.
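The rescue-and-continue idea does not depend on pdf-reader itself. Here is a minimal sketch of it using a hypothetical `sum_pages` helper, which is not part of the original script or of pdf-reader. The helper takes a list of paths and a block that returns each file's page count. Any `StandardError` raised for a file is recorded with its path instead of stopping the run. `Integer(...)` also turns a missing (`nil`) count into a recorded failure.

```ruby
# Hypothetical helper (not part of pdf-reader): sums the page count the block
# returns for each path, collecting failures instead of aborting the loop.
def sum_pages(paths)
  total = 0
  failures = {}
  paths.each do |path|
    begin
      # Integer() raises TypeError for nil, so a missing count counts as a failure
      total += Integer(yield(path))
    rescue StandardError => e
      # Remember which file failed and why, then move on to the next one
      failures[path] = "#{e.class}: #{e.message}"
    end
  end
  [total, failures]
end

total, failures = sum_pages(%w[a.pdf broken.pdf b.pdf]) do |path|
  raise ArgumentError, "not a PDF" if path == "broken.pdf"
  path == "a.pdf" ? 3 : 5
end
puts total                                            # prints 8
failures.each { |path, why| puts "#{path}: #{why}" }  # prints broken.pdf: ArgumentError: not a PDF
```

Returning the failures instead of printing them with `p` lets you decide afterwards whether to retry, log, or inspect the problem files.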
```ruby
require 'rubygems'   # only needed on Ruby 1.8; later versions load RubyGems automatically
require 'pdf/reader'

class TotalPages

  def count(dir)
    @conv_directory = dir
    ## I output the directory argument as a test with the line below,
    ## mostly to make sure that passing '.' gets the current dir
    # puts @conv_directory
    recurse_and_count
  end

  def directory
    @conv_directory
  end

  # Every file and subdirectory below the starting directory
  def directory_tree
    Dir["#{directory}/**/*"]
  end

  def recurse_and_count
    total = 0
    directory_tree.each do |item|
      # Skip directories and anything that isn't a .pdf (extension in any case)
      next unless File.file?(item) && File.extname(item).downcase == ".pdf"
      begin
        receiver = PageReceiver.new
        PDF::Reader.file(item, receiver, :pages => false)
        total += receiver.pages
      rescue StandardError
        # Unreadable or non-conforming PDF: print its path and keep counting
        p item
      end
    end
    total
  end

end

# Callback object for pdf-reader; usage:
#   receiver = PageReceiver.new
#   pdf = PDF::Reader.file("somefile.pdf", receiver, :pages => false)
class PageReceiver
  attr_accessor :pages

  # Called when page parsing ends
  def page_count(arg)
    @pages = arg
  end
end
```
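The `PDF::Reader.file` plus receiver callback style used above comes from older pdf-reader releases. Newer releases expose the count directly through `PDF::Reader.new(path).page_count`.

Below is a minimal sketch of the same recursive count written against that newer API, assuming the pdf-reader gem is installed. `count_pdf_pages` and `PDF_READER_COUNTER` are hypothetical names. The counter is injectable, so the directory walk can be tried without real PDFs. `require 'pdf/reader'` only runs when the default counter is actually called.

```ruby
# Default counter (hypothetical name). Assumes a pdf-reader version that
# provides PDF::Reader#page_count; the gem is only loaded on first use.
PDF_READER_COUNTER = lambda do |path|
  require 'pdf/reader'
  PDF::Reader.new(path).page_count
end

# Hypothetical helper: walks dir recursively, counts pages in every .pdf file
# (extension in any case), and returns [total_pages, {path => error_message}].
def count_pdf_pages(dir, counter: PDF_READER_COUNTER)
  total = 0
  failures = {}
  Dir.glob(File.join(dir, '**', '*')).sort.each do |path|
    # Same filter as TotalPages: regular files with a .pdf extension, any case
    next unless File.file?(path) && File.extname(path).downcase == '.pdf'
    begin
      total += counter.call(path)
    rescue StandardError => e
      # Record the failing file and keep walking the tree
      failures[path] = "#{e.class}: #{e.message}"
    end
  end
  [total, failures]
end

# Usage (from irb, mirroring the original):
#   total, failures = count_pdf_pages('/my/pdf/directory')
#   failures.each { |path, why| warn "#{path}: #{why}" }
```

Unlike the original, this version returns the failures alongside the total rather than printing them as it goes.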
