Recurse a directory tree and count the pages in all PDF files using pdf-reader
I had a directory tree with around 4000 PDF files and I needed a total page count, so I semi-rolled this. I swiped the counter code from the gem README: http://github.com/yob/pdf-reader/tree/master
It could be more self-contained. As is, I save it as total_pages.rb and run it from irb:
`>> require 'total_pages'`
`>> pagetotal = TotalPages.new`
`>> pagetotal.count('/my/pdf/directory')`
I added a rescue that prints the path of any file that fails to open, or that doesn't conform to the PDF specification and causes pdf-reader to raise an error. Without it the script quits on the first bad file, and that sucks when you're trying to count pages in thousands of files.
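The rescue-per-file idea can be sketched on its own, independent of pdf-reader. This is a minimal sketch, not the script from this post: `count_pages_in_tree` is a hypothetical name, and the block you pass in stands in for whatever actually counts the pages of one file. It collects the failures instead of only printing them, which is handy when there are thousands of files to go through.

```ruby
# Sums page counts for every PDF under +dir+ without aborting on bad files.
# Hypothetical helper: the block receives a path and must return that
# file's page count (for example by calling pdf-reader).
def count_pages_in_tree(dir)
  total = 0
  failed = []

  Dir.glob(File.join(dir, '**', '*')).each do |path|
    # Only regular files with a .pdf extension (any letter case) are counted.
    next unless File.file?(path) && File.extname(path).downcase == '.pdf'

    begin
      total += yield(path)
    rescue StandardError => e
      # Remember the file and the error class, then move on to the next file.
      failed << [path, e.class.name]
    end
  end

  [total, failed]
end
```

Calling it as `total, failed = count_pages_in_tree('/my/pdf/directory') { |path| ... }` returns the running total together with a list of `[path, error class]` pairs for the files that were skipped.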
    require 'rubygems'
    require 'pdf/reader'

    class TotalPages
      def count(dir)
        @conv_directory = dir
        ## I output the directory argument as a test with the below line -
        ## mostly to make sure that passing '.' gets current dir
        # puts @conv_directory
        recurse_and_count
      end

      def directory
        @conv_directory
      end

      # Every entry (files and directories) below the starting directory
      def directory_tree
        Dir["#{directory}/**/*"]
      end

      def recurse_and_count
        total = 0
        directory_tree.each do |item|
          begin
            case File.stat(item).ftype
            when 'file'
              if File.extname(item).downcase == ".pdf"
                receiver = PageReceiver.new
                pdf = PDF::Reader.file(item, receiver, :pages => false)
                total += receiver.pages
              end
            end
          rescue
            # Print the path of the file that failed and keep going
            p item
          end
        end
        total
      end
    end

    # receiver = PageReceiver.new
    # pdf = PDF::Reader.file("somefile.pdf", receiver, :pages => false)
    class PageReceiver
      attr_accessor :pages

      # Called when page parsing ends
      def page_count(arg)
        @pages = arg
      end
    end
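The receiver-based `PDF::Reader.file` call is the callback API the README showed when this was written. Later pdf-reader releases (1.0 and up, as far as I know) let you ask a reader object for the count directly with `PDF::Reader.new(path).page_count`. Here is a rough sketch of the same total using that call, assuming such a version of the gem is installed. `page_count_for` and `total_pages_in` are hypothetical names, not part of the gem.

```ruby
# Assumption: pdf-reader 1.0 or later is installed (gem install pdf-reader).
def page_count_for(path)
  require 'pdf/reader' # loaded on first use, so the gem is only needed once a PDF is found
  PDF::Reader.new(path).page_count
end

# Walks +dir+ recursively and adds up the page counts of all PDF files.
def total_pages_in(dir)
  Dir.glob(File.join(dir, '**', '*')).sum do |path|
    next 0 unless File.file?(path) && File.extname(path).downcase == '.pdf'

    begin
      page_count_for(path)
    rescue StandardError => e
      # Unreadable or malformed file: report it and count it as zero pages.
      warn "skipped #{path}: #{e.class}"
      0
    end
  end
end
```

From irb this would be `total_pages_in('/my/pdf/directory')`. Skipped files are reported on stderr, so the return value is still just the integer total.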