docsplit_images converts a document file (PDF, XLS, XLSX, PPT, PPTX, DOC, DOCX, etc.) into a list of images. It works together with the Paperclip gem (https://github.com/thoughtbot/paperclip).
Install the dependencies of the Docsplit gem (adapted from http://documentcloud.github.com/docsplit/):
1. Install GraphicsMagick. Its ‘gm’ command is used to generate images. Either compile it from source, or use a package manager:
[aptitude | port | brew] install graphicsmagick
2. Install Poppler. On Linux:
aptitude install poppler-utils poppler-data
On Mac, you can install it from source or use MacPorts or Homebrew:
sudo port install poppler    # or: brew install poppler
3. Install Ghostscript, which is required to convert PDF and PostScript files:
[aptitude | port | brew] install ghostscript
4. (Optional) Install Tesseract:
[aptitude | port | brew] install [tesseract | tesseract-ocr]
Without Tesseract installed, you'll still be able to extract text from documents, but you won't be able to OCR them automatically.
5. (Optional) Install pdftk. On Linux:
aptitude install pdftk
On Mac, you can download a recent installer for the binary from http://www.pdflabs.com/docs/install-pdftk/. Without pdftk installed, you can still use Docsplit, but you won't be able to split a multi-page PDF into single-page PDFs.
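6. Install OpenOffice, which is used to convert non-PDF documents (doc, xls, ppt, etc.) to PDF. On Linux:
aptitude install openoffice.org openoffice.org-java-common
On Mac, download and install it from http://www.openoffice.org/download/index.html.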
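To confirm the tools above are installed, a small script can look for each binary on your PATH. This is a minimal sketch and not part of docsplit_images: the binary names (gm, pdftotext, gs, tesseract, pdftk, soffice) are the usual commands shipped by these packages, and the helper executable_on_path? is hypothetical. On Mac, OpenOffice's soffice usually lives inside the application bundle, so it may be reported as missing even when installed.

```ruby
# Hypothetical check script (not part of docsplit_images): reports which of the
# command-line tools from the installation steps are reachable on PATH.
DOCSPLIT_TOOLS = {
  "gm"        => "GraphicsMagick",
  "pdftotext" => "Poppler",
  "gs"        => "Ghostscript",
  "tesseract" => "Tesseract (optional, for OCR)",
  "pdftk"     => "pdftk (optional, for splitting PDFs)",
  "soffice"   => "OpenOffice (for non-PDF documents)"
}.freeze

# True if an executable file called `name` exists in any PATH directory.
def executable_on_path?(name)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    path = File.join(dir, name)
    File.file?(path) && File.executable?(path)
  end
end

if $PROGRAM_NAME == __FILE__
  DOCSPLIT_TOOLS.each do |binary, package|
    status = executable_on_path?(binary) ? "found" : "MISSING"
    puts format("%-10s %-40s %s", binary, package, status)
  end
end
```

Save it as, for example, check_tools.rb and run `ruby check_tools.rb`; any line marked MISSING points to the step that still needs attention.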
Add docsplit_images to your Gemfile:
gem 'docsplit_images', git: 'https://github.com/jameshuynh/docsplit_images.git', tag: 'v0.2.3'
Then install it from the terminal:
bundle
Run the generator for your table and attachment field, then migrate the database:
rails g docsplit_images <table_name> <attachment_field_name>
# e.g. rails generate docsplit_images asset document
rake db:migrate
In your model:
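class Asset < ActiveRecord::Base
  # ... other model code ...
  attr_accessible :mydocument
  has_attached_file :mydocument
  # Convert the attachment into images; size "800x" sets the image width to 800 px
  docsplit_images_conversion_for :mydocument, {size: "800x"}
  # ...
end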
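The size: "800x" option reads like a GraphicsMagick geometry string; we assume the gem passes it on to Docsplit's image size setting. In that notation "800x" fixes the width at 800 pixels and scales the height to keep the aspect ratio, "x400" fixes the height, and "800x600" fits the page inside that box. The hypothetical scaled_dimensions helper below only illustrates that arithmetic; it is not code from the gem.

```ruby
# Hypothetical helper: the output size a "WIDTHx", "xHEIGHT" or "WIDTHxHEIGHT"
# geometry gives a page of src_width x src_height, preserving aspect ratio.
def scaled_dimensions(geometry, src_width, src_height)
  match = /\A(\d*)x(\d*)\z/.match(geometry) or
    raise ArgumentError, "unsupported geometry: #{geometry}"
  width  = match[1].empty? ? nil : match[1].to_i
  height = match[2].empty? ? nil : match[2].to_i
  raise ArgumentError, "geometry needs a width or a height" if width.nil? && height.nil?

  # With both bounds given, the smaller scale factor keeps the page inside the box.
  scale = [width && width.fdiv(src_width), height && height.fdiv(src_height)].compact.min
  [(src_width * scale).round, (src_height * scale).round]
end

# A US Letter page rendered at 72 dpi is 612 x 792 points.
p scaled_dimensions("800x", 612, 792)
```

For that Letter-sized page, "800x" yields an image 800 pixels wide and about 1035 pixels tall.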
docsplit_images runs the conversion in the background with Sidekiq, so a Sidekiq process must be running:
[bundle exec] sidekiq
While the conversion is running in the background, you can check whether it is still processing through the is_processing_image attribute:
asset.is_processing_image?
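In a script or test you may want to block until the conversion finishes. A minimal sketch, assuming only that the record responds to is_processing_image? as shown above: wait_for_images is a hypothetical helper that re-fetches the record on each attempt, because the Sidekiq worker updates the row in the database, not your in-memory copy.

```ruby
# Hypothetical helper: poll until the background image conversion finishes.
# `fetch` must return a fresh copy of the record on every call; in a Rails app
# that could be -> { Asset.find(asset_id) } (an assumption, adapt to your model).
def wait_for_images(fetch, timeout: 300, interval: 2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    record = fetch.call
    return record unless record.is_processing_image?

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "images still processing after #{timeout}s"
    end
    sleep interval
  end
end
```

Usage in a Rails console might look like `asset = wait_for_images(-> { Asset.find(asset.id) })`. A monotonic clock is used for the deadline so that system clock adjustments cannot shorten or extend the wait.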
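You can also track how far the conversion has progressed:
asset.number_of_images_entry
- The total number of images the document will be converted into. If your document file is not a PDF, this becomes non-zero only after the internal conversion to PDF has completed.
asset.number_of_completed_images
- The number of images converted so far.
asset.images_conversion_progress
# => 0.45 (which is 45%)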
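The progress value is presumably the ratio of the two counters; that relationship is our assumption, not something stated by the gem. The sketch below uses a hypothetical ImageCounters struct as a stand-in for the model, and returns 0.0 while the total is still unknown (the non-PDF case described above).

```ruby
# Stand-in for the model's two counter columns (illustration only).
ImageCounters = Struct.new(:number_of_images_entry, :number_of_completed_images)

# Assumption: progress = completed / total, as a fraction between 0.0 and 1.0.
def conversion_progress(record)
  total = record.number_of_images_entry.to_i
  # Non-PDF input: the total stays 0 until the PDF conversion has finished.
  return 0.0 if total.zero?

  (record.number_of_completed_images.to_f / total).clamp(0.0, 1.0)
end

progress = conversion_progress(ImageCounters.new(20, 9))
puts "#{(progress * 100).round}%" # prints "45%"
```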
document_images_list returns the list of URLs of the images converted from the document:
asset.document_images_list
# => ["/system/myfile_revisions/files/000/000/019/images/SBA_Admin_workflow_1.png", "/system/myfile_revisions/files/000/000/019/images/SBA_Admin_workflow_2.png", ...]
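The sample output suggests each image is stored in an images/ directory next to the attachment and named <original basename>_<page>.png. Assuming that layout (inferred from the sample only, not confirmed by the gem), the list could be reconstructed as below; image_urls is a hypothetical helper, useful for example when checking which files should exist on disk.

```ruby
# Hypothetical helper: rebuild the image URL list, assuming the
# "<attachment dir>/images/<basename>_<page>.png" layout seen in the sample output.
def image_urls(attachment_dir, basename, page_count)
  (1..page_count).map do |page|
    File.join(attachment_dir, "images", "#{basename}_#{page}.png")
  end
end

puts image_urls("/system/myfile_revisions/files/000/000/019", "SBA_Admin_workflow", 2)
```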
Contributing:
- Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
- Check out the issue tracker to make sure no one has already requested and/or contributed it.
- Fork the project.
- Start a feature/bugfix branch.
- Commit and push until you are happy with your contribution.
- Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
- Please try not to mess with the Rakefile, version, or history. If you want your own version, or it is otherwise necessary, that is fine, but please isolate the change in its own commit so I can cherry-pick around it.
Copyright (c) 2012 jameshuynh. See LICENSE.txt for further details.