This gem is a Paperclip processor that uses Docsplit to convert uploaded files to PDF and to extract information and thumbnails from them. Supported formats include the Microsoft Office formats (doc, docx, ppt, xls and so on) as well as html, odf, rtf, swf, svg and wpd.
(This gem is written and tested on Ruby 1.9 and Rails 3 only.)
To install it, add this line to your Gemfile:
    gem 'docsplit-paperclip-processor'
Then run:
    bundle install
Use it as you would any other Paperclip processor. For example, in your model:
    class Document < ActiveRecord::Base
      has_attached_file :file,
        :styles => {
          :pdf => {
            :format => "pdf",
            :processors => [:docsplit_pdf]
          }
        }
    end
This converts the uploaded document to PDF and stores the result as the :pdf style of the attachment.
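Because :styles is an ordinary Ruby hash, one way to reuse the PDF style across several models is to keep it in a constant. This is a minimal sketch: the constant name DOCSPLIT_PDF_STYLE is hypothetical and not part of the gem; only the :format and :processors keys come from the example above.

```ruby
# Hypothetical shared style definition (the name is illustrative, not part of the gem).
# The keys mirror the README example: convert the upload to PDF with the
# :docsplit_pdf processor and save it under the :pdf style.
DOCSPLIT_PDF_STYLE = {
  :pdf => {
    :format     => "pdf",
    :processors => [:docsplit_pdf]
  }
}.freeze

# In a model (requires Paperclip and this gem to be loaded):
#
#   class Document < ActiveRecord::Base
#     has_attached_file :file, :styles => DOCSPLIT_PDF_STYLE
#   end
```

Paperclip stores each style as a separate file, so the converted PDF should then be reachable through document.file.url(:pdf).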
WARNING: Text extraction is an alpha feature.
    class Document < ActiveRecord::Base
      has_attached_file :file,
        :styles => {
          :text => {
            :processors => [:docsplit_text],
            :full_text_column => :file_full_text
          }
        }
    end
This configuration extracts the text from the uploaded file and deposits the full text into the column 'file_full_text'.
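If a model needs both outputs, the two style definitions can presumably be combined in a single :styles hash, since Paperclip runs the processors for each style separately. This is an untested sketch: the constant name DOCUMENT_STYLES is hypothetical, and the text style is still alpha. The target column (file_full_text here) must already exist on the table, for example as a text column added in a migration.

```ruby
# Hypothetical combined style definition (untested assumption: the gem's
# :docsplit_pdf and :docsplit_text styles can coexist on one attachment).
DOCUMENT_STYLES = {
  # Convert the upload to PDF.
  :pdf => {
    :format     => "pdf",
    :processors => [:docsplit_pdf]
  },
  # Extract the full text into the file_full_text column (alpha feature).
  :text => {
    :processors       => [:docsplit_text],
    :full_text_column => :file_full_text
  }
}.freeze

# In a model (requires Paperclip and this gem to be loaded):
#
#   class Document < ActiveRecord::Base
#     has_attached_file :file, :styles => DOCUMENT_STYLES
#   end
```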
More processors will be included in upcoming releases.
Be warned: this gem is an early beta release. If you use it, you do so at your own risk.
Have fun with it and drop me a note if you like it.