documentcloud/docsplit

Java chokes on paths with spaces

Closed this issue · 7 comments

When docsplit is installed in a directory whose path contains spaces, java chokes and dies.
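A minimal sketch of the failure mode, assuming the install path is interpolated into the java command line without quoting. The path and the `Example` class name are hypothetical, and `Shellwords.split` is used here only to approximate how a Bourne-style shell would split the command into arguments.

```ruby
require 'shellwords'

# Hypothetical install location containing a space (not taken from the report).
root = '/Users/me/My Gems/docsplit'

# Path interpolated without escaping.
unescaped = "java -cp #{root}/build Example"

# Shellwords.split applies Bourne-shell word-splitting rules, so it shows
# roughly which arguments java would receive.
unescaped_args = Shellwords.split(unescaped)
# The classpath is cut at the space and arrives as two separate arguments.

# Same command with the path escaped first.
escaped = "java -cp #{Shellwords.escape(root)}/build Example"
escaped_args = Shellwords.split(escaped)
# The classpath now survives as a single argument.

puts unescaped_args.inspect
puts escaped_args.inspect
```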

I think that the latest master solves whitespace problems... Give it a try. If that's the case, then we need to do a new release.

diff -Naur /Library/Ruby/Gems/1.8/gems/docsplit-0.6.0/lib/docsplit.rb docsplit-0.6.0-1/lib/docsplit.rb
--- /Library/Ruby/Gems/1.8/gems/docsplit-0.6.0/lib/docsplit.rb 2011-10-11 21:06:26.000000000 +0100
+++ docsplit-0.6.0-1/lib/docsplit.rb 2012-03-16 19:51:10.000000000 +0100
@@ -1,13 +1,22 @@

# The Docsplit module delegates to the Java PDF extractors.

+require 'tmpdir'
+require 'fileutils'
+require 'shellwords'
+
module Docsplit

VERSION = '0.6.0' # Keep in sync with gemspec.

ROOT = File.expand_path(File.dirname(__FILE__) + '/..')

+ ESCAPE = lambda {|x| Shellwords.shellescape(x) }
- CLASSPATH = "#{ROOT}/build#{File::PATH_SEPARATOR}#{ROOT}/vendor/'*'"
+ ROOT_E = ROOT.map(&ESCAPE)
+ CLASSPATH = "#{ROOT_E}/build#{File::PATH_SEPARATOR}#{ROOT_E}/vendor/'*'"
- LOGGING = "-Djava.util.logging.config.file=#{ROOT}/vendor/logging.properties"
+ LOGGING = "-Djava.util.logging.config.file=#{ROOT_E}/vendor/logging.properties"

HEADLESS = "-Djava.awt.headless=true"

@@ -19,8 +28,6 @@

DEPENDENCIES = {:java => false, :gm => false, :pdftotext => false, :pdftk => false, :tesseract => false}

- ESCAPE = lambda {|x| Shellwords.shellescape(x) }

# Check for all dependencies, and note their absence.

dirs = ENV['PATH'].split(File::PATH_SEPARATOR)
DEPENDENCIES.each_key do |dep|
@@ -68,7 +75,7 @@
if ext.length > 0 && GM_FORMATS.include?(ext.sub(/^\./, '').downcase.to_sym)
`gm convert #{escaped_doc} #{escaped_out}/#{escaped_basename}.pdf`
else

- options = "-jar #{ROOT}/vendor/jodconverter/jodconverter-core-3.0-beta-3.jar -r #{ROOT}/vendor/conf/document-formats.js"
+ options = "-jar #{ROOT_E}/vendor/jodconverter/jodconverter-core-3.0-beta-3.jar -r #{ROOT_E}/vendor/conf/document-formats.js"
run "#{options} #{escaped_doc} #{escaped_out}/#{escaped_basename}.pdf", [], {}
end
end
@@ -114,9 +121,6 @@

end

-require 'tmpdir'
-require 'fileutils'
-require 'shellwords'
require "#{Docsplit::ROOT}/lib/docsplit/image_extractor"
require "#{Docsplit::ROOT}/lib/docsplit/transparent_pdfs"
require "#{Docsplit::ROOT}/lib/docsplit/text_extractor"

This fixes the path issue if docsplit itself is installed in a directory containing a space... Sorry, no git here, only diff.
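The idea behind the patch can be sketched in isolation as below. This is an illustration under assumptions, not the code that was eventually released: `root` is a hypothetical stand-in for `Docsplit::ROOT`, and the string is escaped directly with `Shellwords.escape`. As far as I can tell, `ROOT.map(&ESCAPE)` in the diff depends on `String` being `Enumerable`, which held in Ruby 1.8 but not in 1.9 and later.

```ruby
require 'shellwords'

# Hypothetical stand-in for Docsplit::ROOT: an install dir containing a space.
root = '/Library/Ruby/My Gems/docsplit-0.6.0'

# Escape the String directly; this does not rely on String#map.
root_e = Shellwords.escape(root)

# Same shape as the CLASSPATH and LOGGING constants in the diff.
classpath = "#{root_e}/build#{File::PATH_SEPARATOR}#{root_e}/vendor/'*'"
logging = "-Djava.util.logging.config.file=#{root_e}/vendor/logging.properties"

# Approximate the shell's word splitting to check that each option
# reaches java as a single argument.
args = Shellwords.split("java #{logging} -cp #{classpath}")
puts args.inspect
```

Escaping `ROOT` once and reusing the escaped value keeps every command string built from it consistent, which appears to be why the patch introduces `ROOT_E` rather than escaping at each call site.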

@ineiti could you please make a Pull Request?

On 23/10/12 10:43, Natim wrote:

@ineiti https://github.com/ineiti could you please make a Pull Request?

??? There has been one, #49, for a while now...

Linus

Thanks guys, we're working on a release.

fixed in 491cddf