janfri/mini_exiftool

Weird error with unicode characters in path "Wildcards don't work in the directory specification"

ccoenen opened this issue · 18 comments

This may not be a bug in mini_exiftool, but right now, i don't really know what to make of it. If you could help me pin it down, that would be amazing.

I'm on windows, and i have path names that contain unicode characters. One path looks like this:

C:\tmp\2015-03-23 Test with german umlaut äöü\IMG_1000.JPG

Now if i fire up irb, i can open that file with ruby, but not with mini_exiftool:

f = File.open("C:/tmp/2015-03-23 Test with german umlaut äöü/IMG_1000.JPG")
# => #<File:C:/tmp/2015-03-23 Test with german umlaut äöü/IMG_1000.JPG>
f.size
# => 18713

# so far, so good! Let's try mini_exiftool, now.

require 'mini_exiftool'
# => true
m = MiniExiftool.new("C:/tmp/2015-03-23 Test with german umlaut äöü/IMG_1000.JPG")
# MiniExiftool::Error: Wildcards don't work in the directory specification
# No matching files

#        from C:/Tools/Ruby21/lib/ruby/gems/2.1.0/gems/mini_exiftool-2.5.0/lib/mini_exiftool.rb:137:in `load'
#        from C:/Tools/Ruby21/lib/ruby/gems/2.1.0/gems/mini_exiftool-2.5.0/lib/mini_exiftool.rb:101:in `initialize'
#        from (irb):11:in `new'
#        from (irb):11
#        from C:/Tools/Ruby21/bin/irb:11:in `<main>'

Note that this is not a "file not found" error, because i can easily provoke that one:

m = MiniExiftool.new("C:/tmp/2015-03-23 Test with german umlaut äöü/IMG_1337.JPG")
# MiniExiftool::Error: File 'C:/tmp/2015-03-23 Test with german umlaut äöü/IMG_1337.JPG' does not exist.
#        from C:/Tools/Ruby21/lib/ruby/gems/2.1.0/gems/mini_exiftool-2.5.0/lib/mini_exiftool.rb:121:in `load'
#        ...

The same example works fine if i change the directory name to omit the äöü part:

m = MiniExiftool.new("C:/tmp/2015-03-23 Test without german umlaut/IMG_1000.JPG")
# => #<MiniExiftool:0x3031f50 @opts={:numerical=>false, :composite=>true, ...

The error message (Wildcards don't work in the directory specification) does not come from anywhere within mini_exiftool, at least not that i can find with github's code search.

Exiftool itself is also not at fault (at least not alone), because i can do this without a problem:

> exiftool.exe "C:\tmp\2015-03-23 Test with german umlaut äöü\IMG_1000.JPG"
ExifTool Version Number         : 9.90
File Name                       : IMG_1000.JPG
...

I'm really somewhat stuck.

This does not change if i'm using backslashes instead of forward slashes.

I'm doing a lot to handle encoding and escaping particularly for filenames in mini_exiftool. What is the result of

Encoding.find('filesystem')

on your windows system?

Encoding.find('filesystem')
# => #<Encoding:Windows-1252>

This is a Windows 7 (x64) machine with this environment:

C:\Users\user>bundler env
Bundler 1.7.12
Ruby 2.1.5 (2014-11-13 patchlevel 273) [i386-mingw32]
Rubygems 2.4.6

This seems to be correct. I have no idea. Maybe a look at the executed command line will be helpful:

$DEBUG = true
m = MiniExiftool.new("C:/tmp/2015-03-23 Test with german umlaut äöü/IMG_1337.JPG")

exiftool -j "C:/tmp/2009-03-07 test Path ???/IMG_4224.JPG" - it seems to replace the umlauts with question marks, which are a wildcard on windows (single character).
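As a rough illustration of how umlauts can end up as question marks: transcoding a string into a character set that lacks those characters, with replacement enabled, yields `?` for each of them. This is only a sketch of the general mechanism, not a confirmed trace of what Ruby or Windows does when spawning exiftool. The helper name `lossy_path` is hypothetical, and US-ASCII is used purely as a stand-in for "a code page that cannot represent the characters".

```ruby
# Hypothetical helper: transcode a path into a target encoding, replacing
# every character the target cannot represent with "?".
def lossy_path(path, encoding)
  path.encode(encoding, undef: :replace, replace: "?")
end

# "äöü" written as escapes so the example does not depend on how this
# file itself is saved.
path = "C:/tmp/test \u00E4\u00F6\u00FC/IMG_1000.JPG"

mangled = lossy_path(path, "US-ASCII")
puts mangled                      # C:/tmp/test ???/IMG_1000.JPG
puts mangled.match?(/[?*]/)       # true: the path now contains wildcard characters
```

Once the replacement has happened, the receiving program sees a directory name containing `?`, which would fit the "Wildcards don't work in the directory specification" message.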

In which encoding is your source file written? Do you use the correct magic comment? http://en.wikibooks.org/wiki/Ruby_Programming/Encoding#Using_Encodings

The earlier examples were from irb, with no encoding set explicitly.

Here are all of the encoding outputs for reference:

Encoding.find('external')
# => #<Encoding:CP850>
Encoding.find('internal')
# => nil
Encoding.find('filesystem')
# => #<Encoding:Windows-1252>
Encoding.find('locale')
# => #<Encoding:CP850>

From within the irb i ran the following commands:

Encoding.default_external = 'utf-8'
# "utf-8"
Encoding.default_internal = 'utf-8'
# "utf-8"
require 'mini_exiftool'
# true
m = MiniExiftool.new("C:/tmp/2009-03-07 test Path äöü/IMG_1000.JPG")
# MiniExiftool::Error: Wildcards don't work in the directory specification
# No matching files
# ...

I also put this into a ruby file (and i double-checked that it was actually saved as UTF-8):

#encoding: UTF-8
Encoding.default_external = 'utf-8'
Encoding.default_internal = 'utf-8'
require 'mini_exiftool'
m = MiniExiftool.new("C:/tmp/2009-03-07 test Path äöü/IMG_4224.JPG")
puts m

It fails with the same "Wildcards" error message.

Could you try this (saved as UTF-8)?

#encoding: UTF-8
puts `exiftool.exe "C:/tmp/2009-03-07 test Path äöü/IMG_1000.JPG"`

It can't find the file, but what i find more interesting is that the encoding is wonky, so maybe the path is already broken before it reaches exiftool? I ran these lines:

# encoding: UTF-8
# äöü

require 'open3'
Encoding.default_external = 'UTF-8'

paths = [
  "\"C:/tmp/test äöü/201412050001hq.jpg\"",
  "\"C:\\tmp\\test äöü\\201412050001hq.jpg\""
]

paths.each do |path|
  puts "## Run with path: #{path}"

  puts "*backticks*\n"
  out = `exiftool.exe #{path} 2>&1`
  puts '    ' + out
  puts '    ' + out.force_encoding(Encoding.find('filesystem')).encode('UTF-8')

  puts "*popen3*\n"
  stdin, stdout, _ = Open3.popen3("exiftool.exe #{path} 2>&1")
  stdin.close
  out = stdout.read
  puts '    ' + out
  puts '    ' + out.force_encoding(Encoding.find('filesystem')).encode('UTF-8')
end

which produces this output:

## Run with path: "C:/tmp/test äöü/201412050001hq.jpg"
*backticks*
    File not found: C:/tmp/test 巼/201412050001hq.jpg
    File not found: C:/tmp/test äöü/201412050001hq.jpg
*popen3*
    File not found: C:/tmp/test 巼/201412050001hq.jpg
    File not found: C:/tmp/test äöü/201412050001hq.jpg
## Run with path: "C:\tmp\test äöü\201412050001hq.jpg"
*backticks*
    File not found: C:/tmp/test 巼/201412050001hq.jpg
    File not found: C:/tmp/test äöü/201412050001hq.jpg
*popen3*
    File not found: C:/tmp/test 巼/201412050001hq.jpg
    File not found: C:/tmp/test äöü/201412050001hq.jpg

The broken characters may not show up correctly here, so i also made a screenshot from Notepad++, where broken characters are displayed as hex:

[Screenshot: output]
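The two lines printed per run would be consistent with exiftool's message arriving as Windows-1252 bytes: printed raw they are not valid UTF-8, and re-tagging them as Windows-1252 before transcoding recovers the umlauts. A minimal sketch of that step follows. The byte values are an assumption (the standard Windows-1252 codes for ä, ö, ü), not values read from the screenshot, and `retag_as_utf8` is a hypothetical helper name.

```ruby
# Hypothetical helper: treat raw bytes as being in source_encoding and
# transcode them to UTF-8. Works on a copy so the input is left untouched.
def retag_as_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode("UTF-8")
end

# Assumed bytes: 0xE4 0xF6 0xFC are ä ö ü in Windows-1252.
raw = "\xE4\xF6\xFC".b

# Interpreted as UTF-8 the bytes are invalid, hence the garbage on screen.
puts raw.dup.force_encoding("UTF-8").valid_encoding?   # false

# Interpreted as Windows-1252 they transcode cleanly.
puts retag_as_utf8(raw, "Windows-1252")                # äöü
```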

(just to make sure: i ran the same test on Ruby 2.2.1x64 on windows just now. Same output)

This might be interesting: http://www.sno.phy.queensu.ca/~phil/exiftool/exiftool_pod.html#windows_unicode_file_names This was introduced/changed on 2015-01-04. My tests have been with 9.90, so this might explain some of the encoding weirdness.

I tried specifying -charset FileName=cp1252 (and UTF8, while i was at it); it didn't change the "File not found" result. As long as that does not work in any way, i don't think mini_exiftool is to blame. If i can't get to the file from a simple backtick or popen3 call, i don't think mini_exiftool can either.

How should i continue? Do we close this ticket unresolved (upstream problem somewhere)? Do we leave it open?

I don't get it?! I can use umlaut files with multi_exiftool?! What the actual f*ck?! Sorry. I'm going to post an example over there in the next few hours.

Here's the change i made that fixes umlauts and lets all of multi_exiftool's tests pass: ccoenen/multi_exiftool@4b836a6 For some reason, though, i can't get it to work in backticks or popen3 (as described above).
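One way a path can avoid the Windows command line entirely is to hand exiftool its arguments on stdin with `-@ -` and declare the file name encoding with `-charset filename=utf8`. Whether this matches what the linked commit does is not confirmed here; the sketch below is an assumption-laden illustration, untested on Windows, and both helper names are hypothetical.

```ruby
require "open3"

# Hypothetical helper: build the argument list in the form exiftool's
# -@ ARGFILE option expects, i.e. exactly one argument per line.
def exiftool_argfile(path)
  ["-charset", "filename=utf8", "-j", path].join("\n") + "\n"
end

# Hypothetical helper: run exiftool with "-@ -" so the arguments are read
# from stdin as UTF-8 bytes instead of being passed through the shell.
# Assumes an `exiftool` executable is on the PATH.
def read_metadata_json(path, exiftool: "exiftool")
  args = exiftool_argfile(path.encode("UTF-8"))
  out, status = Open3.capture2(exiftool, "-@", "-", stdin_data: args, binmode: true)
  raise "exiftool failed" unless status.success?
  out
end

# Usage (not executed here, since it needs exiftool and a real file):
#   puts read_metadata_json("C:/tmp/test äöü/IMG_1000.JPG")
```

The point of the design is that the path travels as bytes over a pipe, so no code page conversion of the command line gets a chance to replace characters.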

I'm a few years late here, but I had the same issue. My fix was to install Ruby and exiftool on Linux, and it worked perfectly.

Hope this comment will be helpful.

@ManuelSamudio12 The intention was to get it working under Windows. ;-)

Same here, trying to create a context menu entry with a batch file:
REG ADD "HKCR\*\shell\ExifTool\command" /t REG_SZ /d "\"%systemroot%\system32\cmd.exe\" /K exiftool \"%%L\"" /f
It works just fine if the file is in a folder with a standard name, but it does not work if the file is in a folder with special characters. In my case the special character is Δ.

But when i cd to the directory that contains the special character, open a command prompt in that folder (cmd) and then type exiftool <file>, it works just fine (exiftool shows information about the file).
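A quick way to check whether a path is even expressible in a given code page is to try the conversion and see if it fails. This is a sketch with a hypothetical helper name, and Windows-1252 is only an assumed example of a code page; the code page actually in effect on that machine is not known from the comment.

```ruby
# Hypothetical pre-flight check: can every character of the path be
# expressed in the given encoding?
def representable_in?(path, encoding)
  path.encode(encoding)
  true
rescue Encoding::UndefinedConversionError
  false
end

puts representable_in?("\u00E4\u00F6\u00FC", "Windows-1252") # true: ä ö ü exist in Windows-1252
puts representable_in?("\u0394", "Windows-1252")             # false: Δ does not
```

If the check fails for the full path but the file name alone passes, that would fit the observation that running exiftool from inside the directory works.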