Wrong encoding for Keywords and Sublocation when writing Cyrillic
anstaks opened this issue · 13 comments
My Keywords and Sublocation values look like "??????", but Title and Creator work fine. The :exif_encoding and :iptc_encoding options did not help.
http://monosnap.com/image/QoPmIlB6xYgApjd7mWtn4soNX
How can I fix it?
Could you explain what you are doing?
Hello! I have a Photo model that writes the metadata when I update the "keywords" attribute:
```ruby
class Photo < ActiveRecord::Base
  before_update :exif_write

  # ...

  def exif_write
    exif_attrs_changed = [:title, :description, :tags, :author, :photo_place, :date, :rating].any? {|attr| self.send("#{attr}_changed?")}
    if exif_attrs_changed
      photo = MiniExiftool.new(self.file.file.file)
      photo.DateTimeOriginal = self.date.to_time
      photo.Title = self.title
      photo.ImageDescription = self.description
      photo.Keywords = self.tags.split(',').collect(&:strip).reject(&:empty?).uniq
      photo.Creator = self.author
      photo.Rating = self.rating
      photo.save
    end
  end
end
```
So if I enter a keyword in English, everything is fine, but in Cyrillic I get "??????".
When I update Title, ImageDescription, or Creator in Cyrillic, everything works; I only have this problem with Keywords.
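A plain-Ruby sketch (no ExifTool or mini_exiftool involved) of how "??????" can arise. ExifTool's default IPTC encoding is Windows Latin1, and Cyrillic letters do not exist in that character set. Assuming unrepresentable characters are replaced with "?" when the value is stored, Ruby's `String#encode` with `undef: :replace` shows the same effect. The helper name `to_latin1_lossy` is hypothetical.

```ruby
# Hypothetical helper: mimic storing text in a Windows Latin1 (cp1252) field.
# Characters that cp1252 cannot represent are replaced with "?".
def to_latin1_lossy(text)
  text.encode("Windows-1252", undef: :replace)
end

puts to_latin1_lossy("Keyword")   # ASCII survives unchanged: Keyword
puts to_latin1_lossy("Привет")    # each Cyrillic letter becomes "?": ??????
```

German umlauts, by contrast, do exist in Latin1, which is why an "Ümläüts" test can look fine while Cyrillic keywords break.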
http://monosnap.com/image/QeSNm6kdA0dhfPqZAzxgjq7PQ
https://www.monosnap.com/image/QoPmIlB6xYgApjd7mWtn4soNX (another screenshot, from Adobe Photoshop Lightroom)
In which encoding are your tags?
Could you please try it in a simple script and run it with Ruby's -d flag? Then mini_exiftool prints the executed command line to $stderr, and we may be able to see what's wrong.
Could you also send me an example photo with the faulty tags?
Were you able to solve your problem? If not, can I help somehow?
Hello! Yes, I did. I added "-CodedCharacterSet=UTF8" to the params in mini_exiftool.rb:
```ruby
def save
  MiniExiftool.setup
  return false if @changed_values.empty?
  @errors.clear
  temp_file = Tempfile.new('mini_exiftool')
  temp_file.close
  temp_filename = temp_file.path
  FileUtils.cp filename.encode(@@fs_enc), temp_filename
  all_ok = true
  @changed_values.each do |tag, val|
    original_tag = MiniExiftool.original_tag(tag)
    arr_val = val.kind_of?(Array) ? val : [val]
    arr_val.map! {|e| convert_before_save(e)}
    # Added -CodedCharacterSet=UTF8: ExifTool then stores the IPTC values
    # (e.g. Keywords) as UTF-8 instead of the Windows Latin1 default.
    params = '-q -P -overwrite_original -CodedCharacterSet=UTF8 '
    params << (arr_val.detect {|x| x.kind_of?(Numeric)} ? '-n ' : '')
    params << (@opts[:ignore_minor_errors] ? '-m ' : '')
    arr_val.each do |v|
      params << %Q(-#{original_tag}=#{escape(v)} )
    end
    result = run(cmd_gen(params, temp_filename))
    unless result
      all_ok = false
      @errors[tag] = @error_text.gsub(/Nothing to do.\n\z/, '').chomp
    end
  end
  if all_ok
    FileUtils.cp temp_filename, filename.encode(@@fs_enc)
    reload
  end
  temp_file.delete
  all_ok
end
```
If you need a different IPTC encoding, you can use the :iptc_encoding option:
```ruby
photo = MiniExiftool.new(self.file.file.file, :iptc_encoding => 'UTF-8')
```
I ran into the same problem and tried the :iptc_encoding => 'UTF-8' option. Unfortunately it did not work; the IPTC encoding problem remained. Looking at the code, the option is passed on as exiftool's -charset option, but apparently that option does not change the internal encoding of the IPTC data. The approach of @anstaks, setting -CodedCharacterSet, worked properly.
Oh, I'm sorry. I didn't read the explanations of @anstaks carefully enough. The encoding stuff is a very complex topic.
If I now understand it correctly, "CodedCharacterSet" is simply an IPTC tag, so you can set it like any other tag:
```ruby
photo = MiniExiftool.new('some_file.jpg')
photo.coded_character_set = 'utf8' # set CodedCharacterSet explicitly
photo.caption_abstract = 'Text with Ümläüts'
photo.save
```
Please let me know if this works or not.
Works like a charm!
Reading the ExifTool documentation (http://www.sno.phy.queensu.ca/~phil/exiftool/faq.html#Q10), I realized that the default IPTC encoding really is Windows Latin1. Wouldn't it be appropriate to make that more explicit in the documentation, or perhaps even set the CodedCharacterSet tag to UTF8 by default?
God created the world and the devil created the encoding! =)
Thank you so much!
I'm thinking about adding a hint to the mini_exiftool documentation; I'm not yet sure how.
Setting CodedCharacterSet to UTF8 by default is not an option for me, because it would differ from the behavior of exiftool itself.
I agree! The library really shouldn't change the behavior of ExifTool.
But documenting it is very important: I lost many hours tracking down the problem, and it is very difficult to understand.
When you use the library itself to debug, the problem does not show up, because it converts the values to UTF-8 internally. Only by analyzing the binary contents of the IPTC data do you realize that they are stored incorrectly, which is why tools like Photoshop and Mac Preview display them incorrectly.
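To make this concrete, a plain-Ruby sketch (no ExifTool involved) of why the raw IPTC bytes are ambiguous: the same word stored as UTF-8 or as Windows Latin1 yields different bytes, and a reader that guesses the wrong encoding shows garbage. How any particular viewer guesses is an assumption here; the sketch only models one reader that defaults to Latin1 and one that expects UTF-8.

```ruby
# Same text, stored the two ways discussed in this thread.
text = "Ümläüts"
utf8_bytes   = text.encode("UTF-8").b         # bytes when values are stored as UTF-8
latin1_bytes = text.encode("Windows-1252").b  # bytes with the Latin1 default

# A reader that assumes Latin1 turns the UTF-8 bytes into mojibake.
misread = utf8_bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
puts misread                                                   # ÃœmlÃ¤Ã¼ts

# A reader that assumes UTF-8 cannot even decode the Latin1 bytes.
puts latin1_bytes.dup.force_encoding("UTF-8").valid_encoding?  # false
```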
After reading the exiftool documentation again, I think that `-charset iptc=utf8` (alias `iptc_encoding: 'utf8'`) should itself handle writing the values with the correct encoding. (The documentation does, however, recommend setting CodedCharacterSet explicitly.)
Looking again at my tests and code, I found that the `*_encoding` options are ignored when writing via `MiniExiftool#save`. So this really is a bug in mini_exiftool, which I'll fix soon.
The bug with `iptc_encoding` is now fixed, and the following should work in the newly released version 2.4.2:
```ruby
photo = MiniExiftool.new(filename, :iptc_encoding => 'UTF-8')
photo.caption_abstract = 'Text with Ümläüts'
photo.save
```
Nevertheless, the CodedCharacterSet tag should be set in addition, as described in the ExifTool documentation:
> Note that unless CodedCharacterSet is UTF-8, applications have no reliable way to determine the IPTC character encoding. For this reason, it is recommended that CodedCharacterSet be set to "UTF8" when creating new IPTC.
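Putting both recommendations together, a hedged sketch of an ExifTool argument list for writing Cyrillic keywords: `-charset iptc=utf8` (which `iptc_encoding: 'utf8'` is described in this thread as aliasing) plus an explicit `-CodedCharacterSet=UTF8`. The helper `iptc_utf8_write_argv` is hypothetical and not part of mini_exiftool. Building an argv array, rather than a shell string, avoids quoting problems with non-ASCII values.

```ruby
# Hypothetical helper (not part of mini_exiftool): builds an ExifTool argv that
# writes one or more values to an IPTC tag as UTF-8.
def iptc_utf8_write_argv(tag, values, path)
  args = ["exiftool", "-P", "-overwrite_original",
          "-charset", "iptc=utf8",     # internal IPTC character set for the values
          "-CodedCharacterSet=UTF8"]   # records that choice for other applications
  # List-type tags such as Keywords take one assignment per item.
  Array(values).each { |v| args << "-#{tag}=#{v}" }
  args << path
end

argv = iptc_utf8_write_argv("Keywords", ["кот", "собака"], "photo.jpg")
puts argv.inspect
# To actually run it (requires exiftool on the PATH): system(*argv)
```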
But be aware of IPTC tags that are already stored in another encoding:
> ExifTool may be used to convert IPTC values to a different internal encoding. To do this, all IPTC tags must be rewritten along with the desired value of CodedCharacterSet. For example, the following command changes the internal IPTC encoding to UTF-8 (from Windows Latin1 unless CodedCharacterSet was already "UTF8"):
```sh
exiftool -tagsfromfile @ -iptc:all -codedcharacterset=utf8 a.jpg
```
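A minimal Ruby sketch of running that same conversion from a script. It assumes exiftool is on the PATH; `iptc_to_utf8_argv` and `convert_iptc_to_utf8` are hypothetical helper names, and the runner is injectable so the argument list can be checked without ExifTool installed. Passing separate arguments to `Open3.capture3` bypasses the shell, so file names with spaces or non-ASCII characters need no escaping.

```ruby
require "open3"

# Hypothetical helper: argv for the ExifTool conversion command quoted above.
def iptc_to_utf8_argv(path)
  ["exiftool", "-tagsfromfile", "@", "-iptc:all", "-codedcharacterset=utf8", path]
end

# Rewrites all IPTC tags of +path+ together with CodedCharacterSet=UTF8.
# +runner+ defaults to Open3.capture3 and must return [stdout, stderr, status].
def convert_iptc_to_utf8(path, runner: Open3.method(:capture3))
  _stdout, stderr, status = runner.call(*iptc_to_utf8_argv(path))
  warn stderr unless status.success?
  status.success?
end

# Usage (needs exiftool installed):
#   convert_iptc_to_utf8("a.jpg")  # true on success
```

As with the command itself, ExifTool keeps a backup copy (a.jpg_original) unless -overwrite_original is added.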
I hope that the confusion is now resolved.
I have added a section about the encodings in the README file.