Unicode issues
thomasf1 opened this issue · 7 comments
I'm having problems with Unicode characters, which seem to get messed up.
There are some discussions about it, and I've found someone who wrote a PHP FDF implementation that seems to address the issue:
https://github.com/mikehaertl/php-pdftk/blob/master/src/FdfFile.php
Can you give me an example of some data you are putting in that's coming back weird? It would make it easier to have a use-case where the library is broken.
Also, any idea where in the code it handles Unicode? At a glance it looks like it's here, but I hardly ever mess with PHP so I'm not 100% sure what I'm looking at.
Reading the comments, it looks like FDF files store their keys and values using UTF-16 (Big Endian), which is probably the problem. Light research indicates that JS uses either, depending on how it's implemented (which may be handled differently with Node).
I'm looking around for a way of converting a normal string to UTF-16BE in JavaScript now, to see if there's an easy way of fixing this.
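For reference, here is a minimal sketch of one way to do that conversion using only Node's built-in Buffer. The helper name `toUtf16BE` is made up for illustration, and this is not necessarily how the library in question (or the PHP code linked above) does it. Node can encode UTF-16 little-endian directly, so the sketch encodes that way and then swaps the bytes of each 16-bit unit.

```javascript
// Hypothetical helper (not part of any library): encode a JavaScript string
// as UTF-16BE, optionally prefixed with the byte order mark 0xFE 0xFF.
function toUtf16BE(text, withBom = true) {
  // Node has a built-in little-endian UTF-16 encoding ...
  const body = Buffer.from(text, 'utf16le');
  // ... so swap the two bytes of every 16-bit unit in place to get big-endian.
  body.swap16();
  return withBom ? Buffer.concat([Buffer.from([0xfe, 0xff]), body]) : body;
}

console.log(toUtf16BE('é').toString('hex')); // feff00e9
```

The result is a Buffer rather than a string, so anything that writes it out needs to handle raw bytes instead of concatenating strings.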
Looking at it, it looks like punycode handles string conversions, plus it's bundled with Node by default.
Helpful Links:
2ality Post on Unicode in JavaScript
punycode Documentation
Thanks for looking into it. I've been experimenting with iconv-lite to get it converted to UTF-16BE, but it seems it needs to work with Buffers rather than strings...
I've gotten it to work correctly with iconv-lite when patching the header manually. I'm not quite sure how to write out the header the right way, though; the current approach doesn't quite work with the Buffers...
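One possible way around the string-versus-Buffer mismatch is to build every part of the file as a Buffer and join the parts with `Buffer.concat`, so the header bytes never pass through a string encoding. The sketch below is only an illustration under several assumptions: the helper names (`utf16beWithBom`, `fdfField`, `buildFdf`) are hypothetical, the four high bytes in the header comment line are assumed from typical pdftk-style FDF output, and values are written as PDF hex strings (a swapped-in simplification that avoids escaping parentheses and backslashes), which is not what the linked PHP code does.

```javascript
// Hypothetical helper: UTF-16BE bytes with a leading BOM (0xFE 0xFF).
function utf16beWithBom(text) {
  const body = Buffer.from(text, 'utf16le');
  body.swap16(); // little-endian -> big-endian, in place
  return Buffer.concat([Buffer.from([0xfe, 0xff]), body]);
}

// Header built from raw bytes: "%FDF-1.2", then a binary comment line.
// ASSUMPTION: the bytes E2 E3 CF D3 mirror typical pdftk-style output.
const header = Buffer.concat([
  Buffer.from('%FDF-1.2\n%', 'latin1'),
  Buffer.from([0xe2, 0xe3, 0xcf, 0xd3]),
  Buffer.from('\n', 'latin1'),
]);

// One field dictionary; name and value are emitted as PDF hex strings,
// so the resulting line is plain ASCII and needs no escaping.
function fdfField(name, value) {
  const hex = (s) => `<${utf16beWithBom(s).toString('hex')}>`;
  return Buffer.from(`<< /T ${hex(name)} /V ${hex(value)} >>\n`, 'latin1');
}

// Assemble the whole file from Buffers only.
function buildFdf(fields) {
  const parts = [header, Buffer.from('1 0 obj\n<< /FDF << /Fields [\n', 'latin1')];
  for (const [name, value] of Object.entries(fields)) {
    parts.push(fdfField(name, value));
  }
  parts.push(Buffer.from('] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n', 'latin1'));
  return Buffer.concat(parts);
}
```

The returned Buffer can be passed straight to `fs.writeFileSync(path, buffer)` without specifying an encoding, which keeps the header bytes intact.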
Encoding issues are a bi***... I've actually given up, looked around, and found that the xfdf npm package handles it much better... :)
Cool find - it might be good to merge this project into the xfdf one, since they're both tiny and similar in purpose (handling different flavours of Adobe data)
Great :)...
One thing I stumbled upon while searching for a solution was http://rhaseventh.blogspot.de/2014/04/node-js-pdf-fill-from-fdf-with-utf-16.html - I couldn't quite get it to write out the header's special characters correctly, but it might still help :)