koodaamo/tnefparse

tnef_attachement.name is not unicode

guettli opened this issue · 9 comments

I get a UnicodeError in my application code, because tnef_attachement.name is not unicode.

Could the tnef library be updated to make tnef_attachement.name a unicode string?

In my case it looks like a latin1 string. It would improve the usability of the library if the application programmer did not need to decode strings.
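A minimal sketch of the decoding an application currently has to do on its own, assuming the name arrives as a byte string. The helper `decode_attachment_name` is hypothetical and not part of tnefparse, and the latin-1 fallback is only a guess based on the data described here:

```python
def decode_attachment_name(raw_name, encodings=("utf-8", "latin-1")):
    """Return raw_name as text, trying each candidate encoding in order.

    Hypothetical application-side helper; not part of tnefparse.
    """
    if not isinstance(raw_name, bytes):
        # Already a text string, nothing to decode.
        return raw_name
    for encoding in encodings:
        try:
            return raw_name.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if a caller passes a list without a catch-all encoding.
    return raw_name.decode("latin-1", errors="replace")


name = decode_attachment_name(b"R\xe9sum\xe9.pdf")  # latin-1 bytes
```

Trying UTF-8 first is a design choice: latin-1 accepts any byte sequence, so it only makes sense as the last fallback.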

Sorry, I can't post the tnef binary, since it is from a customer.

BTW, how can you create tnef test binaries?

plq commented

Fixed in fbcecac, provided you call long_filename() AND there is a long filename to return.

Unfortunately, I don't have a matching test file here to prove whether the patch solves this issue.

I trust you and I think this issue can be closed.

Should I close it?

plq commented

I'm not in a position to make that decision.

petri commented

Please, if you can, provide a PR.

petri commented

Note: master now strips null bytes from long_filename as well.
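As an illustration only, stripping a trailing null terminator could look like the sketch below. This is not the code in master: `strip_trailing_nulls` is a hypothetical name, and whether master removes only trailing nulls or all of them is an assumption not confirmed here.

```python
def strip_trailing_nulls(name):
    """Remove trailing NUL terminators from a bytes or text filename.

    Hypothetical helper; assumes the nulls are terminators at the end.
    """
    if isinstance(name, bytes):
        return name.rstrip(b"\x00")
    return name.rstrip("\x00")


cleaned = strip_trailing_nulls(b"report.pdf\x00")
```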

petri commented

The encoding of tnef attachment (long) names is tricky to get right.

See, for example, the discussion at roundcube/roundcubemail#5646.

If you have a good understanding of how the encodings in tnef attachments work, or any links to information, that would help.

petri commented

Note: we should also consider how this is to work on Python 2 vs. Python 3 (with the help of six?)
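A rough sketch of what a Python 2/3-neutral text conversion could look like without adding a dependency. The names `text_type` and `ensure_text` here are local stand-ins that mirror `six.text_type` and `six.ensure_text`; the UTF-8 default is an assumption, not what tnefparse does:

```python
import sys

PY2 = sys.version_info[0] == 2

# `unicode` only exists on Python 2; the conditional keeps it unevaluated on 3.
text_type = unicode if PY2 else str  # noqa: F821


def ensure_text(value, encoding="utf-8", errors="strict"):
    """Return value as a text string on both Python 2 and Python 3."""
    if isinstance(value, text_type):
        return value
    # Anything else is assumed to be a byte string.
    return value.decode(encoding, errors)


filename = ensure_text(b"caf\xe9.txt", encoding="latin-1")
```

With six available, the same call would be `six.ensure_text(value, encoding, errors)`.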

This should be addressed by #31

Thank you!