bbottema/outlook-message-parser

Encoding issues with bodyHTML

Closed · 1 comment

I have an issue that results in a problem similar to #34. With the following code, the returned HTML body string has a garbled encoding.

try (FileInputStream fileInputStream = new FileInputStream(msgFileName)) {
	OutlookMessageParser outlookMessageParser = new OutlookMessageParser();
	// parse from the stream opened above instead of re-opening the file by name
	OutlookMessage outlookMessage = outlookMessageParser.parseMsg(fileInputStream);

	System.out.println(outlookMessage.getBodyHTML());
}

This is an extract of what is returned:

<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">ich habe die AB geändert und Ihnen zugeschickt.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,sans-serif">Im Preis ist die Preiserhöhung ab dem 16.08.2021 enthalten.

From what I've gathered so far, this happens in this part of your code:

case 0x1e:
// we put the complete data into a byte[] object...
final byte[] textBytes1e = getBytesFromDocumentEntry(de);
// ...and create a String object from it
return new String(textBytes1e, "ISO-8859-1");
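
To illustrate why a fixed ISO-8859-1 decode can garble the text, here is a minimal sketch. It assumes the body bytes are UTF-8, which the issue does not actually confirm; the class and method names are hypothetical and not part of outlook-message-parser.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Decodes bytes with a fixed ISO-8859-1 charset, mirroring the 0x1e branch quoted above.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "geändert" encoded as UTF-8: the single character 'ä' becomes the two bytes 0xC3 0xA4.
        String original = "ge\u00e4ndert";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // ISO-8859-1 maps every byte to exactly one character, so 'ä' turns into two characters.
        String wrong = decodeAsLatin1(utf8Bytes);
        String right = new String(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println(wrong.equals("ge\u00c3\u00a4ndert")); // true
        System.out.println(right.equals(original));              // true
    }
}
```

Under that assumption, every non-ASCII character in the body would come out as two or more unrelated characters, which matches the kind of garbling described here.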

Modifying the code along the lines of what was suggested in #34 fixes the problem (at least for us) and, from what we've seen with our test mails, doesn't break any of them.

case 0x1e:
	// we put the complete data into a byte[] object...
	final byte[] textBytes1e = getBytesFromDocumentEntry(de);
	// ...and decode it as ISO-8859-1 first, so the charset declaration can be searched for
	String convertedString = new String(textBytes1e, "ISO-8859-1");
	// look for charset=name, optionally wrapped in double quotes
	Pattern pattern = Pattern.compile("charset=(\"|)([\\w\\-]+)\\1", Pattern.CASE_INSENSITIVE);
	Matcher m = pattern.matcher(convertedString);
	if (m.find()) {
		try {
			// decode again with the declared charset
			convertedString = new String(textBytes1e, Charset.forName(m.group(2)));
		} catch (IllegalArgumentException e) {
			// unknown or invalid charset name: keep the ISO-8859-1 result
		}
	}
	return convertedString;
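
For anyone who wants to try the idea outside the library, here is a self-contained sketch of the same approach. The class name `CharsetSniffingDemo` and the helper `decodeHtmlBody` are hypothetical and not part of outlook-message-parser, and this is not necessarily how the fix was implemented upstream.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetSniffingDemo {
    // Same pattern as in the snippet above: charset=name, optionally wrapped in double quotes.
    private static final Pattern CHARSET_PATTERN =
            Pattern.compile("charset=(\"|)([\\w\\-]+)\\1", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: decode as ISO-8859-1 first, then re-decode with the declared charset.
    static String decodeHtmlBody(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET_PATTERN.matcher(decoded);
        if (m.find()) {
            try {
                decoded = new String(bytes, Charset.forName(m.group(2)));
            } catch (IllegalArgumentException e) {
                // Unknown or invalid charset name: keep the ISO-8859-1 result.
            }
        }
        return decoded;
    }

    public static void main(String[] args) {
        String html = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
                + "<p>ge\u00e4ndert</p>";
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeHtmlBody(bytes).equals(html)); // true
    }
}
```

The first ISO-8859-1 pass is safe for finding the declaration because the charset name itself is plain ASCII. If the body carries no charset declaration at all, the result stays ISO-8859-1, so this sketch only helps for messages that declare their encoding.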

I'm currently trying to get example mails. I have one so far, but I cannot publish it here, so I would have to send it to you directly, on the condition that it is not published anywhere, including in test cases. If you want, I can send you this one.

Nope, no need. Fixed and released in 1.7.13. Cheers!