michaelrsweet/htmldoc

Encoding breaks for special characters

Closed this issue · 5 comments

Good day,

When converting HTML to PDF, some special characters are displayed incorrectly:

  • ↓ is displayed as a square
  • é is displayed normally on its own, but breaks in combination with other characters (example: Géjanne)

Version: 1.9.16
Previous version: 1.9.11

Command arguments that are being used:
--charset iso-8859-1 --format pdf14 --firstpage c1 --size A4 --bodyfont sans --textfont sans --headingfont sans --no-title --headfootfont serif --headfootsize 6 --linkcolor blue --linkstyle plain --header ... --footer ... --no-toc --toclevels 3 --toctitle Inhoudsopgave

With utf-8 as the charset, the problem persists, and many other characters are also not displayed (for example: ï, ë, ē, ę).

Is there any way to make it work?

Can you attach a sample HTML file that demonstrates the problem?

Hi Michael, thank you for the quick response. Here is the arrow sample.

<!-- NEW PAGE -->
<h2> <a name="label-BRONNEN"> </a>BRONNEN</h2>
<p>&nbsp;</p>
<p>In dit e-book zijn de onderstaande bronnen gebruikt.&lt; &lt;&nbsp; &#8595;</p>

Update: é is displayed normally on its own, but breaks in combination with other characters (example: Géjanne).
That issue is not related to the arrow issue. Sorry for the confusion.

The arrow issue is still relevant.

The Unicode arrow character (↓) isn't available in most fonts (thus the box), and HTMLDOC doesn't do fallback/multi-master fonts.

I still need the HTML for the other character that breaks (just rename the file to .txt to attach it here).
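
To see which characters in a file fall outside the charset passed to `--charset`, a small check along these lines may help. This is a minimal sketch in Python and not part of HTMLDOC; the function name `unencodable_chars` is hypothetical. It only tests whether each character can be encoded in the given charset, which approximates the problem but does not check glyph coverage in the actual font.

```python
import html


def unencodable_chars(markup: str, charset: str = "iso-8859-1") -> dict[str, int]:
    """Return {character: count} for characters that `charset` cannot encode.

    Hypothetical helper: it checks the charset only, not the font's glyphs.
    """
    # Resolve character references such as &#8595;, &nbsp; and &lt; first,
    # so that entity-encoded characters are checked too.
    text = html.unescape(markup)
    missing: dict[str, int] = {}
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing[ch] = missing.get(ch, 0) + 1
    return missing


# The paragraph from the sample posted above.
sample = (
    "<p>In dit e-book zijn de onderstaande bronnen gebruikt."
    "&lt; &lt;&nbsp; &#8595;</p>"
)

for ch, count in unencodable_chars(sample).items():
    print(f"U+{ord(ch):04X} {ch!r} x{count}")
```

For the sample above, this reports only the arrow (U+2193). Characters such as é, ï and ë are part of ISO-8859-1 and pass the check, while ē and ę are not and would be reported.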

The issue with the other character was not an HTMLDOC problem. Thank you for your help.