danfickle/openhtmltopdf

Make slash non-breaking.

Magotchi opened this issue · 4 comments

In the following example, Firefox and Chromium refuse to break within any of the "test/test" sequences.

openhtmltopdf puts a break between "test" and "/test" (having the / on the next line), as shown here.

<!DOCTYPE html>
<html>
    <head>
        <title></title>
    </head>
    <body>
        <p style="font-size: 5em;">
            test/test test/test test/test
        </p>
    </body>
</html>

I believe it would be best if slash ("/") were treated as a non-breaking character, as browsers seem to do.

Thanks for the report. We do have a custom breaker wrapper so this should be possible. I'll look into it when I restart work on this project (hopefully soon).

@Magotchi - I looked into this and it turns out the entire point of our custom breaker is to break on slashes! The use case is long URLs: the breaker lets them wrap at slashes rather than at arbitrary points in the URL.

Could you outline a little more the reason this is a problem? Thanks.

In my opinion, it's a problem because:

  • When the punctuation sits at the beginning of the next line rather than at the end of the first, it is hard to tell that the text continues. If you do decide to break there, I feel it's better to keep the punctuation at the end of the first line.
  • The behavior is inconsistent with the screen and print behavior of other popular browsers. Whether that's important to you, I don't know.

I looked into the behavior of very large words (sequences of letters) that contain slashes:

  • Chromium refuses to ever break on slash. When printing, it seems to reduce the font size of the element containing the unbreakable word until it fits on the page. When viewing the page on a screen, it just lets it overflow the view.
  • Firefox will break on a slash if the word would overflow the view width, in both print and screen modes. Sadly for my argument, it puts the slash on the next line. However, I would guess the Firefox developers' reasoning is that a slashed word that overflows the view width is almost certainly a URL, and URLs do seem more readable when each subsequent line begins with the slash.
  • Both browsers still refuse to break on slash if the word would not overflow the view width.

@Magotchi - Thanks for the detailed write-up. I've added a method to the builder to let the user specify a line breaker. For no-break on forward slashes you can use a default Java BreakIterator:

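            // Needs java.text.BreakIterator and java.util.Locale.
            builder.useUnicodeLineBreaker(new FSTextBreaker() {
                // Stock JDK line-break iterator; use whatever locale you like.
                private final BreakIterator br = BreakIterator.getLineInstance(Locale.US);

                @Override
                public void setText(String newText) {
                    br.setText(newText);
                }

                @Override
                public int next() {
                    // Offset of the next break opportunity, or BreakIterator.DONE at the end.
                    return br.next();
                }
            });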
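If you want to check where a given iterator allows breaks before plugging it in, something like the following should work. This is only a sketch: the class and method names (BreakOpportunities, boundaries) are made up for illustration and are not part of openhtmltopdf. It walks the iterator the same way the snippet above does (setText, then next until DONE) and collects the offsets.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakOpportunities {
    /** Collects every boundary offset the iterator reports for the text, in order. */
    public static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        // After setText the iterator sits at offset 0; next() advances to each later boundary.
        for (int pos = it.next(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "test/test test/test";
        // An offset of 4 or 5 in the output would mean a break before or after the first slash.
        System.out.println(boundaries(BreakIterator.getLineInstance(Locale.US), text));
    }
}
```

The output depends on the iterator implementation and version you run it with, so it is worth running against the exact JDK (or ICU4J) you deploy.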

If you are using the rtl module you can also use a BreakIterator supplied by ICU4J (use whatever locale you like):

builder.useUnicodeLineBreaker(new ICUBreakers.ICULineBreaker(Locale.CANADA));

Note, however, that the ICU line breaker breaks after forward slashes! Since ICU4J implements a more recent version of the Unicode standard, I suspect that the stock Java break iterator will do the same at some point in the future.
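If you need the no-break-on-slash behavior regardless of which iterator sits underneath, one option is to filter the boundaries yourself. The sketch below is an assumption-laden illustration, not project code: TextBreaker is a local stand-in for the two methods the FSTextBreaker snippet above overrides, and SlashKeepingBreaker is a hypothetical name. It delegates to any BreakIterator and skips every boundary that falls directly before or after a "/".

```java
import java.text.BreakIterator;
import java.util.Locale;

public class NoSlashBreaker {
    /** Local stand-in (assumption) for the setText/next pair used by FSTextBreaker above. */
    public interface TextBreaker {
        void setText(String newText);
        int next();
    }

    /** Delegates to a BreakIterator but skips boundaries adjacent to a '/'. */
    public static class SlashKeepingBreaker implements TextBreaker {
        private final BreakIterator delegate;
        private String text = "";

        public SlashKeepingBreaker(BreakIterator delegate) {
            this.delegate = delegate;
        }

        @Override
        public void setText(String newText) {
            text = newText;
            delegate.setText(newText);
        }

        @Override
        public int next() {
            int pos = delegate.next();
            // Always pass through DONE and the end-of-text boundary; skip slash-adjacent ones.
            while (pos != BreakIterator.DONE && pos < text.length() && touchesSlash(pos)) {
                pos = delegate.next();
            }
            return pos;
        }

        // pos is strictly between 0 and text.length() here, so both lookups are in range.
        private boolean touchesSlash(int pos) {
            return text.charAt(pos - 1) == '/' || text.charAt(pos) == '/';
        }
    }

    public static void main(String[] args) {
        TextBreaker breaker = new SlashKeepingBreaker(BreakIterator.getLineInstance(Locale.US));
        breaker.setText("test/test test/test");
        for (int pos = breaker.next(); pos != BreakIterator.DONE; pos = breaker.next()) {
            System.out.println(pos);
        }
    }
}
```

The trade-off is the one discussed earlier in this thread: with a filter like this, a long URL has no break opportunity at its slashes at all, so it only makes sense when matching the browsers' behavior for short "test/test" style words matters more than wrapping URLs.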