Analysis of needs for text layout and layout in general

Question


Closed this issue 3 years ago · 15 comments

HTML and TeX start from a complete description of a document before they can render it. I think that is overkill; we can use a simpler "flow" layout in which the declarative part is minimal.

The flow renderer would be built on top of CanvasRenderer2D, and you could retrieve the inner CanvasRenderer2D in order to draw things yourself alongside the flow elements.

It would lay out and then render whole paragraphs, whose text may contain bold or italic spans.
This implies that it must be able to break text into breakable components. Hyphenation will probably not be supported.
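The simplest strategy for placing those breakable components is greedy (first-fit) line breaking: keep adding words to the current line until the next one would exceed the available width, then start a new line. The sketch below assumes words are separated by whitespace and that a caller-supplied measure delegate returns the rendered width of a string (in printed this might wrap something like measureText(...).width, but that wiring is an assumption); wrapGreedy is a hypothetical name.

```d
import std.array : split;

/// Greedy (first-fit) line breaking, a minimal sketch.
/// `measure` returns the rendered width of a string in the current font;
/// it is a parameter so that any font metric can be plugged in.
string[] wrapGreedy(string text, float maxWidth, float delegate(string) measure)
{
    string[] lines;
    string current;
    foreach (word; text.split) // split on runs of whitespace
    {
        string candidate = current.length ? current ~ " " ~ word : word;
        if (current.length && measure(candidate) > maxWidth)
        {
            lines ~= current; // the line is full: emit it
            current = word;   // an over-long word still gets a line of its own
        }
        else
        {
            current = candidate;
        }
    }
    if (current.length)
        lines ~= current;
    return lines;
}
```

Greedy breaking is what most simple flow renderers do; TeX's total-fit (Knuth-Plass) algorithm produces more even paragraphs but is considerably more complex.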

Answer 1 · 2019-02-14T14:31:16.000Z

Some companies implement full HTML-to-PDF engines.

Answer 2 · 2021-07-17T12:44:06.000Z

I agree that a full text engine is an overkill. 😆

I think hyphenation support should be included, because users cannot add hyphenation later without reimplementing everything. A flexible solution is to pass the word- and syllable-splitting functions into the layout function, e.g.:

```d
import std.array : join;

/// Computes a text layout of given text with some markup.
/// (TextLayout is the result type; it is not specified here.)
TextLayout layoutText(string markup, float textWidth, Locale locale);

interface Locale
{
    /// Returns slices of `text` that represent individual words interleaved
    /// with the content that separates them.
    string[] breakWords(string text)
    out (words; join(words) == text);

    /// Returns slices of `word` that represent its syllables.
    string[] breakSyllables(string word)
    out (syllables; join(syllables) == word);

    /// Returns the character inserted at a hyphenation point.
    dchar hyphen();
}
```
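To make the Locale idea concrete, here is a minimal sketch of an implementation for languages that separate words with ASCII whitespace and that performs no hyphenation at all. It restates the interface (with string[] return types and without the contracts, for brevity) so the block compiles on its own; SimpleLocale is a hypothetical name.

```d
import std.ascii : isWhite;

interface Locale
{
    string[] breakWords(string text);
    string[] breakSyllables(string word);
    dchar hyphen();
}

/// Minimal sketch: words are separated by ASCII whitespace and no word is
/// ever hyphenated. Checking single UTF-8 code units is safe here because
/// multi-byte sequences never contain ASCII bytes.
final class SimpleLocale : Locale
{
    /// Cuts `text` wherever it switches between whitespace and
    /// non-whitespace, so words and separators alternate and
    /// joining the slices gives back `text`.
    string[] breakWords(string text)
    {
        string[] parts;
        size_t start = 0;
        foreach (i; 1 .. text.length + 1)
        {
            if (i == text.length || isWhite(text[i]) != isWhite(text[start]))
            {
                parts ~= text[start .. i];
                start = i;
            }
        }
        return parts;
    }

    /// No hyphenation dictionary: every word is a single "syllable".
    string[] breakSyllables(string word)
    {
        return [word];
    }

    dchar hyphen()
    {
        return '-';
    }
}
```

A real implementation would plug a hyphenation dictionary (for example Liang's patterns, as used by TeX) into breakSyllables.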

An open question is which markup to support (a subset of HTML, Markdown, RDoc, …). Personally, I prefer a subset of HTML with the old presentational elements and attributes:

```html
<p>
    <font color="red" face="Comic Sans" size="28">Lorem Ipsum</font>
</p>
<p>
    <b>Lorem ipsum</b> dolor sit amet, <i font-style="oblique">consectetur.</i>
</p>
```

This approach could be complemented by letting user-provided functions handle additional tags. The user could then easily add a handler for, e.g., underlining (I would prefer not to provide it by default because underlining is generally frowned upon).

An alternative approach is an API similar to the canvas API:

```d
auto markup = new Markup();

with (markup)
{
    save();
        fontSize = 28;
        color = brush("red");
        fontFace = "Comic Sans";

        text("Lorem Ipsum");
    restore();

    save();
        fontWeight = FontWeight.bold;
        text("Lorem ipsum");
    restore();

    text(" dolor sit amet, ");

    save();
        fontStyle = FontStyle.oblique;
        text("consectetur.");
    restore();
}
```
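Under the hood, the save()/restore() pairs of such a canvas-like API amount to a stack of style states: save() pushes a copy of the current style, restore() pops it, and text() records a run together with the style in effect. A minimal sketch, assuming a Style struct with only a few properties (color and brush handling are left out, and Style, Run and Markup here are hypothetical):

```d
/// Hypothetical font properties; the defaults are arbitrary.
enum FontWeight { normal, bold }
enum FontStyle { normal, italic, oblique }

struct Style
{
    float fontSize = 12;
    string fontFace = "serif";
    FontWeight fontWeight = FontWeight.normal;
    FontStyle fontStyle = FontStyle.normal;
}

/// A piece of text together with the style it was written in.
struct Run
{
    string text;
    Style style;
}

/// Hypothetical builder mirroring the canvas-like API.
final class Markup
{
    Run[] runs;            // the output: styled runs in document order
    private Style current; // style applied to the next text() call
    private Style[] stack; // styles saved by save()

    void save() { stack ~= current; }

    void restore()
    {
        assert(stack.length > 0, "restore() without matching save()");
        current = stack[$ - 1];
        stack = stack[0 .. $ - 1];
    }

    void text(string s) { runs ~= Run(s, current); }

    // Setters, so that `fontSize = 28;` works inside `with (markup)`.
    @property void fontSize(float v) { current.fontSize = v; }
    @property void fontFace(string v) { current.fontFace = v; }
    @property void fontWeight(FontWeight v) { current.fontWeight = v; }
    @property void fontStyle(FontStyle v) { current.fontStyle = v; }
}
```

The resulting list of runs is exactly what a layout step (and later an HTML or Markdown front end) would consume.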

After writing down these two alternatives, I actually think it is best to implement the latter API (maybe there is a standard specification for this?) and to provide parsers for HTML, Markdown, and so on as extras.

Answer 3 · 2021-07-17T13:13:06.000Z

I forgot to mention that I would explicitly represent paragraphs as boxes (width and height, both possibly float.infinity) so that text can continue from one box into the next, which is needed for page breaks and multi-column layouts.
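The point of explicit boxes is that text which overflows one box continues in the next one, whether that is the next column or the next page. A minimal sketch, assuming the lines have already been broken to the box width so only their heights matter; Frame and distributeLines are hypothetical names, and float.infinity models an unbounded box:

```d
/// A rectangular area that receives lines: a paragraph box, a column or a
/// page body. Either dimension may be float.infinity. The width would be
/// used by line breaking; only the height matters when distributing lines.
struct Frame
{
    float width;
    float height;
}

/// Distributes consecutive lines (given by their heights) over a sequence
/// of frames and returns how many lines each frame received. A line that
/// does not fit into the remaining space of a frame moves on to the next
/// frame; lines left over after the last frame are not placed.
size_t[] distributeLines(const float[] lineHeights, const Frame[] frames)
{
    auto counts = new size_t[frames.length];
    size_t line = 0;
    foreach (f, frame; frames)
    {
        float used = 0;
        while (line < lineHeights.length && used + lineHeights[line] <= frame.height)
        {
            used += lineHeights[line];
            ++counts[f];
            ++line;
        }
    }
    return counts;
}
```

With an infinite last frame, everything that did not fit into the fixed-size frames ends up there, which mirrors a single unbounded paragraph.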

Answer 4 · 2021-07-18T10:20:35.000Z

Hello Arne,

The grand-master plan was to:

- parse CommonMark with, for example, https://github.com/AuburnSounds/commonmark-d, which generates HTML;
- parse HTML with, e.g., dom.d from arsd;
- parse CSS :);
- have a "minimal" browser engine to position things. This is an enormous endeavour, but for architectural guidance there are tutorials about building a browser engine. Note that this is almost a business in itself, since it is similar in design to PrinceXML. It is, of course, a stupid amount of work; font-size in particular is hard to get right. It seems like the right final architecture for high-speed generation of user manuals. This needs to be scoped, since (in my case) I'm mostly interested in auto-generating user manuals for audio software; authoring them by hand is less consistent.
- This includes very specific work with text, such as splitting words and phrases; e.g. smileys get rendered in another font. This is very difficult, but the HTML spec specifies it in great detail!

Alternative plans involved input languages other than Markdown; I have more or less given up on those.

A smaller, more immediate plan would be to give the API a limited understanding of "available remaining space", because working with text is currently annoying, and simply have a flow algorithm that tracks the current X and Y in the page. If the content doesn't fit on the page, a new page is inserted. Defining the API boundaries for this simple interface is of paramount importance. It would, of course, be built on top of the Canvas API.

My problem with the API right now is code like this:

```d
float X = measureText(_("Quotation number: ")).width;
fillText(_("Quotation number: "), 0, 0);
fontWeight(FontWeight.normal);
fillText(quote.prettyQuoteNumber, X, 0);
```
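A flow cursor of the kind described above would remove this manual measureText bookkeeping: it remembers where the previous piece of text ended, wraps when a piece does not fit, and starts a new page when the next line would run past the bottom. A minimal sketch, in which recorded Placed entries stand in for fillText calls on a real canvas and the measure delegate stands in for measureText(...).width; FlowWriter and its members are hypothetical:

```d
/// Hypothetical cursor-based flow writer (a sketch, not printed's API).
final class FlowWriter
{
    /// One placed piece of text; a real writer would call fillText here.
    struct Placed
    {
        size_t page;
        float x, y;
        string text;
    }

    Placed[] placed;
    size_t page;
    float x = 0, y = 0;

    private float pageWidth, pageHeight, lineHeight;
    private float delegate(string) measure;

    this(float pageWidth, float pageHeight, float lineHeight,
         float delegate(string) measure)
    {
        this.pageWidth = pageWidth;
        this.pageHeight = pageHeight;
        this.lineHeight = lineHeight;
        this.measure = measure;
    }

    /// Places `s` right after the previous text, wrapping if needed.
    void text(string s)
    {
        const w = measure(s);
        if (x > 0 && x + w > pageWidth)
            newLine();
        placed ~= Placed(page, x, y, s);
        x += w;
    }

    /// Moves to the start of the next line, inserting a page if that
    /// line would not fit below the current one.
    void newLine()
    {
        x = 0;
        y += lineHeight;
        if (y + lineHeight > pageHeight)
        {
            ++page;
            y = 0;
        }
    }
}
```

With such a cursor, the quotation-number example reduces to two text() calls with a weight change in between, and no X has to be computed by hand.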

Answer 5 · 2021-07-18T12:19:58.000Z

I am working on a draft API that makes it easy to build pages like this example. I think it is prudent to first create a low-level API to work with text before starting on high-level interfaces like Markdown, CommonMark or HTML.

Once this is settled, it is easy to translate Markdown etc. into a series of API calls. One could even compile Markdown templates into code, as vibe.d does with its Diet templates.

Your example would look like this:

```d
text(_("Quotation number: "));
save();
fontWeight(FontWeight.normal);
text(quote.prettyQuoteNumber);
restore();
// more text goes here...

// Compute text layout and pass it to the current renderer
layout().renderWith(renderer);
```

Answer 6 · 2021-07-18T15:50:37.000Z

> I think it is prudent to first create a low-level API to work with text before starting on high-level interfaces like Markdown, CommonMark or HTML.

Yes. While you create the API, you will have to tell me all the text-related things you need the lower-level canvas interface to do.

Answer 7 · 2021-07-19T16:22:15.000Z

Please take a look at the printed-text repo and give me some feedback. We may also continue the discussion in the issues of that repo if you like.

Answer 8 · 2021-07-19T16:23:12.000Z

PS: Under docs/examples I have created an elaborate example as HTML, PDF, and D source code. It is intended as a kind of benchmark for what is required.

Answer 9 · 2021-07-20T09:53:02.000Z

Hello,
I need a bit more time to take a look and make comments. I'll get back to you in 24 hours.

Answer 10 · 2021-07-20T11:14:13.000Z

Sure, take your time. I am changing bits and pieces anyway.

Answer 11 · 2021-07-21T16:13:42.000Z

Hello @a-ludi,

After a cursory reading it seems fit for purpose.
Based on past experience, I prefer not to interfere with design decisions, so if you want to take ownership of this part, it's up to you.

So my input really depends on whether you want final ownership / maintenance of the part you design:
A - If yes, then it's better in another repository under your control, with all design decisions made by you. My task will be to provide the underlying layer. I think this is probably the better plan.

B - If not, and you want it merged into this repository (printed), then it will involve a much more detailed merge review, with my personal opinions and preferences mixed in. That is not necessarily desirable for this project, as you have a real need for the super-layer if I understand correctly.

So really at this point the only remark I have might be that write is a std.stdio function, so that identifier looks like something from the stdlib and might collide.

Answer 12 · 2021-07-25T06:54:18.000Z

> as you have a real need for the super-layer if I understand correctly.

The truth is that I am very intrigued by the topic.

> So really at this point the only remark I have might be that write is a std.stdio function, so that identifier looks like something from the stdlib and might collide.

It is actually intentional: it makes the interface similar to File from std.stdio, so it feels natural to D users. For the same reason, I included the put alias, which makes it an output range.
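For readers less familiar with the idiom: giving a type a put member (here an alias to write) is what makes it an output range, so generic Phobos code such as std.format.formattedWrite can write into it directly. A minimal sketch with a hypothetical TextSink that merely collects strings:

```d
import std.conv : text;
import std.range.primitives : isOutputRange;

/// Hypothetical sink whose write() mirrors std.stdio.File.write:
/// it converts all arguments to text and appends the result.
struct TextSink
{
    string[] pieces; // a real implementation would feed the layout engine

    void write(Args...)(Args args)
    {
        pieces ~= text(args);
    }

    /// Aliasing put to write makes TextSink an output range.
    alias put = write;
}

static assert(isOutputRange!(TextSink, string));
```

The trade-off is the one raised above: the name write also exists in std.stdio, so a module importing both may need to disambiguate calls.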

Answer 13 · 2021-10-05T12:08:19.000Z

I generate invoices containing Chinese text, which causes the following problems:

- The only font on Windows that covers it is huge; embedding it instantly produces a 10 MB PDF. Our PDFs are large because we embed the whole font, whereas the PDF exporter in OpenOffice subsets fonts to embed only the necessary glyphs. So there is a size problem.
- As with smileys, you have to switch fonts within the same line of text depending on the content. We would need an algorithm that splits text into components that may each use a different font.
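One common approach to this (usually called font fallback) is to walk the text code point by code point, pick the first font in a preference list that has a glyph for it, and start a new run whenever the chosen font changes. A minimal sketch, assuming a per-font coverage predicate (in practice this would query the font's cmap table); FontRun and splitByFont are hypothetical names:

```d
/// A run of text that can be rendered with a single font.
struct FontRun
{
    string text;
    size_t font; // index into the font preference list
}

/// Splits `text` into runs, choosing for every code point the first font
/// in `covers` (ordered by preference) that has a glyph for it. Code points
/// that no font covers are assigned to font 0.
FontRun[] splitByFont(string text, bool delegate(dchar)[] covers)
{
    FontRun[] runs;
    size_t runStart = 0;
    size_t runFont = size_t.max; // no run started yet
    foreach (size_t i, dchar c; text) // i is the UTF-8 offset of c
    {
        size_t font = 0;
        foreach (f, cov; covers)
            if (cov(c)) { font = f; break; }
        if (font != runFont)
        {
            if (runFont != size_t.max)
                runs ~= FontRun(text[runStart .. i], runFont);
            runStart = i;
            runFont = font;
        }
    }
    if (runFont != size_t.max)
        runs ~= FontRun(text[runStart .. $], runFont);
    return runs;
}
```

Real implementations usually keep spaces and punctuation with the surrounding run instead of switching fonts for them; this sketch does not special-case them.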

Answer 14 · 2021-10-06T07:24:54.000Z

Yes, I agree. My thought was that one would need this for formatting (italics, bold).

I even have an implementation that is aware of this, but it still has some major bugs that I want to fix before making it public. So far, I have not had the time to work on it. I will let you know once there is something to look at.

Answer 15 · 2022-02-19T19:24:10.000Z

printed:flow can convert Markdown to PDF in a pretty convincing way.