partiql/partiql-lang-kotlin

Inefficient pretty printing causes stack overflow

RCHowell opened this issue · 0 comments

Description

The pretty-printer is loosely based upon Wadler Prettier-Printer. In an attempt a faithful modeling of the data structures, then end result is an inefficient printer. We can greatly reduce the stack size and gain some speed with a reduced (but equivalent) model.

The primary bug we are facing now is that query expansions cause the pretty printer to stack overflow.

-- v1 as int
COALESCE(COALESCE(COALESCE(V1)))

-- expands to
CASE WHEN NOT (CASE WHEN NOT (CASE WHEN NOT (V1 IS NULL) THEN V1 ELSE CAST(NULL AS INT) END IS NULL) THEN CASE WHEN NOT (V1 IS NULL) THEN V1 ELSE CAST(NULL AS INT) END ELSE CAST(NULL AS INT) END IS NULL) THEN CASE WHEN NOT (CASE WHEN NOT (V1 IS NULL) THEN V1 ELSE CAST(NULL AS INT) END IS NULL) THEN CASE WHEN NOT (V1 IS NULL) THEN V1 ELSE CAST(NULL AS INT) END ELSE CAST(NULL AS INT) END ELSE CAST(NULL AS INT) END

The expansion follows from an SQL equivalence, but the duplication of V1 makes this balloon.

To Reproduce

Steps to reproduce the behavior:

Parse and pretty-print the expanded query.

Expected Behavior

  • < A clear and concise description of what you expected to happen. >

Additional Context

Alan has provided a detail write-up here as well.

partiql/partiql-scribe#26

Proposal

We can simplify the pretty printer and pick up some perf improvements via a simpler block tree like

sealed class Block {
    
    public var next: Block? = null
    
    object NL : Block
    
    class Text(val context: String): Block
    
    class Nest(
        val prefix: String?,
        val postfix: String?,
        val child: Block,
    ) : Block
}

For the curious, the current block tree is a giant linked list (some sublists) of tokens which is this way because that's how the prettier-printer paper algorithm is able to reflow a tree. That being said, the "pretty" part of the "pretty" printer is subjective and less important than having stack overflows are relatively small inputs.

For example, a current nest is like

link(link('{', link(child(....),  '}) 

And we see the link nodes approach the absurd, this is my bad 😄 so let's simplify this.