smithy-lang/smithy

IDL Model serializer ignores trailing whitespace in documentation

kubukoz opened this issue · 2 comments

If you parse this model:

namespace test

@documentation("foo 
bar")
string Demo

very important: there's a space after foo, before the newline.

then the value of the documentation trait is: "foo \nbar", which I think is correct.

If you serialize that with the SmithyIdlModelSerializer, it gets turned into:

$version: "2.0"
namespace test

/// foo
/// bar
string Demo

and the trailing space is gone. As a result, if you parse that again, you no longer have the same documentation trait value (it's foo\nbar instead), which causes the following problems for my team:

  1. model merging considers these shapes different (because of the trait change), which fails to assemble.

  2. Name uniqueness checks start to consider the shape different from its original counterpart (for a specific reason, we have an identical shape in another namespace, but it's used verbatim instead of being parsed + serialized, so it still has the trailing space).


I think this is an inconsistency in the serializer - as much as I love to remove trailing whitespace, it clearly can cause subtle bugs.

Full example, a scala-cli script:

//> using dep "software.amazon.smithy:smithy-model:1.43.0"
//> using option "-Wunused:all"
import software.amazon.smithy.model.shapes.SmithyIdlModelSerializer

import software.amazon.smithy.model.traits.DocumentationTrait
import software.amazon.smithy.model.shapes.ShapeId
import software.amazon.smithy.model.Model
import scala.jdk.CollectionConverters._

val input =
  s"""
     |namespace test
     |
     |@documentation("foo${" " /* this ensures we have trailing whitespace no matter what editor config you use */}
     |bar")
     |string Demo
     |""".stripMargin
val m1 = Model
  .assembler()
  .addUnparsedModel(
    "test.smithy",
    input
  )
  .assemble()
  .unwrap()

val docValue = m1
  .expectShape(ShapeId.from("test#Demo"))
  .getTrait(classOf[DocumentationTrait])
  .get
  .getValue

println(docValue == "foo \nbar") //true

val serialized = SmithyIdlModelSerializer
  .builder()
  .build()
  .serialize(m1)
  .asScala
  .head
  ._2

val m2 = Model
  .assembler()
  .addUnparsedModel(
    "test.smithy",
    serialized
  )
  .assemble()
  .unwrap()

val docValue2 = m2
  .expectShape(ShapeId.from("test#Demo"))
  .getTrait(classOf[DocumentationTrait])
  .get
  .getValue

println(docValue2 == "foo\nbar") //true

Edit: this is also inconsistent with the ModelSerializer which encodes as a Node - that one keeps the whitespace.

I agree that the smithy-model serializers should keep the trailing whitespace like you said. However, I would expect the formatter in smithy-syntax to remove it by default though since I think that's what the vast majority of people would expect.

I agree that the smithy-model serializers should keep the trailing whitespace like you said. However, I would expect the formatter in smithy-syntax to remove it by default though since I think that's what the vast majority of people would expect.

we're planning to switch to that formatter when it's in the LSP 😅 which is happening in smithy-lang/smithy-language-server#139