vfile/vfile-message

VFileMessage generates invalid position if no position is given

remcohaszing opened this issue · 26 comments

Initial checklist

Affected packages and versions

3.1.2

Link to runnable example

No response

Steps to reproduce

import { VFileMessage } from 'vfile-message'

const message = new VFileMessage('reason')

console.log(message.position)

Expected behavior

The message position is undefined.

According to the type definitions it’s optional, but currently it’s always set.

Actual behavior

The message position is a unist position, but the start and end line and column are set to null.

According to the unist types, start and end line and column can’t be null.

This causes various type errors. For example:

This test explicitly asserts the incorrect behaviour:

vfile-message/test.js

Lines 63 to 66 in e5e87f2

t.deepEqual(m1.position, {
start: {line: null, column: null},
end: {line: null, column: null}
})

Affected runtime and version

node@18.8.0

Affected package manager and version

npm@8.17.0

Affected OS and version

Pop!_OS 22.04

Build and bundle tools

No response

I think that’s just unist types not matching reality.

Particularly I think your second link, https://github.com/unifiedjs/unified-language-server/blob/main/lib/index.js#L70-L74, is always a good idea. We’re dealing with users here, and should expect some garbage. Not everyone uses TS. And even if TS is used, NaN could still be set. Or floats. Or floating point math could result in weird things?
I’d be more in favor of tools like unistPointToLspPosition to, at runtime, check whether values actually match what TypeScript says they do (same as syntax-tree/unist-util-lsp#1 (comment))

position could be null if someone subclasses VFileMessage without calling the constructor. It’s set to null here:

VFileMessage.prototype.position = null

The unist types match the specification for Position and Point accurately. A Position has a required start and end point and an optional indent. A Point has a required line and column of type number and an optional offset. vfile-message doesn’t follow this specification.

I won’t argue defensive programming is a good idea. People can indeed pass in incorrect data. But I do believe that official unified utilities shouldn’t create incorrect data.

The code is there to make things easier before ? was in JS.
It’s there to make defensive coding easier.
It wasn‘t really meant to exactly match unist, because it’s not a node. And this can also represent a single point instead of two points (position). This project can’t match unist when only one point is given?
Other than that, the downside of going with types is that it might break some folks. 😬

Friendly ping! :)

I think we’re on the same line that currently VFileMessage#position is incorrectly typed as a unist Position.

At this point, VFileMessage#position can never be null. I don’t think we should take subclassing into account. Technically a subclass could change any property to anything, even without subclassing a user could change any value to anything.

The code is there to make things easier before ? was in JS.

But this comes at the cost that checking if(message.position) {} won’t work. Also it would be convenient to be able to pass message.position into utilities that accept unist positions. If it would be typed correctly now, TypeScript will no longer allow this.

Also now that the oldest supported Node.js version supports optional chaining, perhaps this is a good time to change it. I think it would also be nice timing to combine this with a major unified release (which is necessary because of unifiedjs/unified#202)

The main problem I believe is that this project accepts a single point and exposes that information. The utilities typically support that starting point too as far as I am aware.

Is there any particular reason start and end in a position may not be the same (shallow copy)?

Interesting idea. Positions are supposed to be ranges. E.g., the end point of a position in the whole document, if it were a starting point, would be a point outside the document.

  • say a was the whole document, it could be represented as a whole as 1:1-1:2 (0-1)
  • the point 1:1 (0) points to the byte/character a
  • the point 1:2 (1) is EOF

In micromark/markdown-rs, events are never empty. This is carried through to mdast as well, and I believe every mdast/hast/nlcst/xast utility follows it too

This reminds me of the way Monaco editor handles TypeScript diagnostics. TypeScript text spans use the format { start: number, length: number }. As can be seen here, TypeScript can return a text span with a length of zero, but when rendering a squiggly line to display the error, the editor requires a minimal length of 1 character to underline.

I think this applies the exact same way in unist, or any text format. A text range can have a length of 0 characters, but the best way to display a range of length 0 is often to point to the character at that index.

But then you’re suggesting to not support points at all, that users should pass a position of (pseudocode) start: start, end: start + 1?

No, I suggest that a user can use:

vfile.message('hello', { line: 1, column: 1 })
assert.equal(vfile.position, { start: { line: 1, column: 1 }, end: { line: 1, column: 1 } })
assert.ok(vfile.position.start !== vfile.position.end)

But tools such as unist-util-stringify-position should be able to handle this:

import { stringifyPosition } from 'unist-util-stringify-position'

assert.equal(stringifyPosition({ start: { line: 1, column: 1 }, end: { line: 1, column: 1 } }), '1:1')
assert.equal(stringifyPosition({ start: { line: 1, column: 1 }, end: { line: 1, column: 2 } }), '1:1-1:2')

Although it might look less nice, I don’t really see any practical problems with the following representation:

assert.equal(stringifyPosition({ start: { line: 1, column: 1 }, end: { line: 1, column: 1 } }), '1:1-1:1')

I still feel a bit weird about positions having the same start and end points. It feels like that shouldn’t occur. It feels “wrong”.

but when rendering a squiggly line to display the error, the editor requires a minimal length of 1 character to underline.

That’s a bit in line with what I feel as “wrong”. Same start and end is really empty. No line rendered.

Maybe I’m on my own here though, or have to get used to it?

For example slice(start, end) in JavaScript also accepts a range. If the end equals start, then it returns an empty string / array.

Also cursors in a text editor represent a point, yet most terminal emulators display a cursor as a range of length 1.

Yeah, good “points”. I agree that that’s good on the “accepting” side of the API. But I feel weird about exposing it from an API. In this case, VfileMessage.position being like that.

I’m leaning towards making it:

position?: {start: Point; end?: Point | undefined}

What do you think?

I feel pretty strongly about using a real full unist Positions, where both start and end are required.

I still feel a bit weird about positions having the same start and end points. It feels like that shouldn’t occur. It feels “wrong”.

I understand this feeling, but I don’t think this is a strong argument to stick with optional values.

I’ve worked with LSP and Monaco editor, which deal with similar situations. They do use matching start and end points in a position to denote an empty range. This feels quirky (in a clever way) for a second, but it feels pretty natural once you get over it.

Another big downside of making start or end optional, is assignability. Given a function like this:

export function doSomethingWithPosition(position: Position): unknown

It would be really nice if we could do this:

for (const message = file.messages) {
  doSomethingWithPosition(position)
}

The only libraries I can imagine are slightly negatively affected are those that represent the position as a string (only unist-util-stringify-position that I’m aware of). However, such libraries could already get a position with a matching start and end anyway.


@ChristianMurphy I would love to hear your opinion on this as well. :)

even if TS is used, NaN could still be set. Or floats. Or floating point math could result in weird things?

We could tighten that down with better typings.
type-fest for example has a NonNegativeInteger type https://github.com/sindresorhus/type-fest#numeric

we’re dealing with users here, and should expect some garbage. Not everyone uses TS

we could validate the data as it is passed in, throwing exceptions invalid data on the JS side.

position?: {start: Point; end?: Point | undefined}

I would read a defined start and an undefined end as an unbounded length, start to infinity, rather than zero length.

They do use matching start and end points in a position to denote an empty range.

That makes sense, and lines up with what I've seen in other text editors as well.

slightly negatively affected are those that represent the position as a string

An even then, those libraries would just need to be aware that start and end could be a point, correct?

We could tighten that down with better typings.

throwing exceptions invalid data on the JS side.

I think there is a large runtime cost associated with all of this. But also a cost of how to type APIs that accept numbers currently. It’s a massive change in 500 repos. I don’t understand the benefits yet.

I feel pretty strongly about using a real full unist Positions, where both start and end are required.

Then it isn’t really about this project, but lots of projects, and it should be described in syntax-tree/unist. And tests in lots of projects.

Another big downside of making start or end optional, is assignability.

It is indeed not assignable, this issue can be solved without it being assignable. We can use types than make end optional, and then you can use an if statement:

if (position.end) {
  // Now it’s a full position, a range.
} else {
  // There’s just one point.
}

There are several ideas floating around in this discussion.
Trying to make it actionable:

  • the types here do not match the implementation
    we can fix that, we can make the types match the implementation
  • the implementation should change
    sure, we can do that, it’s a breaking change, some ways it can change:
    • we can expose Position | Point | undefined
      fine
    • we can expose Position | undefined and specify that Position can be an empty range by having the same point as start and end
      I’m not sure, needs specification, needs docs, needs tests, needs to be clear what it means i.e. how it is different from a position spanning one character.
    • feel free to propose other changes
  • feel free to propose other ideas
  • The unist docs need to be updated
    According to the unist docs Position has required properties start and end, both of type Point. A Point has line and column, both required. Regardless of the solution here, it’s good to document explicitly how empty ranges are handled.
  • implementation changes
    Inevitably there need to be implementation changes, and types need to match the implementation.

Options for the implementation changes:

  1. Make the unist.Position implementation match what vfile-message creates
    • Breaking change: This means Position['start'], Position['end'], Point['line'], and Point['column'] would be optional / nullable.
  2. Make unist.Position['end'] optional
    • Breaking change: This affects all packages that depend on unist
  3. Allow unist.Position['end'] to match unist.Position['start'], make VFileMessage#position adhere
    • The specification doesn’t explicitly state this is allowed, so in that sense it’s backwards compatible.
    • Presentational libraries may yield slightly different output
      • unist-util-stringify-position output already produces better results when given valid positions with am empty range IMO. Current behaviour:
        > stringifyPosition({start: {line: 2, column: 3}, end: {line: 2, column: 3}})
        '2:3-2:3'
        > stringifyPosition({start: {line: 2, column: 3}, end: null})
        '2:3-1:1'
      • Maybe it affects a library that depends on the boolean value of a range length or string selection for that range. This is an edge case that shouldn’t occur in the first place.
    • vfile-message needs to be updated to make VFileMessage#position match unist.Position.
  4. Keep VFileMessage#position in a format that’s not compatible with unist.Position
    • Breaking change (type level only): VFileMessage#position is no longer assignable to utilities that expect unist.Position.

I really don’t see any downsides to option 3. Both Christian and I have seen this works in practice in other places that use positional information.

According to the unist docs Position has required properties start and end, both of type Point. A Point has line and column, both required.
Regardless of the solution here, it’s good to document explicitly how empty ranges are handled.

The specification doesn’t explicitly state this is allowed, so in that sense it’s backwards compatible.

Look at the docs on Position:

The start field of Position represents the place of the first character of the parsed source region. The end field of Position represents the place of the first character after the parsed source region, whether it exists or not.
https://github.com/syntax-tree/unist#position

It explicitly states that “empty” is impossible.
If you want this, it’s breaking. I don’t think it’s trivial at all.

You’re right about it being documented. I needed to read more thoroughly.

I still don’t think this would cause any practical issues though. Personally I would prefer to update the spec to explicitly allow an empty Position.

Considering upcoming breaking changes in the ecosystem: I think it would be an improvement if a nullish end means it equals start, and point line and column are always defined. If start is also unknown, the entire position should be null. So that means a position could be one of the following:

{
  start: { line: 1, column: 2 },
  end: { line: 1, column: 2 }
}
{
  start: { line: 1, column: 2 },
  end: null
}
{
  start: { line: 1, column: 2 },
}
null
wooorm commented

Personally I would prefer to update the spec to explicitly allow an empty Position.

It still feels very weird to me to change unist in a way to support something that never actually occurs in unist. unist is supposed to represent ASTs in different programming languages. Why should it explain things that can never happen in an AST? Why should someone implementing unist in Swift or whatever have to support something that won’t ever happen?

Just because this one utility in JavaScript accepts other structures than Positions?


Your code in this last comment seems to follow solution 2 in your earlier comment (end optional), while you previously seemed to prefer your solution 3 (end can be equal to start)?


I personally prefer exposing Position | Point | undefined here, which I mentioned earlier, perhaps at a place field, which is the same as the input: https://github.com/vfile/vfile-message#vfilemessagereason-place-origin.
We could even allow Node or Node[] too.
Perhaps this can be done together with stack? #16

wooorm commented

There’s one place where start and end are the same: a root node representing an empty document (https://github.com/syntax-tree/mdast-util-from-markdown/blob/e30d9c8025d726360123ae95beb9cf011335b921/dev/lib/index.js#L380).

I’d think it’s better to explain this in the unist document though as an exception to the rule, rather than that all nodes can have same start/end points.

I actually think this is a good case that shows empty positions do exist, although they are rare.

Typically a position is part of an AST node from a parsed text document. In text documents, syntax is mostly defined by a non-zero length range of characters. Although empty ranges could have meaning.

Hypothetically a JavaScript string 'example' could be represented as:

{
  type: 'string',
  position: { start: { line: 1, column: 1 }, end: { line: 1, column: 2 } },
  children: [
    { type: 'stringQuote', value: "'", position: { start: { line: 1, column: 2 }, end: { line: 1, column: 3 } } },
    { type: 'stringContent', value: 'example', position: { start: { line: 1, column: 3 }, end: { line: 1, column: 10 } } },
    { type: 'stringQuote', valye: "'", position: { start: { line: 1, column: 10 }, end: { line: 1, column: 11 } } }
  ]
}

With such an example, an empty string ('') would be represented as:

{
  type: 'string',
  position: { start: { line: 1, column: 1 }, end: { line: 1, column: 2 } },
  children: [
    { type: 'stringQuote', value: "'", position: { start: { line: 1, column: 2 }, end: { line: 1, column: 3 } } },
    { type: 'stringContent', value: 'example', position: { start: { line: 1, column: 3 }, end: { line: 1, column: 3 } } },
    { type: 'stringQuote', valye: "'", position: { start: { line: 1, column: 3 }, end: { line: 1, column: 4 } } }
  ]
}

I also think there’s a difference between unist nodes and vfile message positions. Positions on unist nodes typically refer to parsed content, which is typically non-empty. Vfile message positions are constructed from a unist node / position / point.


Also I looked into the terminology of point / position a bit, as I find it confusing that Position in unist means something other than in LSP. I think the term Position in unist comes from a position vector. In that context a position may have length zero, it’s called a zero vector.

wooorm commented

I think the term Position in unist comes from a position vector. In that context a position may have length zero, it’s called a zero vector.

The reason the field is called position is because of https://github.com/reworkcss/css/blob/434aa1733f275a67ea700311451b98a14f8cc21a/lib/parse/index.js#L33. That project has no term for the things at start and end. Calling the whole thing Position and the start and end things points was added later.

Hypothetically a JavaScript string 'example' could be represented as:

This could be done like that, but it’s currently explicitly not done anywhere, and it’s explicitly documented that it doesn’t happen.
Like... Should there be an empty text node inside an paragraph node inside a root node, if the document is empty? Or an empty paragraph in a root?


OK, some examples to think about:

Empty document

Say there was a lint rule like https://github.com/sindresorhus/eslint-plugin-unicorn/blob/main/rules/no-empty-file.js. In a project unified-file-lint or so.
The root node has {start: {line: 1, column: 1, offset: 0}, end: {line: 1, column: 1, offset: 0}}

How would that work? How could VS code or whatever highlight (“mark red or so”) that problem?
A tool that would auto-fix this would report an actual: '' and an expected: '<!-- Empty -->' or so, right?

No tabs

https://github.com/remarkjs/remark-lint/blob/main/packages/remark-lint-no-tabs/index.js. For a\tb, it emits a warning at {line: 1, column: 2, offset: 1}. An editor would mark the tab character as red. actual: '\t', expected: ' '?

Final eol

https://github.com/remarkjs/remark-lint/blob/main/packages/remark-lint-final-newline/index.js. It currently doesn’t pass a point but it probably should? For a, I’d expect a point of {line: 1, column: 2, offset: 1}, actual: '', expected: '\n'?

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.